Issues with ssh-agent connecting to a large number of hosts at once

Frank Sorenson frank at tuxrocks.com
Tue Apr 21 19:57:43 MDT 2009


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bob Belnap wrote:
> Hi,
> 
> I'm having problems with ssh-agent when I am connecting to a large number
> (several hundred) of hosts at once.  I'm using kanif (
> http://taktuk.gforge.inria.fr/kanif/) which is a very nice package that
> distributes ssh connections across the hosts you are connecting to (a
> fan-out sort of approach, so all connections are not coming from one host).
> However, all hosts have to authenticate, so all the hosts have to wind their
> way back to the ssh-agent.  This problem isn't isolated to just kanif,
> however.   I see it when using other utilities that rely on many concurrent
> connections to the ssh-agent.
> 
> running strace on the ssh-agent, things start out ok, then go sour and it
> starts spitting out:
> 
> read(160, 0xbf8f300a, 1024)             = -1 EAGAIN (Resource temporarily unavailable)
> read(160, 0xbf8f300a, 1024)             = -1 EAGAIN (Resource temporarily unavailable)
> read(160, 0xbf8f300a, 1024)             = -1 EAGAIN (Resource temporarily unavailable)

The manpage for read(2) shows:
       EAGAIN Non-blocking I/O has been selected using O_NONBLOCK and no
              data was immediately available for reading.

Can you show us the output of:  readlink /proc/`pidof ssh-agent`/fd/160
 (change 160 to whatever fd is giving the EAGAIN)
Or even just:  ls /proc/`pidof ssh-agent`/fd
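
If the full listing is easier to read with the link targets next to the
descriptor numbers, something like the following rough sketch may help.
list_fds is a made-up helper name, not part of any existing tool, and it
assumes a Linux-style /proc/<pid>/fd directory of symlinks:

```shell
# Print "fd -> target" for every descriptor in a /proc/<pid>/fd directory.
# list_fds is a hypothetical helper; the directory is passed explicitly.
list_fds() {
    fd_dir=$1
    for link in "$fd_dir"/*; do
        # Each entry in /proc/<pid>/fd is a symlink; skip anything else.
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink "$link")"
    done
}

# Example (assumes exactly one ssh-agent is running, so pidof prints one pid):
#   list_fds "/proc/$(pidof ssh-agent)/fd" | sort -n
```

Piping the output through `grep -c socket:` would give a rough count of
how many of those descriptors are sockets.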

With so many ssh connections, I'd be curious to see what your entropy
pool looks like.  Do you have any entropy remaining in
/proc/sys/kernel/random/entropy_avail, or has the pool been exhausted?
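
A single reading may not tell you much, so it could be worth sampling the
value a few times while the connections are starting up.  A minimal
sketch, where sample_entropy is a hypothetical helper name and the count
and interval defaults are arbitrary:

```shell
# Print the entropy estimate COUNT times, INTERVAL seconds apart.
# sample_entropy is a hypothetical helper; the defaults are arbitrary.
sample_entropy() {
    file=${1:-/proc/sys/kernel/random/entropy_avail}
    count=${2:-5}
    interval=${3:-1}
    i=0
    while [ "$i" -lt "$count" ]; do
        cat "$file"
        i=$((i + 1))
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: ten samples, one second apart, while the fan-out is running:
#   sample_entropy /proc/sys/kernel/random/entropy_avail 10 1
```

If the numbers drop toward zero as the connections pile up, that would
point at the entropy pool rather than at a descriptor limit.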

<snip>
> at that point I kill the agent, but it will stick at that value if I don't.
> It's not always 287, but varies.  I've seen it as high as 447 connections at
> once, but it's usually in the 200 range.
> 
> I've tried different ssh-agents on different kernels and machines, and
> haven't found a combination that works.  However, I have tried it on a
> FreeBSD box which did not have the problem.
> 
> It seems to me that I'm hitting some kind of kernel limit (open file limit
> perhaps?)  But I've fiddled with various sysctl values with no good
> results.  Has anyone run across this, or have any further debugging
> suggestions?


Frank
- --
Frank Sorenson - KD7TZK
Linux Systems Engineer, DSS Engineering, UBS AG
frank at tuxrocks.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iEYEARECAAYFAknueZcACgkQaI0dwg4A47z0xwCfWMYO8/Lbx51TdLLXTSGcAGlJ
BskAn3g8EbsFCRz4GUZ79gz/nbJ45c2F
=vJWt
-----END PGP SIGNATURE-----


