[OpenAFS] Re: 1.4.x, select() and recent RHEL kernels beware

Andrew Deason adeason@sinenomine.net
Thu, 8 Nov 2012 10:18:27 -0600

On Thu, 8 Nov 2012 15:41:57 +0000
Dan Van Der Ster <daniel.vanderster@cern.ch> wrote:

> Finally we realised this was due to fssync.c in 1.4's use of
> select()/FD_SET and the corrupting behaviour of those functions when
> using >1024 file descriptors per process. Until quite recently this
> hadn't been a problem, since RHEL kernels used ulimit -Hn 1024 by
> default. However, as of kernel 2.6.32-279 the limit was raised to 4096
> (to purge certain distro's of dangerous applications ;) ). This means
> that all 1.4.x servers running with 2.6.32-279 and later will get
> corrupted stacks in fssync.c and probably crash.

That would explain why we've started seeing this only recently, even
though the issue has in theory existed forever.
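
For anyone auditing their own select() callers for the same problem,
here is a minimal sketch of the guard that is missing in this situation.
It is not taken from fssync.c; the helper names fd_fits_in_fd_set() and
checked_fd_set() are made up for illustration. An fd_set is a fixed-size
bitmap with FD_SETSIZE bits (1024 with glibc), so FD_SET() on a larger
descriptor writes past the end of it:

```c
#include <sys/select.h>

/* Hypothetical helper (not OpenAFS code): returns 1 if fd can be
 * stored in an fd_set without writing outside the bitmap, else 0. */
static int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: adds fd to *set only when it fits.
 * Returns 0 on success, -1 if the descriptor was refused. */
static int checked_fd_set(int fd, fd_set *set)
{
    if (!fd_fits_in_fd_set(fd))
        return -1;      /* FD_SET() here would corrupt nearby memory */
    FD_SET(fd, set);
    return 0;
}
```

A guard like this only turns the memory corruption into a visible
error; it does not let select() handle descriptors at or above
FD_SETSIZE.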

> Note that 1.6 and beyond is safe from this RHEL kernel change since
> Simon already patched fssync to use poll() 5 years ago ;) 

That's not true; the code was written to use poll() but was not enabled
until very recently. I don't think there is any current release that
does this the way Linux wants.
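
To illustrate why poll() sidesteps the limit, here is a rough sketch of
waiting on a single descriptor with it. This is not the OpenAFS
implementation, and wait_readable() is a hypothetical name; the point is
only that poll() takes the descriptor number in a struct pollfd rather
than as an index into a fixed-size bitmap, so the value of the
descriptor does not matter:

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper (not OpenAFS code): waits up to timeout_ms
 * milliseconds for fd to have an event that poll() reports for POLLIN.
 * Returns 1 if an event was reported, 0 on timeout, -1 on error or if
 * fd is not an open descriptor. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* Simplification: a signal restarts the wait with the full timeout. */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return 0;
    if (pfd.revents & POLLNVAL)
        return -1;      /* fd was not open */
    return 1;           /* POLLIN, POLLHUP or POLLERR was reported */
}
```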

A more recent ticket for this issue is 131372.

Andrew Deason