[OpenAFS] 1.6.1pre1: partial success on EL6

Stephan Wiesand stephan.wiesand@desy.de
Mon, 19 Dec 2011 17:24:41 +0100

On Dec 19, 2011, at 16:28 , omalleys@msu.edu wrote:

> Are you testing this in "lab" conditions? I'm curious as to how you are
> replicating the issue.

I think it's described fairly accurately in
https://rt.central.org/rt/Ticket/Display.html?id=130327 . In short:
have a few dozen clients writing large files to the same fileserver,
then wait for O(30m). See how 1.4 clients - and 1.6 clients with
idledead disabled - succeed, and unmodified 1.6 clients fail and hang.
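
For anyone wanting to try this, here is a minimal sketch of the write
load a single client would generate. It is an illustration only, not
the procedure from the ticket: the function name, the chunk size and
the file sizes are assumptions, and the target path has to point at a
volume on the fileserver under test. It writes zeros in chunks, calls
fsync so the data does not just sit in the client cache, and returns
the byte count and elapsed time. On a failing client the write, the
fsync or the close would be expected to raise OSError, or not return.

```python
import os
import time


def write_large_file(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of zeros to path in chunks.

    Returns (bytes_written, elapsed_seconds). Any I/O error raised by
    write, fsync or close propagates to the caller as OSError.
    """
    chunk = b"\0" * chunk_bytes
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            # Last chunk may be shorter than chunk_bytes.
            n = min(chunk_bytes, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        # Ask the OS (and thus the AFS client) to push the data out
        # instead of leaving it in the local cache.
        os.fsync(f.fileno())
    return written, time.monotonic() - start
```

To approximate the scenario, one would run something like
write_large_file("/afs/<cell>/<some volume>/test.<hostname>", 10 * 2**30)
in a loop on each of a few dozen clients at the same time (the path
and the size here are placeholders) and watch for errors or hangs
after about half an hour.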

> Also if you can, can you try this by running it on a single core or
> disabling threads and getting the same results?

If it helps shed more light on the issue and find a solution more
satisfactory than disabling idledead (which I'd be absolutely happy
with), I will.

But according to this thread [...], it seems already well understood
what's going on.

> Quoting Stephan Wiesand <stephan.wiesand@desy.de>:
>> OS: EL6.1
>> arch: amd64
>> kernels: 2.6.32-131.21.1.el6, 2.6.32-220.1.1.el6 (module built
>> against 2.6.32-71.el6)
>> It builds, and it basically works.
>> It seems to partially address the nat ping issue, but servers still
>> get pinged more often than intended.
>> It fails to fix RT #130327. If a fileserver is very busy, clients
>> fail writing to it and then hang, making AFS unusable on the client
>> machine until it's rebooted.

Stephan Wiesand
Platanenallee 6
15738 Zeuthen, Germany