[OpenAFS] Re: OpenAFS freeze problems

John Tang Boyland boyland@uwm.edu
Mon, 27 Feb 2012 20:01:28 -0600

] About every few hours or so, AFS "freezes" on a write:
] the attempt to write blocks for about 30 seconds or so.


As suspected, there is no problem with the number of threads; the rxdebug
command shows 0 threads used out of 11 while a freeze is happening.

Some people suggested I blacklist clients that (apparently)
don't respond to callback breaks.  But that won't work because
(1) the campus wireless may be blocking access
    (not sure here);
(2) a laptop whose lid is closed won't respond to anything
    (most of the students using AFS on our cell have OpenAFS on
     their laptops);
(3) a laptop moved to a new location on campus gets a new
    IP address, and no one will respond at the old IP address.
None of these is the fault of the client.

So the only solution would be to decouple callback breaking from
giving permission to write.  Right now, the attempt to write
stalls while the server attempts to tell clients the callbacks are
broken.  I don't understand why the client doing the write
has to wait for the other clients to ack the callback breaks.
Why not permit the write to go ahead while the server continues
to try to notify the other clients of the write?

In other words, is there anything that these clients
(whose callbacks are being broken) could report that would cause the
server to deny the write attempt?  If not, then why delay it?
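
To make the question concrete, here is a toy model of the two orderings.
It is only a sketch, not the fileserver's actual code: break_callback,
notify_all, write_blocking and write_decoupled are hypothetical names,
and an unreachable client is simulated by sleeping for the timeout.

```python
import threading
import time


def break_callback(client, timeout):
    """Hypothetical stand-in for one callback-break RPC.

    A reachable client acks at once; an unreachable one (closed laptop,
    stale IP address) makes the caller wait for the full timeout.
    """
    if client["reachable"]:
        return True
    time.sleep(timeout)
    return False


def notify_all(clients, timeout):
    """Try to break the callback held by every client, one after another."""
    return [break_callback(c, timeout) for c in clients]


def write_blocking(clients, timeout):
    """Writer is acknowledged only after every break is acked or timed out."""
    start = time.monotonic()
    notify_all(clients, timeout)
    return time.monotonic() - start  # latency seen by the writer


def write_decoupled(clients, timeout):
    """Writer is acknowledged at once; breaks continue in the background."""
    start = time.monotonic()
    worker = threading.Thread(target=notify_all, args=(clients, timeout))
    worker.start()
    return time.monotonic() - start, worker
```

With one reachable and one unreachable client, write_blocking returns
only after the full timeout, while write_decoupled returns almost
immediately and leaves the notification running in the worker thread.
The cost in this model is that the writer gets its acknowledgement
before the other clients have been told their callbacks are broken.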

Best regards,
John Boyland