[OpenAFS] Re: OpenAFS freeze problems

Jeffrey Altman jaltman@your-file-system.com
Tue, 28 Feb 2012 08:55:04 -0500



On 2/27/2012 9:01 PM, John Tang Boyland wrote:
> ] About every few hours or so, AFS "freezes" on a write:
> ] the attempt to write blocks for about 30 seconds or so.
>
> ...
>
> As suspected, there is no problem with the number of threads; the rxdebug
> command shows 0 threads used out of 11 while a freeze is happening.
>
> Some people suggested I blacklist clients that (apparently)
> don't respond to callback breaking.  But that won't work because
> (1) it could be that the campus wireless is blocking access
>     (not sure here)

That is fairly easy to test.  Take a client running openafs, discover
its IP address and then attempt "rxdebug <addr> 7001 -version".
If the request times out, there is a firewall in play; if it gets an
answer, there is not.
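
A minimal sketch of that check, assuming Python 3 and that rxdebug is on
the PATH. The helper name probe_client and the 15-second cutoff are
hypothetical; the only rxdebug arguments used are the ones quoted above.
The sketch does not interpret rxdebug's output or exit status, it only
separates "did not finish in time" from "finished" and hands back the text.

```python
import subprocess


def probe_client(addr, port=7001, timeout_s=15.0, run=subprocess.run):
    """Run `rxdebug <addr> <port> -version` and report what happened.

    Returns (status, output). status is "timeout" if the command did not
    finish within timeout_s (an assumed cutoff), "missing" if rxdebug is
    not installed, and "completed" otherwise.
    """
    cmd = ["rxdebug", addr, str(port), "-version"]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout", ""
    except FileNotFoundError:
        return "missing", ""
    # Leave interpretation of the text to the reader; rxdebug may report
    # its own timeout in the output rather than hanging.
    return "completed", result.stdout + result.stderr


# Example (192.0.2.10 is a documentation address, not a real client):
#   status, output = probe_client("192.0.2.10")
```

The run parameter exists only so the function can be exercised without a
network; in normal use the default subprocess.run is what gets called.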

> (2) when you close a laptop it won't respond to anything.
>     (Most of the students using AFS on our cell have OpenAFS on
>      their laptops.)

The Windows 1.7 versions detect the suspend power notification and issue
a "GiveUpAllCallbacks" RPC to all servers in the two-second window they
get before power loss.

> (3) If you move your laptop to a new location on campus, you get a new
>     IP address, and no one will respond at the old IP address.

The file server tracks clients by a UUID and will update the client's
host record on first contact from the new address/port number.  There
can be a delay here while the old addr/port is contacted.
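
The idea of keying the host record on the UUID rather than on the address
can be illustrated with a small Python sketch. The names HostRecord,
HostTable and on_contact are hypothetical and are not the file server's
actual data structures; the sketch also leaves out the step where the old
addr/port is contacted, which is where the delay mentioned above comes from.

```python
from dataclasses import dataclass


@dataclass
class HostRecord:
    uuid: str
    addr: str
    port: int


class HostTable:
    """Toy host table keyed by client UUID (illustrative only)."""

    def __init__(self):
        self._by_uuid = {}

    def on_contact(self, uuid, addr, port):
        """Record a contact from a client; return (record, moved)."""
        rec = self._by_uuid.get(uuid)
        if rec is None:
            # First time this UUID is seen: create its record.
            rec = HostRecord(uuid, addr, port)
            self._by_uuid[uuid] = rec
            return rec, False
        moved = (rec.addr, rec.port) != (addr, port)
        if moved:
            # Same client, new address/port: update the existing record
            # instead of creating a second one.
            rec.addr, rec.port = addr, port
        return rec, moved
```

A laptop that changes IP address keeps a single record in this model,
because the lookup never depends on where the request came from.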

> None of these are the fault of the client.

> So the only solution would be to decouple callback breaking from
> giving permission to write.

Doing so would break the distributed cache coherency that AFS provides.

> Right now, the attempt to write
> stalls while the server attempts to tell clients the callbacks are
> broken.  I don't understand why the client doing the write
> has to wait for the other clients to ack the callback breaks.
> Why not permit the write to go ahead while the server continues
> to try to notify the other clients of the write?
>
> In other words, is there any information that these clients
> (whose callbacks are being broken) could say that would cause the
> server to deny the write attempt?  If not, then why delay it?

Clients do not respond to callbacks to grant permission.  Clients
acknowledge receipt of a callback as an indication that it is acceptable
to complete the RPC to the requester.  This ensures that any messages
the requester might send to another node regarding data written to /afs
will arrive after the data is available to the receiving application.
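
The ordering described above can be modeled with a toy Python sketch. It
is not the file server's implementation: the class names, the sequential
delivery, and the 30-second per-client timeout are all assumptions chosen
for illustration, and time is simulated rather than slept. The point is
only that the store call does not return to the writer until every other
callback holder has either acknowledged the break or timed out.

```python
class Client:
    """Toy client that caches file IDs it holds callbacks for."""

    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable
        self.cached = set()

    def break_callback(self, fid, timeout_s):
        """Return (acked, simulated_seconds_waited)."""
        if self.reachable:
            self.cached.discard(fid)  # cached copy is no longer trusted
            return True, 0.0
        return False, timeout_s       # e.g. a closed laptop never answers


class FileServer:
    def __init__(self, break_timeout_s=30.0):  # hypothetical timeout
        self.break_timeout_s = break_timeout_s
        self.data = {}
        self.callbacks = {}  # fid -> set of clients holding a callback

    def fetch(self, client, fid):
        # Reading registers a callback promise for this client.
        self.callbacks.setdefault(fid, set()).add(client)
        client.cached.add(fid)
        return self.data[fid]

    def store(self, writer, fid, data):
        """Write data, break callbacks, then return simulated wait time."""
        self.data[fid] = data
        waited = 0.0
        holders = self.callbacks.get(fid, set())
        for client in list(holders):
            if client is writer:
                continue
            _acked, w = client.break_callback(fid, self.break_timeout_s)
            waited += w
        # Only now is the RPC completed back to the writer.
        self.callbacks[fid] = {c for c in holders if c is writer}
        return waited
```

In this model a single unreachable callback holder makes store take the
full timeout, while reachable holders add nothing; any message the writer
sends after store returns reaches clients whose stale copy is already
invalidated.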

If callback delivery problems are the cause of the delays, you should see
log messages in the FileLog once you raise the log level.

As we discussed last night on the phone, there are other possibilities
for the delays which are not callback related.

Jeffrey Altman

