[OpenAFS] Graphical file managers get stuck

Jeffrey Altman jaltman@your-file-system.com
Mon, 10 Dec 2012 19:12:51 -0500

There is a long list of items that can result in poor performance.
Here is a short list that is most likely incomplete:

1. VLDB addresses in CellServDB are out of date and there
   in fact are no servers at those addresses

2. VLDB addresses in CellServDB are correct but the servers
   have been firewalled from the public network

3. Some VLDB addresses in CellServDB are incorrect and unlucky
   randomization of addresses results in timeouts

4. VLDB addresses are served via DNS AFSDB records but the local
   DNS server doesn't support them.  Many hotel network proxies
   have this behavior.

5. VLDB addresses are served via DNS SRV records but the client
   is too old to support them.

6. root.cell volume root directory requires authentication and
   repeated attempts to read without tokens results in abort
   throttling of the client.

7. The cell has no root.cell volume.

8. The VLDB servers are accessible but the file servers containing
   the root.cell volume are blocked by a firewall.

9. Local cell is accessible but all foreign cells are blocked
   by a firewall.

10. Application looks for a configuration file in each directory.
   Think desktop.ini on Windows or IIS web.config, etc.  Or worse
   attempts to create one.  Lack of privileges triggers aborts.

11. Applications (usually file browsers) decide to walk multiple
    subdirectory levels in an attempt to pre-generate the UI
    components so the application can appear to be responsive
    to the end user.  Such algorithms are often one size fits
    all even when knowledge that the file system is remote is
    available.
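
Items 1 to 3 all come down to what is actually listed in CellServDB.
A minimal sketch for auditing a file, assuming the usual layout of a
">cellname  #description" line followed by "address  #hostname" lines.
The function name parse_cellservdb and the sample cells are hypothetical
and are not part of OpenAFS:

```python
def parse_cellservdb(text):
    """Return {cell_name: [(address, hostname), ...]} from CellServDB-style text."""
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Everything after '#' is the cell description or the server host name.
        body, _, comment = line.partition("#")
        body, comment = body.strip(), comment.strip()
        if body.startswith(">"):
            fields = body[1:].split()
            if not fields:
                current = None  # malformed cell line, skip its servers
                continue
            current = fields[0]
            cells.setdefault(current, [])
        elif body and current is not None:
            cells[current].append((body.split()[0], comment))
    return cells


# Hypothetical sample data using documentation-only addresses.
SAMPLE = """\
>example.org            #Hypothetical example cell
192.0.2.10              #afsdb1.example.org
192.0.2.11              #afsdb2.example.org
>empty.example.net      #Cell with no servers listed
"""

cells = parse_cellservdb(SAMPLE)
for name, servers in cells.items():
    print(name, [address for address, _ in servers])
```

The resulting address list can be handed to whatever reachability check
is appropriate for the site; the sketch deliberately does no network I/O.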

Things that an AFS distribution could do:

1. Stop shipping CellServDB files.  Require that orgs that want
   to use CellServDB files distribute them and encourage use of
   DNS by everyone.

2. Refuse to ship CellServDB entries if the servers are behind
   a firewall or are otherwise not publicly accessible.

3. Make -dynroot-sparse the default behavior for -dynroot.
   [Existing Windows client behavior]

4. Filter potential DNS queries when the cell name is not legal
   for DNS.  This requires OS specific knowledge.

5. No longer permit evaluation of mount points to cell name aliases.
   For example, "athena" when the cell name is "athena.mit.edu".
   [Existing Windows client behavior]

6. Immediately probe all discovered servers to determine up/down
   state at the expense of additional server load and network traffic.
   [Existing Windows client behavior]

7. Maintain a negative access cache to permit avoidance of
   repeatedly sending RPCs that will simply trigger an abort.
   [Existing Windows client behavior]

8. Submit patches to file browsers to fix their broken behavior.
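
To illustrate item 7, here is a minimal sketch of a negative access
cache.  It is not the Windows client's implementation; the class name,
the (user, fid) key, the 60 second lifetime and the fetch() helper are
all assumptions chosen for the example:

```python
import time


class NegativeAccessCache:
    """Remember recent access denials so the same RPC is not re-sent."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # assumed lifetime of a denial entry
        self._clock = clock
        self._denied = {}            # (user, fid) -> expiry time

    def record_denial(self, user, fid):
        self._denied[(user, fid)] = self._clock() + self._ttl

    def is_denied(self, user, fid):
        expiry = self._denied.get((user, fid))
        if expiry is None:
            return False
        if self._clock() >= expiry:
            # Entry is stale; allow one new attempt.
            del self._denied[(user, fid)]
            return False
        return True

    def forget_user(self, user):
        """Drop a user's entries, e.g. after new tokens are obtained."""
        for key in [k for k in self._denied if k[0] == user]:
            del self._denied[key]


def fetch(cache, user, fid, send_rpc):
    """Call send_rpc(user, fid) unless a recent denial is cached."""
    if cache.is_denied(user, fid):
        # Fail locally; nothing is sent, so the server sees no new abort.
        raise PermissionError("access denied (cached)")
    try:
        return send_rpc(user, fid)
    except PermissionError:
        cache.record_denial(user, fid)
        raise
```

With this in place a file browser that retries the same unreadable
directory generates one RPC per lifetime instead of one per attempt,
which is what keeps the client out of abort throttling.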

On 12/10/2012 4:10 PM, Troy Benjegerdes wrote:
> This seems to be a common cause of pain for people using AFS,
> and I think it's a user-interface experience that drives people
> away.
> You install AFS, and then all of a sudden you go do something
> and your user-interface just hangs. You have no idea what
> triggered it, you just associate 'crappy non-responsive computer'
> with this AFS thing.
> Is there any reasonable way we can provide a global /afs namespace,
> while still retaining good performance (i.e. under 100ms response
> time when file managers go into /afs/*/)?
> We can talk about client misconfiguration, or bad DNS, or bad
> network, or whatever, but the buck's got to stop somewhere. How
> can we provide fast response and still indicate somehow (with
> an AFS manager app/system tray???) that some servers may be
> inaccessible, slow, or misconfigured, but still not block when
> file managers go look at things??
> There should be a checkbox for "Yes, make me wait for responses
> from servers in cell XXX, and give me an indication who you're
> waiting for", otherwise non-local cells should probably just
> return whatever data they have, or just ENOTCONN
