[OpenAFS] some older openafs-client versions have started failing

Jonathan A. Kollasch jakllsch@kollasch.net
Thu, 14 Jul 2016 16:26:45 -0500


On Thu, Jul 14, 2016 at 02:55:49PM -0500, Chad William Seys wrote:
> Hi all,
> 	We have suddenly begun experiencing client failures and are trying 
> to determine what is going on.
> 
> openafs-client versions 1.6.9, 1.6.14, 1.6.15 fail in various ways*.  On 
> Debian we can reproduce the problem by running 'git checkout' on a 
> particular repo. It fails with a "Connection timed out".  On Scientific Linux the problem 
> manifests sooner: 'ls /afs/ANYCELL' hangs.  
> 
> openafs-client 1.6.16, 1.6.17, 1.6.18.1 seem to work normally.
> 
> I've tried changing the server's fileserver version but that has no effect.  
> (Tried Debian packages with versions 1.6.1-3+deb7u6, 1.6.9+deb8u5, and 
> 1.6.18.1-1 .)
> 
> We started noticing this problem after a power failure.  We think what 
> happened was that new fileserver code started being used after the servers 
> rebooted.  Probably fileserver code changed from Debian 1.6.1-3+deb7u5 to 
> 1.6.1-3+deb7u6 .  Strangely, though, reverting to what we think were the 
> working versions also does not work.
> 
> Anyone have an idea of what might be going on ?
> 
> Thanks!
> Chad. 

I currently see similar issues with Debian Wheezy and Debian Jessie.
git gc consistently fails with ETIMEDOUT for the same path on both
machines.  My fileservers have not changed recently.
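
To narrow down which path triggers the timeout without going through
git, something like the following rough sketch might help. It is only
an illustration: find_read_failures and its parameters are made-up
names, and it assumes the failure surfaces as an OSError (for example
ETIMEDOUT) on an ordinary open/read. On a client where even 'ls'
hangs, this will simply hang as well.

```python
import errno
import os


def find_read_failures(root, chunk_size=1 << 16):
    """Read every file under root; return (path, errno name) per failure.

    Hypothetical helper, not part of OpenAFS or git.
    """
    failures = []

    def record(path, err):
        # Map the numeric errno to its symbolic name, e.g. "ETIMEDOUT".
        name = errno.errorcode.get(err.errno, str(err.errno))
        failures.append((path, name))

    def on_walk_error(err):
        # os.walk reports directory listing errors through this callback.
        record(err.filename, err)

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            try:
                with open(path, "rb") as f:
                    # Read the whole file in chunks; a timeout may only
                    # show up partway through a large file.
                    while f.read(chunk_size):
                        pass
            except OSError as err:
                record(path, err)
    return failures
```

Pointing it at the repository's .git directory and printing the
entries whose second element is "ETIMEDOUT" should show whether it is
the same file every time.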

When I mentioned this in #openafs on Freenode, Benjamin Kaduk seemed to
think this problem exists in the client/cache manager.

	Jonathan Kollasch