[OpenAFS] Weird client behaviour with openafs 1.4.5

Berthold Cogel cogel@uni-koeln.de
Fri, 25 Apr 2008 16:38:52 +0200


Derrick Brashear wrote:
> On Thu, Apr 17, 2008 at 5:30 AM, Berthold Cogel <cogel@uni-koeln.de> wrote:
>> Hello!
>>
>>  For some days now I have been seeing this when I look into some AFS directories:
>>
>>
>>  [bco@co tsm]$ ll
>>  insgesamt 0
>>  ?--------- ? ? ? ?             ? dsmcad.rc.linux
>>  ?--------- ? ? ? ?             ? dsm.logrotate
>>  ?--------- ? ? ? ?             ? dsm.opt.linux
>>  ?--------- ? ? ? ?             ? dsm.opt.panfs
>>  ?--------- ? ? ? ?             ? dsm.opt.vmware
>>  ?--------- ? ? ? ?             ? dsm.sys.c3grid
>>  ?--------- ? ? ? ?             ? dsm.sys.failure
>>  ?--------- ? ? ? ?             ? dsm.sys.linux
>>  ?--------- ? ? ? ?             ? dsm.sys.panfs
>>  ?--------- ? ? ? ?             ? dsm.sys.vmware
>>  ?--------- ? ? ? ?             ? inclexcl.linux
>>  ?--------- ? ? ? ?             ? inclexcl.panfs
>>  ?--------- ? ? ? ?             ? RCS
>>
>>  Not always the same volume but with the same server:
>>
>>  [bco@co tsm]$ fs checks
>>  These servers unavailable due to network or server problems:
>> afsfs1.rrz.uni-koeln.de.
>>
> 
> Is NAT in play?
> Are callbacks being lost? The 1.2.13 server almost certainly has
> issues tracking the client port if NAT's in play.
> The empty modes are just what Linux shows you when there's no
> fetchstatus data to show.

No NAT. Most of our internal network security is handled by our network
gurus with some dirty ACL magic. If traffic trips one of their rules,
they are really fast at hitting the bad guys. I've asked them, but there
was nothing unusual in the monitoring data.

I've hit the problem at least three times during the last two weeks. I
rebooted once. In two cases the problem disappeared after some time.
And yesterday my client didn't notice a change in a file; I had to
run 'fs flushvolume'.

I've been using this client version (1.4.5) with different kernel
versions since it was released and never saw a problem until three weeks
ago. Since then it has happened with two different kernel and kmod
versions.

Perhaps I can trace the problem the next time it occurs. I can look at
both the client and the server side, but I need some hints on how to do it.


Berthold Cogel