[OpenAFS] Strange regular afs failure

Frank Burkhardt fbo2@gmx.net
Mon, 24 Sep 2007 09:42:13 +0200


Hi,

one of my AFS clients runs a cron job every 5 minutes that reads from and
writes to a single AFS volume.

Every Monday morning at about 7:30 the job fails with I/O errors. The client
log shows several "kernel: afs: failed to store file (5)" messages, and the
FileLog on the volume's fileserver shows this:

 Mon Sep 24 07:33:30 2007 FindClient: stillborn client 8221900(1ef6f034); conn 823f0d0 (host 10.0.54.228:7001) had client 8221c48(1ef6f034)
 Mon Sep 24 07:33:30 2007 FindClient: stillborn client 82215b8(1ef6f03c); conn 823fd80 (host 10.0.54.228:7001) had client 8221900(1ef6f03c)
 Mon Sep 24 07:33:30 2007 FindClient: stillborn client 8220fd0(1ef6f028); conn 823d0f0 (host 10.0.54.228:7001) had client 82215b8(1ef6f028)
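
To compare these entries more easily, here is a small Python sketch that
pulls the fields out of each FindClient line. The field names (new, new_id,
conn, host, port, old, old_id) are my own labels, not OpenAFS terminology,
and the regex assumes exactly the line format shown above.

```python
import re

# The three FileLog lines quoted above, copied verbatim.
LOG = """\
Mon Sep 24 07:33:30 2007 FindClient: stillborn client 8221900(1ef6f034); conn 823f0d0 (host 10.0.54.228:7001) had client 8221c48(1ef6f034)
Mon Sep 24 07:33:30 2007 FindClient: stillborn client 82215b8(1ef6f03c); conn 823fd80 (host 10.0.54.228:7001) had client 8221900(1ef6f03c)
Mon Sep 24 07:33:30 2007 FindClient: stillborn client 8220fd0(1ef6f028); conn 823d0f0 (host 10.0.54.228:7001) had client 82215b8(1ef6f028)
"""

# Assumed layout: "stillborn client X(Y); conn C (host H:P) had client Z(W)".
PATTERN = re.compile(
    r"stillborn client (?P<new>[0-9a-f]+)\((?P<new_id>[0-9a-f]+)\); "
    r"conn (?P<conn>[0-9a-f]+) \(host (?P<host>[\d.]+):(?P<port>\d+)\) "
    r"had client (?P<old>[0-9a-f]+)\((?P<old_id>[0-9a-f]+)\)"
)


def parse(log):
    """Return one dict of named fields per matching FindClient line."""
    return [m.groupdict() for m in PATTERN.finditer(log)]


if __name__ == "__main__":
    for e in parse(LOG):
        print(e["host"], e["port"], "conn", e["conn"],
              "had", e["old"], "-> stillborn", e["new"], "id", e["new_id"])
```

Run on the three lines above, it shows that all three entries come from the
same host and port, and that the client printed as "had client" on each line
is the one reported as stillborn on the previous line.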

The fileserver is set to restart automatically at 01:45 the same day, which
means the job ran successfully several times after the restart before it
failed. My DB servers are set to restart on Sunday morning.

I checked the network: client and server are connected through a single
managed switch, which shows no log entries within at least an hour of the
event. I can also rule out other cron jobs on the client and server; none
of them runs near 07:30.

The only event close in time is the regularly scheduled restart of one of
our NFS servers. The NFS server came back seconds before the AFS failure:

 Sep 24 06:09:06 hagen kernel: nfs: server helena not responding, still trying
 [...]
 Sep 24 07:33:27 hagen kernel: nfs: server helena OK
 Sep 24 07:33:33 hagen kernel: afs: failed to store file (5)

What do the log entries on the AFS server mean? Does anyone have an idea
where to look for the cause of the problem?

Regards,

Frank