[OpenAFS-port-darwin] startup cache scan hang
Nicholas Riley
njriley@uiuc.edu
Sun, 27 May 2007 15:17:19 -0500
Hi,
Has anyone seen OpenAFS hang on startup, seemingly during a cache
scan? I experienced hangs on over 50% of our machines after both
recent security updates - even after waiting 15 minutes or more, /afs
doesn't mount. This is OpenAFS 1.4.4 on both Intel and PowerPC
machines (the problem seems a bit more prevalent on PowerPC). We
don't have any similar problems on Linux or Solaris.
Here's what the system log says:
May 27 14:42:24 bender kernel[0]: Starting AFS cache scan...
[...]
May 27 14:42:28 bender kernel[0]: [256] waiting for afs_osi_ctxtp
May 27 14:42:33 bender kernel[0]: [256] waiting for afs_osi_ctxtp
And there seem to be a bunch of zombie afsds around. The below is
transcribed from the screen, since my console/SSH connections hung
entirely shortly thereafter, so there may be a few errors in it.
USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
root 237 0.0 0.4 27692 2028 ?? U 2:42PM 0:00.24 /usr/sbin/afsd -afsdb -stat 10000 -dcache 2500 -daemons 5 -volumes 70 -dynroot -fakestat-all
root 255 0.0 0.0 0 0 ?? Z 31Dec69 0:00.00 (afsd)
root 256 0.0 0.0 27692 2028 ?? Us 2:42PM 0:00.00 /usr/sbin/afsd -afsdb -stat 10000 -dcache 2500 -daemons 5 -volumes 70 -dynroot -fakestat-all
root 252 0.0 0.0 0 0 ?? Z 31Dec69 0:00.00 (afsd)
root 253 0.0 0.0 0 0 ?? Z 31Dec69 0:00.00 (afsd)
root 254 0.0 0.0 0 0 ?? Z 31Dec69 0:00.00 (afsd)
rxdebug says:
Free packets: 130, packet reclaims: 0, calls: 60, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
1 threads are idle
rx stats: free packets 130, allocs 130, alloc-failures(rcv 0/0,send 0/0,ack 0)
greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
packets read: data 60 ack 51 busy 0 abort 0 ackall 0 challenge 0 response 0 debug 8 params 0 unused 0 unused 0 unused 0 version 0
other read counters: data 60, ack 51, dup 0 spurious 0 dally 0
packets sent: data 51 ack 0 busy 0 abort 9 ackall 0 challenge 0 response 0 debug 0 params 0 unused 0 unused 0 unused 0 version 0
other send counters: ack 0, data 102 (not resends), resends 0, pushed 0, acked&ignored 0
(these should be small) sendFailed 0, fatalErrors 0
3 server connections, 0 client connections, 3 peer structs, 3 call structs, 0 free call structs
Done.
and cmdebug returns none_waiting for everything.
I've been thinking of simply blowing away the cache directory before
starting AFS - would that be likely to help? Is there any other info
that's useful in diagnosing the problem?
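
For concreteness, a minimal sketch of what "blowing away the cache directory" before startup could look like. It empties a directory but leaves the directory itself in place, on the assumption that afsd recreates its cache files on the next start. The function name clear_cache_dir is hypothetical, the cache path is deliberately left as a parameter (check cacheinfo for the real location on a given machine), and the file names in the demo are illustrative stand-ins, not a claim about the exact on-disk layout. It should only be run while AFS is stopped.

```python
import os
import shutil
import tempfile


def clear_cache_dir(cache_dir):
    """Remove everything inside cache_dir but keep the directory itself.

    Returns the number of top-level entries removed.
    cache_dir is an assumption: pass whatever path the client's
    cacheinfo file names, and only while AFS is not running.
    """
    removed = 0
    for name in os.listdir(cache_dir):
        path = os.path.join(cache_dir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            # Real subdirectory: remove it and everything under it.
            shutil.rmtree(path)
        else:
            # Regular file or symlink: remove just that entry.
            os.remove(path)
        removed += 1
    return removed


if __name__ == "__main__":
    # Demonstrate on a throwaway directory rather than a real cache.
    demo = tempfile.mkdtemp()
    open(os.path.join(demo, "CacheItems"), "w").close()  # illustrative name
    os.mkdir(os.path.join(demo, "D0"))                   # illustrative name
    open(os.path.join(demo, "D0", "V0"), "w").close()    # illustrative name
    print(clear_cache_dir(demo), "entries removed")
    os.rmdir(demo)
```

Whether this actually avoids the hang is exactly the open question above; the sketch only shows the mechanics of the workaround.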
--
Nicholas Riley <njriley@uiuc.edu> | <http://www.uiuc.edu/ph/www/njriley>