[OpenAFS] hung volume
Jeremy Mates
jmates@sial.org
Wed, 24 Aug 2005 13:00:22 -0700
I have inherited an AFS cell with two main servers and ~35 clients, a mix
of Red Hat 9 and RHEL3 systems, running OpenAFS 1.2.13. For no reason I
can determine, a particular directory now hangs every command run against
it: chdir into the directory works, but anything else (ls, for instance)
hangs and cannot be killed, even with -KILL.
The server the volume lives on is running, and I cannot see any space or
quota limits being hit. Another volume in the same vice partition does not
exhibit this problem. Information on the volume:

$ vos examine project.egp
project.egp 536871014 RW 378629 K On-line
    server.example.edu /vicepae
    RWrite 536871014 ROnly 0 Backup 536871016
    MaxQuota 0 K
    Creation Fri May 2 00:53:06 2003
    Last Update Tue Aug 23 10:24:58 2005
    13193 accesses in the past day (i.e., vnode references)

    RWrite: 536871014 Backup: 536871016
    number of sites -> 1
       server server.example.edu partition /vicepae RW Site

Hung clients show write_locked locks:

$ cmdebug client
** Cache entry @ 0xf8ff26c8 for 1.536874424.1.1 [nick.example.edu]
    locks: (none_waiting, write_locked(pid:20849 at:54))
    2048 bytes DV 165 refcnt 1
    callback f6c84b00 expires 1124843340
    0 opens 0 writers
    volume root
    states (0x0)

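
For anyone reading the cmdebug entry above, here is a rough sketch of how its fields could be picked apart. It assumes the dotted string after "for" is a FID in the order cell.volume.vnode.unique, and that "expires" is a Unix epoch time; both are my reading of the output rather than something confirmed here. The variable names are made up for illustration, and the date call assumes GNU date.

```shell
#!/bin/sh
# FID string and callback expiry copied from the cmdebug output above.
fid="1.536874424.1.1"
expires=1124843340

# RW volume ID reported by "vos examine project.egp".
rw_id=536871014

# Split the FID on dots. The field order (cell, volume, vnode, unique)
# is an assumption about how cmdebug prints the entry.
old_ifs=$IFS
IFS=.
set -- $fid
IFS=$old_ifs
cell=$1
volume=$2
vnode=$3
unique=$4

echo "cell=$cell volume=$volume vnode=$vnode unique=$unique"

# Compare the volume field against the RW ID from vos examine.
if [ "$volume" = "$rw_id" ]; then
    volume_match=yes
else
    volume_match=no
fi
echo "cache entry volume matches project.egp RW ID: $volume_match"

# Convert the callback expiry to a readable UTC time (GNU date syntax).
date -u -d "@$expires" '+callback expires %Y-%m-%d %H:%M:%S UTC'
```

With the values above, the volume field (536874424) is not the RW ID that vos examine reports for project.egp (536871014), so this particular cache entry may describe a different volume, such as the one containing the mount point.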
Rebooting all the hung clients and then restarting the AFS server did not
help: any attempt to use the troublesome directory still hangs.