[OpenAFS] hung volume

Jeremy Mates jmates@sial.org
Wed, 24 Aug 2005 13:00:22 -0700

I have inherited an AFS cell with two main servers and ~35 clients, a
mix of Red Hat 9 and RHEL3 systems running OpenAFS 1.2.13. Without any
cause I can determine, a particular directory now hangs all commands:
chdir into the directory works, but anything else (ls, for instance)
hangs and cannot be killed, even with -KILL.

The server the volume lives on is running, and I can see no space or
quota limits being hit. Another volume in the same vice partition does
not exhibit this problem. Information on the volume:

$ vos examine project.egp
project.egp                       536871014 RW     378629 K  On-line
    server.example.edu /vicepae
    RWrite  536871014 ROnly          0 Backup  536871016
    MaxQuota          0 K
    Creation    Fri May  2 00:53:06 2003
    Last Update Tue Aug 23 10:24:58 2005
    13193 accesses in the past day (i.e., vnode references)

    RWrite: 536871014     Backup: 536871016
    number of sites -> 1
       server server.example.edu partition /vicepae RW Site

Hung clients show write_locked locks:

$ cmdebug client
** Cache entry @ 0xf8ff26c8 for 1.536874424.1.1 [nick.example.edu]
    locks: (none_waiting, write_locked(pid:20849 at:54))
    2048 bytes  DV 165 refcnt 1
    callback f6c84b00   expires 1124843340
    0 opens     0 writers
    volume root
    states (0x0)
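
To check many clients at once, the cmdebug output above can be scanned
mechanically. What follows is only a minimal sketch, not an OpenAFS tool:
it assumes the entry header has the form cell.volume.vnode.unique and that
the lock and callback lines look exactly like the sample above. The function
name locked_entries and the dictionary keys are made up for illustration.

```python
import re
from datetime import datetime, timezone

# Patterns are modelled on the sample output in this message only;
# other OpenAFS versions may print these lines differently.
ENTRY_RE = re.compile(
    r"\*\* Cache entry @ 0x[0-9a-fA-F]+ for (\d+)\.(\d+)\.(\d+)\.(\d+)"
)
LOCK_RE = re.compile(r"write_locked\(pid:(\d+)")
EXPIRES_RE = re.compile(r"expires (\d+)")


def locked_entries(text):
    """Return one dict per cache entry that reports a write lock."""
    entries = []
    current = None
    for line in text.splitlines():
        m = ENTRY_RE.search(line)
        if m:
            # Start a new record for each "** Cache entry" header.
            current = {
                "cell": int(m.group(1)),
                "volume": int(m.group(2)),
                "vnode": int(m.group(3)),
                "unique": int(m.group(4)),
                "pid": None,
                "expires": None,
            }
            entries.append(current)
            continue
        if current is None:
            continue  # skip anything before the first entry header
        m = LOCK_RE.search(line)
        if m:
            current["pid"] = int(m.group(1))
        m = EXPIRES_RE.search(line)
        if m:
            # The callback expiry is assumed to be Unix epoch seconds.
            current["expires"] = datetime.fromtimestamp(
                int(m.group(1)), tz=timezone.utc
            )
    # Keep only entries that actually showed a write lock.
    return [e for e in entries if e["pid"] is not None]
```

Feeding it the saved output of "cmdebug client" gives the volume ID, the
locking pid, and the callback expiry as a UTC timestamp for each locked
entry, so the volume field can be compared against the ID that vos examine
reports.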

Rebooting all the hung clients and then restarting the AFS server still
yields a hang when trying to use the troublesome directory.