[OpenAFS] OpenAFS 1.2.4 client hangs

John Godehn godehn@email.unc.edu
Wed, 10 Jul 2002 14:53:12 -0400 (EDT)


Hi,

I've got some dual-processor Athlon boxes running a 2.4.9-31SGI_XFS_1.1smp
kernel and OpenAFS 1.2.4 with an ext2 disk cache. Every once in a while the
AFS client hangs on these machines; in fact, one of them hung this morning.
By hang, I mean that any process that tries to access the AFS filespace
hangs (well, actually I'm not sure if it is the whole filespace or just the
volume that cmdebug reports as locked; in this case the hung volume was
root.afs.readonly).

When I run a cmdebug against the host, I get:

** Cache entry @ 0xf8a42000 for 1.536870913.1.1
    locks: (writer_waiting, 12 read_locks(pid:18968), 2 waiters)
    2048 bytes  DV 68 refcnt 1
    callback ebd02880   expires 1026244170
    0 opens     0 writers
    volume root
    states (0x4), read-only


Volume 536870913 is my read-only root.afs volume. I also got the output
of cmdebug -long and kdump and stuck them on the web at:

http://www.unc.edu/~godehn/afs-problem/
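
For anyone reading the cache entry above, here is a minimal sketch of how its
fields could be decoded. It assumes the identifier after "for" is laid out as
cell.volume.vnode.unique and that the "expires" value is seconds since the
Unix epoch; the helper names decode_fid and decode_expiry are made up for
this illustration and are not part of any OpenAFS tool.

```python
from datetime import datetime, timezone


def decode_fid(fid):
    """Split a dotted identifier, assumed to be cell.volume.vnode.unique."""
    cell, volume, vnode, unique = (int(part) for part in fid.split("."))
    return {"cell": cell, "volume": volume, "vnode": vnode, "unique": unique}


def decode_expiry(seconds):
    """Interpret the value as seconds since the Unix epoch (an assumption)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Values copied from the cmdebug output above.
fid = decode_fid("1.536870913.1.1")
print(fid["volume"], hex(fid["volume"]))       # 536870913 0x20000001
print(decode_expiry(1026244170).isoformat())   # 2002-07-09T19:49:30+00:00
```

If the epoch assumption holds, the callback on this entry expired on the
evening of July 9 (UTC), before the hang was noticed.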

I also tried to get an fstrace log with the following commands:

/usr/vice/etc/fstrace clear cm
/usr/vice/etc/fstrace setlog cmfx -buffers 100
/usr/vice/etc/fstrace sets cm -active
/usr/vice/etc/fstrace dump -follow cmfx -file /var/tmp/fstraceLog

In another window, I tried doing an 'ls /afs', and as expected the
ls process hung, but nothing was written to /var/tmp/fstraceLog.
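
To check for the hang without tying up a terminal, something like the
following sketch could be used. It is only an illustration: probe_path and
its parameters are hypothetical names, and the 10-second default timeout is
an arbitrary choice. The child is polled rather than waited on, on the
assumption that a process blocked inside the kernel may not exit even after
being sent a kill signal.

```python
import subprocess
import time


def probe_path(path, timeout=10.0, command=("ls",)):
    """Return True if `command path` finished within `timeout` seconds."""
    child = subprocess.Popen(
        [*command, path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if child.poll() is not None:
            return True  # the child exited, so the path responded
        time.sleep(0.1)
    # Best effort only: a child stuck in the kernel may ignore the signal,
    # so we deliberately do not wait for it here.
    child.kill()
    return False


if __name__ == "__main__":
    print("responsive" if probe_path("/afs") else "hung")
```

The return value only says whether the listing finished in time; it does not
look at the exit status of the command.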

I know that there are some deadlock issues that are fixed in 1.2.5 and some
more that are supposed to be fixed in 1.2.6, but I was wondering if someone
could tell me whether this looked like one of those known issues, or
something else. If you need more debugging done, please let me
know.

Thanks so much,

John