[OpenAFS-devel] anyone tried linux > 2.4.14?

Neulinger, Nathan nneul@umr.edu
Tue, 27 Nov 2001 11:08:24 -0600


I'm getting mysterious AFS lockups with 2.4.15 and 2.4.16. Nothing gets put
in D state; the processes just appear to hang.

It looks to me like the process is spinning in a read and sucking CPU like
mad, but the syscall never completes. strace produces no further output.
The process is definitely not killable.
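
For anyone who wants to check the same thing on their own box, here is a
minimal sketch of how "spinning" can be told apart from "blocked in D
state" using /proc/<pid>/stat. It assumes the field layout described in
proc(5) (state is field 3, utime field 14, stime field 15). The names
parse_stat and sample are made up for illustration and are not part of any
AFS or kernel tool.

```python
import time


def parse_stat(line):
    """Parse one /proc/<pid>/stat line into pid, state, utime and stime."""
    # The command name (field 2) may itself contain spaces or parentheses,
    # so split on the LAST ')' rather than on whitespace.
    head, _, tail = line.rpartition(")")
    fields = tail.split()
    return {
        "pid": int(head.split("(", 1)[0]),
        "state": fields[0],        # field 3: R, S, D, Z, T, ...
        "utime": int(fields[11]),  # field 14: user-mode clock ticks
        "stime": int(fields[12]),  # field 15: kernel-mode clock ticks
    }


def sample(pid, interval=1.0):
    """Return (state, utime delta, stime delta) across one interval."""
    def read():
        with open("/proc/%d/stat" % pid) as f:
            return parse_stat(f.read())

    before = read()
    time.sleep(interval)
    after = read()
    return (after["state"],
            after["utime"] - before["utime"],
            after["stime"] - before["stime"])
```

A process stuck the way I describe should show state R with stime climbing
on every sample and utime flat, since the time is being burned inside the
syscall. A process blocked on I/O would show D with neither counter moving.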

Interestingly, other AFS accesses on the machine appear fine. However, if
you try to ls the same directory that was being read when the process hung,
the ls will hang as well. That would suggest a directory read is failing,
but the trace indicates otherwise:

Here is the tail end of an strace:


806   fstat64(3, {st_dev=makedev(0, 9), st_ino=2050626764, st_mode=S_IFREG|0755, st_nlink=1, st_uid=1, st_gid=0, st_blksize=4096, st_blocks=12, st_size=6083, st_atime=2001/10/26-09:24:37, st_mtime=2001/10/26-09:24:37, st_ctime=2001/10/26-09:24:37}) = 0
806   close(3)                          = 0
806   lstat64("/umr/s/openafs/.oldfiles/openafs/src/WINNT/doc/install/Documentation/ja_JP/html/CmdRef/auarf131.htm", {st_dev=makedev(0, 9), st_ino=2050626766, st_mode=S_IFREG|0755, st_nlink=1, st_uid=1, st_gid=0, st_blksize=4096, st_blocks=18, st_size=8538, st_atime=2001/10/26-09:24:37, st_mtime=2001/10/26-09:24:37, st_ctime=2001/10/26-09:24:37}) = 0
806   open("/umr/s/openafs/.oldfiles/openafs/src/WINNT/doc/install/Documentation/ja_JP/html/CmdRef/auarf131.htm", O_RDONLY|O_LARGEFILE) = 3
806   write(1, "umr/s/openafs/.oldfiles/openafs/src/WINNT/doc/install/Documentation/ja_JP/html/CmdRef/auarf131.htm\n", 99) = 99
806   read(3,


Note the hang in the read, which is on a regular file rather than a
directory. If I reboot and restart the tar, it will hang somewhere else,
usually further along.

This is on a clean 2.4.16 with only tiny local patches that shouldn't have
any impact (console blanking, semopm, and numfiles). It is configured with
highmem-4gb support, and is an SMP kernel, but running on a UP box.

I'm going to try a UP kernel and see if that makes any difference... (We've
been trying to standardize on a single kernel build. Makes managing lots of
machines easier.)

Any ideas?

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216