[OpenAFS] AFS client nodes crash

Eric Chris Garrison ecgarris@iupui.edu
Mon, 28 Apr 2008 14:01:12 -0400


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hello,

I have an odd problem going on.  I am an admin for a major AFS
installation at a university in the Midwest.  We have a couple of
supercomputers acting as AFS clients, but it seems like there's an
intermittent problem where operations in AFS crash a node.

The logs on the clients seem to indicate something to do with a failure to
write to an inode in the cache before the crash. Files are written
successfully to the data subdirectories of the cache, however.  The cache
sits in /tmp, but is on a simple ext3 filesystem, so nothing is funny there.

The error we see always points to the same line in the same
file in the OpenAFS source, osi_file.c, line 71:

openafs: Can't open inode 1952804
- ----------- [cut here ] --------- [please bite here ] ---------
Kernel BUG at osi_file:71

Also, the inode referenced in the error messages can always be found
somewhere in the AFS cache directory tree.
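
For anyone who wants to repeat that lookup, here is a minimal sketch of
one way to do it. It assumes a find that supports -xdev and -inum (GNU
find does); the function name find_cache_inode and the cache path in the
example are hypothetical and need adjusting to the local setup.

```shell
# Hypothetical helper: list files under a cache directory that have
# the given inode number. -xdev keeps find on the cache filesystem,
# since inode numbers are only unique within one filesystem.
find_cache_inode() {
    cache_dir=$1
    inode=$2
    find "$cache_dir" -xdev -inum "$inode" -print
}

# Example call with the inode from the log message above
# (the cache path here is an assumption):
# find_cache_inode /usr/vice/cache 1952804
```

An empty result would mean the inode no longer exists under that
directory at the time of the check.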

We're running openafs-1.4.4 client on RHEL 4 on Power and x86_64.  The
nodes also run Lustre and GPFS at the same time as AFS.  One theory is
that one or the other of those may be interfering at a kernel level, but
that's really hard to prove at the moment.

The kernel we run on these machines is the lustre-patched
2.6.9 kernel. Here's some information from one of the systems:

# uname -a
Linux b001 2.6.9-55.EL_lustre-1.4.1.quarry1 #1 SMP Tue Oct 2 08:54:17 EDT
2007 x86_64 x86_64 x86_64 GNU/Linux

# rpm -qa | grep openafs
openafs-1.4.4-rhel4.2
openafs-docs-1.4.4-rhel4.2
openafs-kernel-source-1.4.4-rhel4.2
openafs-client-1.4.4-rhel4.2
openafs-krb5-1.4.4-rhel4.2
openafs-authlibs-1.4.4-rhel4.2
# rpm -qa | grep gpfs
gpfs.msg.en_US-3.1.0-13
gpfs.base-3.1.0-13
gpfs.gpl-3.1.0-13
gpfs.docs-3.1.0-13

One other odd thing is that not all the nodes have a
/usr/vice/cache/VolumeItems file at all.  On some it seems it was never
created, and others have a zero-length file there.  I don't know if this
is related, however.
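
The three cases (file missing, zero length, or non-empty) can be told
apart per node with something like the sketch below. The function name
volumeitems_state is hypothetical, and the cache directory is passed in
as an argument rather than assumed.

```shell
# Hypothetical helper: report the state of the VolumeItems file
# inside the given cache directory.
volumeitems_state() {
    f="$1/VolumeItems"
    if [ ! -e "$f" ]; then
        echo missing        # file was never created
    elif [ ! -s "$f" ]; then
        echo empty          # file exists but has zero length
    else
        echo present        # file exists and is non-empty
    fi
}

# Example call (the path is the one mentioned above):
# volumeitems_state /usr/vice/cache
```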

Thanks for any help!

Chris
- --
Eric Chris Garrison             | Principal Mass Storage Specialist
ecgarris@iupui.edu              | Indiana University - Research Storage
W: 317-278-1207 M: 317-250-8649 | Jabber IM: ecgarris@iupui.edu
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFIFhDoDDNrAMkf9LERAvseAJ9pJzSxb52TnnVLGpf/yZ51TUUoUwCfW9pC
mwJfpPsKjsXITrD+dSRP1As=
=RDch
-----END PGP SIGNATURE-----