[OpenAFS] corruption problems

Nicolescu, Edward L edward@bnl.gov
Thu, 25 Nov 2004 17:27:24 -0500


Folks,

The problem I am about to describe occurs on any AFS client/server
combination, whether it's OpenAFS or IBM AFS (on either the server or the
client). It also occurs on a variety of OSes (Linux, Solaris, AIX,
regardless of whether they run on the client or the server).

The affected area is /afs/.rhic.bnl.gov/i386_sl302/opt/star. Please note
the "." (dot), indicating the path to the RW version of the volume. All the
other underlying volumes are also replicated. Everything seems to be fine
up to the point when I want to create or delete a file. This causes the
number of accesses to the volume to spike from an average of about 3/sec
to about 250/sec. The command to create/remove the file hangs and, most of
the time, the volume ends up corrupted. Running the salvager causes the
hanging create/remove command to complete successfully. Interestingly
enough, running a "vos release" had the same effect, causing the hanging
command to complete. Invariably, the problems reported by the salvager are
of this type:

Vnode "N": version < inode version; fixed (old status)

where "N" differs from one salvaging to another.
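
For anyone who wants to compare the vnode numbers across salvage runs, here
is a minimal sketch in Python. It assumes the salvager log lines have exactly
the form quoted above, with N appearing as a bare integer; the function names
are hypothetical and the log text has to be supplied by the caller.

```python
import re
from collections import Counter

# Assumed line shape, based on the salvager message quoted above,
# with the vnode number N written as a bare integer.
VNODE_RE = re.compile(r"Vnode (\d+): version < inode version; fixed")


def fixed_vnodes(log_text):
    """Return the vnode numbers from 'version < inode version' lines."""
    return [int(m.group(1)) for m in VNODE_RE.finditer(log_text)]


def tally(runs):
    """Count how many salvage runs reported each vnode number.

    `runs` is an iterable of log texts, one per salvage run.
    """
    counts = Counter()
    for text in runs:
        # Use a set so a vnode is counted once per run.
        counts.update(set(fixed_vnodes(text)))
    return counts
```

A vnode number that shows up in every run would point at one specific file or
directory; numbers that never repeat would match what I am seeing.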

The problem disappears if I rename the
/afs/.rhic.bnl.gov/i386_sl302/opt/star mount point to something else, for
example /afs/.rhic.bnl.gov/i386_sl302/opt/tmp.star. Then, creating or
deleting files is possible again.

One more thing: the applications accessing this area are not writing to it.
Any ideas or suggestions will be much appreciated. Thank you.

Edward Nicolescu