[OpenAFS] corruption problems
Nicolescu, Edward L
edward@bnl.gov
Thu, 25 Nov 2004 17:27:24 -0500
Folks,
The problem I am about to describe occurs with any AFS client/server
combination, whether OpenAFS or IBM AFS is running on the server or the
client. It also occurs on a variety of operating systems (Linux, Solaris,
AIX), regardless of whether they run on the client or the server.
The affected area is /afs/.rhic.bnl.gov/i386_sl302/opt/star. Please note
the "." (dot), indicating the path to the RW version of the volume. All the
underlying volumes are also replicated. Everything seems to be fine
up to the point when I want to create or delete a file. This causes the
number of accesses to the volume to spike from an average of about 3/sec
to about 250/sec. The command to create/remove the file hangs and, most of
the time, the volume ends up corrupted. Running the salvager causes the
hanging create/remove command to complete successfully. Interestingly enough,
running a "vos release" has the same effect, causing the hanging command to
complete. Invariably, the problems reported by the salvager are of this
type:
Vnode "N": version < inode version; fixed (old status)
where "N" differs from one salvage run to the next.
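For anyone who wants to check whether any vnode recurs between salvage runs, here is a minimal Python sketch. It assumes the salvager log lines look like the one quoted above, with a plain number in place of "N"; the real log format may differ. The function names affected_vnodes and common_vnodes are hypothetical helpers, not part of any AFS tool.

```python
import re

# Assumed line shape, copied from the message quoted above; the quotes
# around the vnode number are treated as optional. Real salvager logs
# may be formatted differently.
VNODE_RE = re.compile(
    r'Vnode "?(\d+)"?: version < inode version; fixed \(old status\)'
)

def affected_vnodes(log_text):
    """Return the sorted, de-duplicated vnode numbers found in log_text."""
    return sorted({int(m.group(1)) for m in VNODE_RE.finditer(log_text)})

def common_vnodes(*logs):
    """Return the vnode numbers that appear in every one of the given logs."""
    sets = [set(affected_vnodes(text)) for text in logs]
    return sorted(set.intersection(*sets)) if sets else []
```

Feeding it the log text saved after each salvage would show whether the same vnodes keep turning up or whether the affected set really is different every time.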
The problem disappears if I rename the
/afs/.rhic.bnl.gov/i386_sl302/opt/star mount point to something else, for
example /afs/.rhic.bnl.gov/i386_sl302/opt/tmp.star. Creating or deleting
files is then possible again.
One more thing: the applications accessing this area are not writing to it.
Any ideas or suggestions will be much appreciated. Thank you.
Edward Nicolescu