[OpenAFS] Volumes going offline, needing salvage?

Joseph Di Lellio joed@ucsc.edu
Fri, 13 Oct 2006 16:32:45 -0700 (PDT)

   I have a problem with a new AFS fileserver I've set up.  It is not
yet a major issue, but I strongly suspect it could become one.

   Our architecture is such that we have three fileservers and three
DB servers.  I'm still migrating from TransArc to OpenAFS, after a
fashion.  My old DB servers are gone, replaced by OpenAFS servers.
I have 3 TransArc fileservers (temporarily) and 3 OpenAFS fileservers,
with the idea that we'll be vos moving things over.  Once done, we'll
retire the old TransArc servers entirely.

   Two of the OpenAFS fileservers, which show no problems, have a fairly
small number of volumes on them.  On the remaining one, I've completed
migration of the first TransArc fileserver's volumes to it.  I have
two partitions, vicepa & vicepb, with ~8k volumes per partition.  Space
is running at about 21-22% used.

   I noticed a couple of volumes offline shortly after the move was
done.  I was able to deal with all but one.  That one, even after I ran
salvage on it, still gave me a "no such file" when I mounted it and tried
to take a look at it.  It was an old test volume, so not directly a concern.

   However, I am still, slowly, finding volumes offline & unable to be
attached.  I had none this morning.  Now, I have three.  This concerns
me greatly, and I've put a stop to migrations until I can find out what
is going on.
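
   For anyone who wants to watch for the same symptom, here is a rough
sketch of how the offline and unattachable volumes could be picked out of
saved "vos listvol" output.  The line shapes it matches are an assumption
based on what that command typically prints, and the server name, volume
names and IDs in the sample are made up for illustration; adjust the
patterns to whatever your build actually emits.

```python
import re

# ASSUMPTION: line shapes modelled on typical `vos listvol <server> <partition>`
# output; they are not taken from the original post.
#   test.old     536871001 RW       1024 K Off-line
#   **** Could not attach volume 536871002 ****
VOLUME_LINE = re.compile(r"^(\S+)\s+(\d+)\s+(RW|RO|BK)\s+\d+\s+K\s+(\S+)$")
UNATTACHABLE_LINE = re.compile(r"Could not attach volume\s+(\d+)")


def find_problem_volumes(listvol_text):
    """Return (offline, unattachable) parsed from `vos listvol` text.

    offline      -- list of (name, id) for volumes whose status is not On-line
    unattachable -- list of ids the fileserver reported it could not attach
    """
    offline = []
    unattachable = []
    for raw in listvol_text.splitlines():
        line = raw.strip()
        m = VOLUME_LINE.match(line)
        if m:
            name, vol_id, _vol_type, status = m.groups()
            if status != "On-line":
                offline.append((name, int(vol_id)))
            continue
        m = UNATTACHABLE_LINE.search(line)
        if m:
            unattachable.append(int(m.group(1)))
    return offline, unattachable


# Hypothetical sample; "fs1", the volume names and the IDs are invented.
SAMPLE = """Total number of volumes on server fs1 partition /vicepa: 3
root.cell                         536870915 RW          4 K On-line
test.old                          536871001 RW       1024 K Off-line
**** Could not attach volume 536871002 ****

Total volumes onLine 1 ; Total volumes offLine 1 ; Total busy 0
"""

if __name__ == "__main__":
    offline, unattachable = find_problem_volumes(SAMPLE)
    print("offline:", offline)
    print("could not attach:", unattachable)
```

   Run against a fresh listing a few times a day, something like this
would at least show when each volume drops off, which could then be lined
up against the fileserver logs.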

   Does anyone have any ideas on what the issue(s) might be?  The logs
have given me some bits, but mostly the obvious ones like needing to
run salvage.

It ain't what you don't know that gets you into trouble.  It's what you
know for sure that just ain't so.		-- Mark Twain