[OpenAFS] Volumes going offline, needing salvage?

Joseph Di Lellio joed@ucsc.edu
Fri, 13 Oct 2006 18:39:22 -0700 (PDT)


Yes, I hit send rather earlier than intended.  Apologies.

OpenAFS (v1.4.1): Sun Fire V240 running Solaris 9 (118558-10)
TransArc (v3.6): V220R running Solaris 8 (108528-09)

There is nothing unusual in the system log files.  This is the
case for all of the above, and the decom'ed DB servers.

When I say unusual, I mean I still find the usual messages from
reboots & such.

The new fileservers are using a StorEdge 3510.  It has been configured
as one 10 drive RAID 5 LUN which is divided into 3 equal partitions.
There are 2 global spare drives.

The old servers are using A1000s.

The version of OpenAFS is the same across all DB & fileservers, and
ditto for the TransArc systems.

Filesystem type for the /vicep* partitions is afs, with nologging.  There
does not seem to be a correlation between the number of volumes & which
of the two partitions I have.

The entry:

afs                     65

exists in the /etc/name_to_sysnum on all systems.
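
For comparing that entry across hosts, a minimal Python sketch such as
the one below might do.  The function name is hypothetical, and it
assumes the file is plain "name number" pairs, possibly with '#'
comments; 65 is simply the value quoted above.

```python
def syscall_number(text, name="afs"):
    """Return the syscall number listed for `name`, or None if absent.

    `text` is the content of a name_to_sysnum-style file (assumed to be
    whitespace-separated "name number" pairs, one per line).
    """
    for line in text.splitlines():
        # Drop anything after a '#' in case the file carries comments.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == name and fields[1].isdigit():
            return int(fields[1])
    return None
```

Feeding it the contents of /etc/name_to_sysnum from each server and
checking that the result is 65 everywhere would confirm the entry is
consistent.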

The volumes affected thus far are RW & BK.  I have seen one situation
where both of these have had problems.  Others have had one or the other.

I have a fairly limited number of RO volumes, so this may simply be
statistics.

The example below is a RW volume.  I have not seen differences between
the volume types, including the combined instance above.

An example of an entry from FileLog:
Fri Oct 13 16:11:30 2006 VAttachVolume: volume salvage flag is ON for
/vicepa//V0536976316.vol; volume needs salvage

And from VolserLog:
Fri Oct 13 16:11:30 2006 VAttachVolume: volume salvage flag is ON for
/vicepa/V0536976316.vol; volume needs salvage
Fri Oct 13 16:11:30 2006 1 Volser: XListOneVolume: Could not attach volume
536976316

Looking back to the previous evening, I see a lot of the following:

Thu Oct 12 23:57:20 2006 VAttachVolume: Error reading diskDataHandle vol
header /vicepb/V0537103682.vol; error=101
Thu Oct 12 23:57:20 2006 VAttachVolume: Error attaching volume
/vicepb/V0537103682.vol; volume needs salvage; error=101

Thu Oct 12 23:58:30 2006 VAttachVolume: Error reading smallVnode vol
header /vicepb/V0536957965.vol; error=101
Thu Oct 12 23:58:30 2006 VAttachVolume: Error attaching volume
/vicepb/V0536957965.vol; volume needs salvage; error=101
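
To get a list of the affected volumes out of messages like the ones
above, something along these lines could be used.  It is only a sketch:
the function name is made up, and the pattern is based solely on the
excerpts in this message, so other message variants may need their own
patterns.

```python
import re
from collections import defaultdict

# Matches e.g. "/vicepa//V0536976316.vol; volume needs salvage".
# "/+" accepts the doubled slash seen in FileLog, and "\s+" tolerates
# lines that were wrapped when pasted into mail.
NEEDS_SALVAGE = re.compile(
    r"(/vicep[a-z]+)/+V(\d+)\.vol;\s+volume\s+needs\s+salvage"
)

def volumes_needing_salvage(log_text):
    """Return {partition: sorted volume IDs} for volumes flagged for salvage."""
    found = defaultdict(set)
    for partition, vol_id in NEEDS_SALVAGE.findall(log_text):
        # The file name carries a zero-padded ID; int() drops the padding.
        found[partition].add(int(vol_id))
    return {part: sorted(ids) for part, ids in found.items()}
```

Called on the text of FileLog or VolserLog, it returns the volume IDs
grouped by partition, which can then be handed to bos salvage with the
-partition and -volume options.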

From BosLog and the previous night, a number of these:
Fri Oct 13 01:45:41 2006: fs:vol exited on signal 6 (core dumped)
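
Signal 6 is SIGABRT, which usually means the process called abort(),
for instance after a failed assertion.  To see how often each process
died this way, a rough tally could be taken with a sketch like the one
below.  The function name is hypothetical and the pattern assumes every
entry has the same layout as the line above.

```python
import re
from collections import Counter

# Matches e.g. "Fri Oct 13 01:45:41 2006: fs:vol exited on signal 6 (core dumped)"
CRASH = re.compile(
    r"^.+?\d{4}: (?P<proc>\S+) exited on signal (?P<sig>\d+)",
    re.MULTILINE,
)

def count_crashes(boslog_text):
    """Count 'exited on signal' entries per (process, signal number)."""
    counts = Counter()
    for match in CRASH.finditer(boslog_text):
        counts[(match.group("proc"), int(match.group("sig")))] += 1
    return counts
```

Run over the text of BosLog from each fileserver, the counts give a
quick way to compare the affected server with the others.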

I see nothing similar on the other fileservers, although they are
otherwise identical.

Apologies again for not including this earlier.  Your assistance is
greatly appreciated.

If there is anything else I'm missing, please let me know & I'll find
it as quickly as I can.

On Fri, 13 Oct 2006, Jeffrey Hutzelman wrote:

>
>
> On Friday, October 13, 2006 04:32:45 PM -0700 Joseph Di Lellio
> <joed@ucsc.edu> wrote:
>
> >    Does anyone have any ideas on what the issue(s) might be?  The logs
> > have given me some bits, but mostly the obvious ones like needing to
> > run salvage.
>
> You're going to need to provide some more details.
> What operating system and version are the fileservers?
> What OpenAFS version?
> What filesystem type are the vice partitions?
> Are the affected volumes RW, RO, or BK?
> What are the relevant log messages?
>
> -- Jeff
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>

------
It ain't what you don't know that gets you into trouble.  It's what you
know for sure that just ain't so.		-- Mark Twain