[OpenAFS] Problems with fsck on Solaris 9

Stephen Joyce stephen@physics.unc.edu
Thu, 11 Nov 2004 18:38:08 -0500 (EST)


Doug (and anyone else w/ knowledge wrt Solaris),

I've still got fsck problems on Solaris 9... I downloaded 1.2.13, but it
doesn't appear to have Doug's patches or the Solaris interleave patch
applied... so I retrieved the patches from CVS, applied them, and built
from source (without problems).

However I'm still getting the following error:
----Open AFS (R) openafs 1.2.13 fsck----
** /dev/rdsk/c1t0d0s3
BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
USE AN ALTERNATE SUPER-BLOCK TO SUPPLY NEEDED INFORMATION;
eg. fsck [-F ufs] -o b=# [special ...]
where # is the alternate super block. SEE fsck_ufs(1M).

This is the same error I was getting previously.  I have newfs'ed all of
the partitions, and on reboot the system pronounced the partitions OK.
However, after I created a single new volume, subsequent reboots exhibit the
same fsck error.

Interestingly, if I refrain from mounting the drives at boot-time, mount
them manually, and restart the fileserver, the partition looks OK and
the data intact.  Running Solaris' /usr/lib/fs/ufs/fsck on one of the
(empty) partitions -- yes, I know it destroys any data present --
pronounces the filesystem clean.

Assuming that there's nothing unique(*) about my circumstances, and my
hardware is not failing in subtle ways, it seems that the disk is
actually OK and openafs' fsck is still confused.  Or is it possible I'm
overlooking some other change?

Any help is appreciated.

> uname -v
Generic_117171-11

(*) My /vicepX partitions are on an external Promise RAID array.  The total
disk size is 1.3TB, divided into seven 200GB partitions.  No errors are
apparent, and it appears to function normally when used as a plain UFS disk.

Cheers,
Stephen

If voting could really change things, it would be illegal.

On Fri, 5 Nov 2004, Douglas E. Engert wrote:

> I sent in a bug report and patch on 11/2; see bug 15927.
> Basically, it adds prototypes for bread and bwrite to fsck.h.
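
A minimal sketch of the kind of change described above, assuming the bread()/bwrite() parameter lists shown here (they are modeled on BSD-style fsck, not copied from the bug 15927 patch) and using hypothetical typedefs sk_daddr_t and sk_diskaddr_t as stand-ins for the Solaris types. With the prototypes in scope, a 64-bit block number handed over by the caller is converted to the declared 32-bit parameter type:

```c
/* Sketch only: the bread()/bwrite() parameter lists are ASSUMED, and the
 * typedefs are hypothetical stand-ins for a 32-bit daddr_t and a 64-bit
 * diskaddr_t (long long). */
#include <stdint.h>
#include <string.h>

typedef int32_t sk_daddr_t;    /* stand-in for daddr_t (a long) */
typedef int64_t sk_diskaddr_t; /* stand-in for diskaddr_t (a long long) */

/* The fix: prototypes visible to every caller (e.g. in fsck.h), so the
 * compiler converts a 64-bit block number to the declared parameter type
 * instead of passing 8 bytes where the callee expects 4. */
int bread(int fd, char *buf, sk_daddr_t blk, long size);
int bwrite(int fd, char *buf, sk_daddr_t blk, long size);

/* Toy bodies that only record the arguments they received. */
sk_daddr_t last_blk;
long last_size;

int bread(int fd, char *buf, sk_daddr_t blk, long size)
{
    (void)fd;
    memset(buf, 0, (size_t)size);   /* pretend the read returned zeros */
    last_blk = blk;
    last_size = size;
    return 0;
}

int bwrite(int fd, char *buf, sk_daddr_t blk, long size)
{
    (void)fd;
    (void)buf;
    last_blk = blk;
    last_size = size;
    return 0;
}

/* A caller passing a diskaddr_t-typed value, as code using the Solaris 9
 * fsbtodb() would; the prototype makes the conversion implicit. */
int read_at(int fd, char *buf, sk_diskaddr_t dblk, long size)
{
    return bread(fd, buf, dblk, size);
}
```

Here read_at(fd, buf, 32, 8192) leaves last_blk at 32 and last_size at 8192. Without the prototypes, nothing forces that conversion.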
>
> You may also need the patch to src/vfsck/setup.c that was added to CVS in August
> to get it to compile on Solaris 9, if sys/fs/ufs_fs.h has been updated.
>
>
>
> Stephen Joyce wrote:
> > Thanks for working on this.  Is there a solution yet?  I have a development
> > machine (solaris 9, openafs 1.2.11) which I patched last night (before
> > reading the archives--doh!) and it appears to have the same, or a similar,
> > problem (it was fine before applying the newest patches):
> >
> >
> > The system is coming up.  Please wait.
> > The /vicepa file system (/dev/rdsk/c1t0d0s0) is being checked.
> > ----Open AFS (R) openafs 1.2.11 fsck----
> > /dev/rdsk/c1t0d0s0: /dev/rdsk/c1t0d0s0: BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
> >
> > /dev/rdsk/c1t0d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> >
> > WARNING - Unable to repair one or more of the following filesystem(s):
> >         /dev/rdsk/c1t0d0s0
> > Run fsck manually (fsck filesystem...).
> > Exit the shell when done to continue the boot process.
> >
> >
> > (using an alternate superblock doesn't work either).
> >
> > While a fix for fsck would be great, if anyone knows exactly which patch to
> > back-out of, please let me know.  If not, I'll be glad to start trying the
> > likely suspects--I just don't want to duplicate effort.
> >
> > Cheers,
> > Stephen
> >
> >
> > On Tue, 2 Nov 2004, Douglas E. Engert wrote:
> >
> >
> >>I think I found the cause of the fsck problem. I am concerned that if
> >>fsck is run against the cooked device rather than the raw
> >>device, it could actually cause damage, rather than doing nothing
> >>and failing.
> >>
> >>In Solaris 9, ufs_fs.h changes fsbtodb:
> >>
> >>  #ifdef KERNEL
> >>  #define fsbtodb(fs, b)  (((daddr_t)(b)) << (fs)->fs_fsbtodb)
> >>  #else /* KERNEL */
> >>  #define fsbtodb(fs, b)  (((diskaddr_t)(b)) << (fs)->fs_fsbtodb)
> >>  #endif /* KERNEL */
> >>
> >>Previous versions had:
> >>
> >>  #define fsbtodb(fs, b)  ((b) << (fs)->fs_fsbtodb)
> >>
> >>Note the type cast to diskaddr_t, which is a long long.
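
To make the difference concrete, here is a sketch of both macro forms using fixed-width stand-ins. It assumes a 32-bit daddr_t and a 64-bit diskaddr_t, and struct fs_sketch is a hypothetical reduced struct fs carrying only the field the macro uses:

```c
/* Sketch of the two fsbtodb() forms quoted above.
 * ASSUMPTIONS: daddr_t is 32-bit and diskaddr_t is 64-bit; fixed-width
 * types stand in for both, and fs_sketch is a hypothetical struct fs. */
#include <stdint.h>

struct fs_sketch {
    int fs_fsbtodb;   /* shift from filesystem fragments to disk blocks */
};

/* Older form: the result keeps the (promoted) type of b, here 32 bits. */
#define FSBTODB_OLD(fs, b)  ((b) << (fs)->fs_fsbtodb)

/* Solaris 9 userland form: the block number is widened to 64 bits first. */
#define FSBTODB_NEW(fs, b)  (((int64_t)(b)) << (fs)->fs_fsbtodb)

/* Fragment number -> disk block, as code built against the new header
 * would compute it. */
int64_t frag_to_diskblock(const struct fs_sketch *fs, int32_t frag)
{
    return FSBTODB_NEW(fs, frag);
}
```

The numeric value is the same either way; only the width of the result changes, which is harmless until the result is passed to a function whose parameter types the compiler cannot see.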
> >>
> >>vfsck/setup.c uses this in calls to bread, which lives in src/utilities.c,
> >>but bread expects a daddr_t, which is a long.
> >>
> >>So the types do not match, and there is no common declaration
> >>of bread for the compiler to catch the mismatch.
> >>
> >>This causes reads to fail with the wrong address and wrong length,
> >>so fsck does nothing useful.
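
The following simulation sketches why the mismatch produces both a wrong block and a wrong length. It assumes a 32-bit big-endian calling convention (such as 32-bit SPARC) in which a 64-bit argument occupies two consecutive 32-bit argument slots, high word first. misread_args() and struct callee_view are hypothetical names, and the fd and buf arguments are left out:

```c
/* Simulation only: models how a caller with no prototype in scope lays out
 * bread(..., blk, size) when blk is a 64-bit value, and how a callee that
 * expects a 32-bit blk reads the same argument words back.
 * ASSUMPTION: a 64-bit argument takes two 32-bit slots, high word first. */
#include <stdint.h>

struct callee_view {
    int32_t blk;   /* what bread() believes the block number is */
    int32_t size;  /* what bread() believes the length is */
};

struct callee_view misread_args(int64_t dblk, int32_t size)
{
    uint32_t slots[3];
    struct callee_view v;

    /* Caller side: the 64-bit block number fills two slots... */
    slots[0] = (uint32_t)((uint64_t)dblk >> 32);  /* high word */
    slots[1] = (uint32_t)((uint64_t)dblk);        /* low word */
    slots[2] = (uint32_t)size;                    /* real size lands one slot late */

    /* Callee side: expects a single 32-bit blk followed by the size. */
    v.blk = (int32_t)slots[0];
    v.size = (int32_t)slots[1];
    return v;
}
```

With illustrative values, misread_args(16, 8192) reports block 0 and length 16: the callee takes the high word (zero for any block number below 2^32) as the block and the low word as the length. That is consistent with the "CANNOT READ: BLK 0" error Brian Sebby reported.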
> >>
> >>The mismatch needs to be fixed. A related problem is that Solaris
> >>fsck uses large-file support, but the AFS vfsck does not.
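
A sketch of why large-file support matters on partitions like Stephen's 200GB ones, assuming 512-byte disk blocks and a build in which off_t is 32 bits. The helper names are hypothetical:

```c
/* Sketch: which byte offsets a 32-bit off_t can reach.
 * ASSUMPTIONS: 512-byte disk blocks (DEV_BSIZE) and a non-large-file build
 * with a 32-bit off_t; the names here are hypothetical. */
#include <stdint.h>

#define SKETCH_DEV_BSIZE 512

/* Byte offset of a disk block, computed in 64 bits. */
int64_t block_to_offset(int64_t dblk)
{
    return dblk * SKETCH_DEV_BSIZE;
}

/* Would this offset survive being stored in a 32-bit off_t? */
int fits_in_32bit_off_t(int64_t off)
{
    return off >= 0 && off <= INT32_MAX;
}
```

block_to_offset(4194304) is exactly 2^31 bytes, the first offset that no longer fits, so everything past the first 2GB of a partition is out of reach with a 32-bit off_t. On Solaris, compiling with -D_FILE_OFFSET_BITS=64 (described in lfcompile(5)) is one common way to get a 64-bit off_t.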
> >>
> >>This was found using truss on an empty file system, running
> >>the Solaris fsck and the AFS vfsck.
> >>
> >>I will be looking at a fix later today.
> >>
> >>Derrick J Brashear wrote:
> >>
> >>>On Sun, 31 Oct 2004, Brian Sebby wrote:
> >>>
> >>>
> >>>># fsck /vicepa
> >>>>----Open AFS (R) openafs 1.2.11 fsck----
> >>>>** /dev/rdsk/c0t9d0s0
> >>>>
> >>>>CANNOT READ: BLK 0
> >>>>CONTINUE? [yn] y
> >>>
> >>>
> >>>fsck the cooked device (/dev/dsk/c0t9d0s0). You may need to use a
> >>>wrapper or to patch vfsck to do it.
> >>
> >>That appears to cover up the problem, as it will still read the wrong
> >>block, but with any length.  When using the raw device, the length
> >>has to be a multiple of the block size; it was not, because the
> >>wrong length was being passed, and that caused the failure.
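
A sketch of the constraint described above, assuming 512-byte sectors; raw_len_ok() is a hypothetical helper, not part of vfsck. A raw (character) disk device rejects a transfer whose length is not a whole number of sectors, whereas the cooked (block) device buffers any length:

```c
/* Sketch: the transfer-length rule for raw disk devices.
 * ASSUMPTION: 512-byte sectors; raw_len_ok() is hypothetical. */
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_SECTOR 512u

/* True if a raw-device read or write of len bytes is sector-aligned. */
bool raw_len_ok(size_t len)
{
    return len > 0 && len % SKETCH_SECTOR == 0;
}
```

So a misread length such as 16 bytes fails loudly on the raw device, while the cooked device accepts it and quietly reads from the wrong block.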
> >>
> >>Running it this way could cause damage later if blocks were written
> >>to the wrong locations.
> >>
> >>
> >>>You should have mentioned this was the error the other night; it would
> >>>have jogged my memory.
> >>>
> >>
> >>--
> >>
> >>  Douglas E. Engert  <DEEngert@anl.gov>
> >>  Argonne National Laboratory
> >>  9700 South Cass Avenue
> >>  Argonne, Illinois  60439
> >>  (630) 252-5444
> >
> >
> >
> >
>
>