[OpenAFS] Problems with fsck on Solaris 9

Fri, 5 Nov 2004 10:52:28 -0500 (EST)

Thanks for working on this.  Is there a solution yet?  I have a development
machine (solaris 9, openafs 1.2.11) which I patched last night (before
reading the archives--doh!) and it appears to have the same, or a similar,
problem (it was fine before applying the newest patches):

The system is coming up.  Please wait.
The /vicepa file system (/dev/rdsk/c1t0d0s0) is being checked.
----Open AFS (R) openafs 1.2.11 fsck----
/dev/rdsk/c1t0d0s0: /dev/rdsk/c1t0d0s0: BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE

/dev/rdsk/c1t0d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

WARNING - Unable to repair one or more of the following filesystem(s):
        /dev/rdsk/c1t0d0s0
Run fsck manually (fsck filesystem...).
Exit the shell when done to continue the boot process.

(using an alternate superblock doesn't work either).

While a fix for fsck would be great, if anyone knows exactly which patch to
back out, please let me know.  If not, I'll be glad to start trying the
likely suspects--I just don't want to duplicate effort.

Cheers,
Stephen

On Tue, 2 Nov 2004, Douglas E. Engert wrote:

> I think I found the cause of the fsck problem. I am concerned that if
> fsck is run against the cooked device rather than the raw
> device, it could actually cause damage, rather than doing nothing
> and failing.
>
> Solaris 9 in ufs_fs.h changes fsbtodb:
>
>   #ifdef KERNEL
>   #define fsbtodb(fs, b)  (((daddr_t)(b)) << (fs)->fs_fsbtodb)
>   #else /* KERNEL */
>   #define fsbtodb(fs, b)  (((diskaddr_t)(b)) << (fs)->fs_fsbtodb)
>   #endif /* KERNEL */
>
> Previous versions had:
>
>   #define fsbtodb(fs, b)  ((b) << (fs)->fs_fsbtodb)
>
> Note the type cast to diskaddr_t which is a long long.
>
> vfsck/setup.c uses this in calls to bread in src/utilities.c,
> but bread is expecting a daddr_t, which is a long.
>
> Thus the two types do not match, and there is no common declaration
> of bread for the compiler to catch the mismatch.
>
> This causes a read to fail with the wrong address and wrong length,
> and fsck does not do anything useful.
>
> The mismatch needs to be fixed. A related problem is that Solaris
> fsck is using large file support, but the AFS vfsck is not.
>
> This was found using truss on an empty file system, running
> the Solaris fsck and the AFS vfsck.
>
> I will be looking at a fix later today.
>
> Derrick J Brashear wrote:
> > On Sun, 31 Oct 2004, Brian Sebby wrote:
> >
> >> # fsck /vicepa
> >> ----Open AFS (R) openafs 1.2.11 fsck----
> >> ** /dev/rdsk/c0t9d0s0
> >>
> >> CANNOT READ: BLK 0
> >> CONTINUE? [yn] y
> >
> >
> > fsck the cooked device (/dev/dsk/c0t9d0s0). you may need to use a
> > wrapper or to patch vfsck to do it.
>
> That appears to cover up the problem, as it will still read the wrong
> block, but the cooked device accepts any length.  When using the raw
> device, the length has to be a multiple of the block size. It was not,
> because it was the wrong length, and that caused the failure.
>
> Running it this way could cause damage later if the blocks were written
> to the wrong locations.
>
> >
> > you should have mentioned this was the error the other night, it would
> > have jogged my memory
>
> --
>
>   Douglas E. Engert  <DEEngert@anl.gov>
>   Argonne National Laboratory
>   9700 South Cass Avenue
>   Argonne, Illinois  60439
>   (630) 252-5444
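
The type mismatch described in the quoted message can be sketched roughly
as follows. This is only a model, not the vfsck source: the type names,
FSBTODB, mismatched_call and matched_call are hypothetical stand-ins, and
it assumes a 32-bit build where a 64-bit argument occupies two 32-bit
argument slots with the high word first. Under that assumption, a callee
that expects a 32-bit block number picks up the high word as the block and
the low word as the length.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the Solaris types on a 32-bit build. */
typedef int32_t old_daddr_t;     /* daddr_t: a long, 32 bits */
typedef int64_t new_diskaddr_t;  /* diskaddr_t: a long long, 64 bits */

/* Same shape as the Solaris 9 userland fsbtodb, but taking the shift
 * count directly instead of reading it from the superblock. */
#define FSBTODB(shift, b) (((new_diskaddr_t)(b)) << (shift))

/* What the callee believes it was given. */
struct seen_by_callee {
    int64_t blk;
    int32_t size;
};

/* Model of a call with no shared prototype (assumed slot layout):
 * the caller writes a 64-bit blk as two 32-bit slots, high word first,
 * then size.  The callee expects a 32-bit blk followed by size, so it
 * reads slot 0 as blk and slot 1 as size. */
struct seen_by_callee mismatched_call(new_diskaddr_t blk, int32_t size)
{
    uint32_t slots[3];
    struct seen_by_callee r;

    slots[0] = (uint32_t)((uint64_t)blk >> 32);          /* high word */
    slots[1] = (uint32_t)((uint64_t)blk & 0xffffffffu);  /* low word  */
    slots[2] = (uint32_t)size;                           /* never read */

    r.blk = (old_daddr_t)slots[0];
    r.size = (int32_t)slots[1];
    return r;
}

/* With one declaration visible to both caller and callee, and a 64-bit
 * blk parameter, the values arrive unchanged. */
struct seen_by_callee matched_call(new_diskaddr_t blk, int32_t size)
{
    struct seen_by_callee r;

    r.blk = blk;
    r.size = size;
    return r;
}
```

For example, with a shift of 1, fragment 16 and a length of 8192, the
caller passes disk block 32. In this model the mismatched callee sees
block 0 and a length of 32, which is not a multiple of a 512-byte sector,
so a read on the raw device would be rejected.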