[OpenAFS] Problems with fsck on Solaris 9
Stephen Joyce
stephen@physics.unc.edu
Fri, 5 Nov 2004 10:52:28 -0500 (EST)
Thanks for working on this. Is there a solution yet? I have a development
machine (Solaris 9, OpenAFS 1.2.11) which I patched last night (before
reading the archives--doh!) and it appears to have the same, or a similar,
problem (it was fine before applying the newest patches):
The system is coming up. Please wait.
The /vicepa file system (/dev/rdsk/c1t0d0s0) is being checked.
----Open AFS (R) openafs 1.2.11 fsck----
/dev/rdsk/c1t0d0s0: /dev/rdsk/c1t0d0s0: BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
/dev/rdsk/c1t0d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
WARNING - Unable to repair one or more of the following filesystem(s):
/dev/rdsk/c1t0d0s0
Run fsck manually (fsck filesystem...).
Exit the shell when done to continue the boot process.
(Using an alternate superblock doesn't work either.)
While a fix for fsck would be great, if anyone knows exactly which patch to
back out, please let me know. If not, I'll be glad to start trying the
likely suspects--I just don't want to duplicate effort.
Cheers,
Stephen
On Tue, 2 Nov 2004, Douglas E. Engert wrote:
> I think I found the cause of the fsck problem. I am concerned that if
> fsck is run against the cooked device rather than the raw
> device, it could actually cause damage, rather than doing nothing
> and failing.
>
> Solaris 9 in ufs_fs.h changes fsbtodb:
>
> #ifdef KERNEL
> #define fsbtodb(fs, b) (((daddr_t)(b)) << (fs)->fs_fsbtodb)
> #else /* KERNEL */
> #define fsbtodb(fs, b) (((diskaddr_t)(b)) << (fs)->fs_fsbtodb)
> #endif /* KERNEL */
>
> Previous versions had:
>
> #define fsbtodb(fs, b) ((b) << (fs)->fs_fsbtodb)
>
> Note the type cast to diskaddr_t, which is a long long.
>
> vfsck/setup.c uses this in calls to bread in src/utilities.c,
> but bread is expecting a daddr_t, which is a long.
>
> Thus the mismatch between the caller and bread. There is no common
> declaration of bread for the compiler to catch the mismatch.
>
> This causes a read to fail with the wrong address and wrong length,
> and fsck does not do anything useful.
>
> The mismatch needs to be fixed. A related problem is that Solaris
> fsck is using large file support, but the AFS vfsck is not.
>
> This was found using truss on an empty file system, running
> the Solaris fsck and the AFS vfsck.
>
> I will be looking at a fix later today.
>
> Derrick J Brashear wrote:
> > On Sun, 31 Oct 2004, Brian Sebby wrote:
> >
> >> # fsck /vicepa
> >> ----Open AFS (R) openafs 1.2.11 fsck----
> >> ** /dev/rdsk/c0t9d0s0
> >>
> >> CANNOT READ: BLK 0
> >> CONTINUE? [yn] y
> >
> >
> > fsck the cooked device (/dev/dsk/c0t9d0s0). you may need to use a
> > wrapper or to patch vfsck to do it.
>
> That appears to cover up the problem, as it will still read the wrong
> block, but the cooked device accepts any length. When using the raw
> device, the length has to be a multiple of the block size. It was not,
> because it was the wrong length, and that caused the failure.
>
> Running it this way could cause damage later if the blocks were written
> to the wrong locations.
>
> >
> > you should have mentioned this was the error the other night, it would
> > have jogged my memory
> >
>
> --
>
> Douglas E. Engert <DEEngert@anl.gov>
> Argonne National Laboratory
> 9700 South Cass Avenue
> Argonne, Illinois 60439
> (630) 252-5444
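
The mismatch described in the quoted message can be sketched in C. This is only an illustration, not code from OpenAFS or Solaris: every `sketch_` and `SKETCH_` name is hypothetical, the fixed-width typedefs stand in for a 32-bit build, and the function models one assumed calling convention (a 64-bit argument occupying two 32-bit slots, high word first) rather than making a real unprototyped call, which would be undefined behavior.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Solaris types in a 32-bit build. */
typedef int32_t sketch_daddr_t;     /* daddr_t: a long (32 bits here) */
typedef int64_t sketch_diskaddr_t;  /* diskaddr_t: a long long */

struct sketch_fs {
    int fs_fsbtodb;  /* shift from file system blocks to disk blocks */
};

/* Older form quoted in the thread: the result keeps the type of b. */
#define SKETCH_FSBTODB_OLD(fs, b) ((b) << (fs)->fs_fsbtodb)

/* Solaris 9 user-level form quoted in the thread: always 64 bits wide. */
#define SKETCH_FSBTODB_NEW(fs, b) \
    (((sketch_diskaddr_t)(b)) << (fs)->fs_fsbtodb)

/* What a callee expecting (32-bit blk, 32-bit size) would pick up. */
struct sketch_bread_args {
    uint32_t blk;
    uint32_t size;
};

/*
 * ASSUMPTION: the caller passes a 64-bit blk as two 32-bit words, high
 * word first, and the callee, lacking a shared prototype, reads one
 * word as blk and the next as size. The real size lands one slot
 * later and is never read.
 */
static struct sketch_bread_args
sketch_misread(sketch_diskaddr_t blk, uint32_t size)
{
    struct sketch_bread_args seen;

    assert(blk >= 0);
    (void)size;  /* shifted out of the slot the callee looks at */
    seen.blk = (uint32_t)((uint64_t)blk >> 32);           /* high word */
    seen.size = (uint32_t)((uint64_t)blk & 0xffffffffu);  /* low word */
    return seen;
}
```

With `fs_fsbtodb` set to 1 and block 8, the new macro yields disk address 16. Under the assumed convention the callee would then see block 0 and a length of 16 bytes, which is not a multiple of 512. A single shared prototype for bread, visible to both caller and callee, would let the compiler convert or diagnose the argument instead.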