[OpenAFS] OpenAFS 1.2.13 and fsck problems on Solaris 9

Douglas E. Engert deengert@anl.gov
Thu, 02 Dec 2004 16:39:45 -0600


I did some more testing with the OpenAFS 1.3.74 vfsck on a small partition
on a Solaris 9 system to see why it might fail. Below is a patch
that makes vfsck print more debugging information when it fails.

I ran into two failure situations: a file system larger than 1 TB,
and logging.

If a file system was created with the newfs -T option or mkfs_ufs -o mtb=y,
the result is a file system that can exceed 1 TB. Such a file system uses a
different magic number in the super block, and the OpenAFS fsck will fail with
"MAGIC NUMBER WRONG". It might take major changes to support this.

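As a rough illustration of the magic number check, here is a minimal sketch.
The two constants are my recollection of the values in the Solaris
<sys/fs/ufs_fs.h> header (FS_MAGIC and MTB_UFS_MAGIC) and should be verified
against your own headers; classify_magic() is a hypothetical helper and is
not part of vfsck.

```c
#include <stdint.h>

/* Assumed values, recalled from Solaris <sys/fs/ufs_fs.h>; verify locally. */
#define SKETCH_FS_MAGIC      0x011954u  /* classic UFS super block */
#define SKETCH_MTB_UFS_MAGIC 0xdecadeu  /* multi-terabyte UFS super block */

/* Hypothetical helper: name the kind of super block a magic number denotes. */
static const char *classify_magic(uint32_t magic)
{
    if (magic == SKETCH_FS_MAGIC)
        return "UFS";
    if (magic == SKETCH_MTB_UFS_MAGIC)
        return "MTB UFS";
    return "unknown";
}
```

A check that accepts only the first value would reject a file system made
with newfs -T, which matches the failure described above.
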
Logging allows for fast recovery. It can be turned on and off using the
mount command. See "man mount_ufs". Logging may also be turned on
for a large partition. IIRC someone said Solaris 10 will do this?

The fs_logbno field of the first super block contains the "block # of the
embedded log". It appears to be set only in the first super block.
The vfsck does not account for this difference when comparing the first
and alternate super blocks, so you can get the error BAD SUPER BLOCK:
VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE.

Note that this field is saved across a mount and may even be used as a flag
to continue logging on subsequent mounts.
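
To make the idea concrete, here is a minimal sketch of a comparison that
skips fs_logbno. It is not what vfsck does today, and it is not a proposed
fix. The offset 0x052c is the one I saw on my system; the 4-byte width and the
helper name sb_differs_ignoring_logbno() are assumptions. With the real
header available, offsetof(struct fs, fs_logbno) and sizeof would be safer
than hard-coded numbers.

```c
#include <stddef.h>
#include <string.h>

/* Offset of fs_logbno as observed in this message; the width is an assumption. */
#define SKETCH_LOGBNO_OFFSET 0x052cu
#define SKETCH_LOGBNO_SIZE   4u

/*
 * Hypothetical helper: compare two super block images of len bytes,
 * skipping the bytes occupied by fs_logbno. Returns 0 when everything
 * outside that field matches, nonzero otherwise.
 */
static int sb_differs_ignoring_logbno(const void *primary,
                                      const void *alternate, size_t len)
{
    const unsigned char *a = primary;
    const unsigned char *b = alternate;
    size_t skip_end = SKETCH_LOGBNO_OFFSET + SKETCH_LOGBNO_SIZE;

    if (len <= SKETCH_LOGBNO_OFFSET)
        return memcmp(a, b, len) != 0;      /* field lies beyond the image */
    if (memcmp(a, b, SKETCH_LOGBNO_OFFSET) != 0)
        return 1;                           /* difference before the field */
    if (len <= skip_end)
        return 0;                           /* image ends inside the field */
    return memcmp(a + skip_end, b + skip_end, len - skip_end) != 0;
}
```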

If I mounted the file system with -o nologging and then unmounted it,
the field appears to have been cleared, and the OpenAFS vfsck works
as expected.

Following is a patch to src/vfsck/setup.c for 1.3.74 that dumps
the main super block and, if the memcmp fails, also dumps the alternate
super block so you can see what differs. You might want to try this to see
whether the problem is fs_logbno at offset 0x052c or something else.


--- ,setup.c	Wed Aug 25 02:22:22 2004
+++ setup.c	Thu Dec  2 13:41:09 2004
@@ -112,6 +112,23 @@
  char *malloc(), *calloc();
  struct disklabel *getdisklabel();

+static
+void hexdump(char * comment, void * in, int len)
+{
+    unsigned char *p = (unsigned char *)in;
+    int i;
+
+    fprintf(stderr,"%s",comment);
+    for (i=0; i<len; i++)
+    {
+        if ((i & 31) == 0) fprintf(stderr,"\n%06x ",i);
+        if ((i & 3) == 0) fprintf(stderr, " ");
+        /*if ((i & 31) == 0) fprintf(stderr,"\n    "); */
+        fprintf(stderr,"%02x", p[i]);
+    }
+    fprintf(stderr,"\n");
+}
+
  setup(dev)
       char *dev;
  {
@@ -632,6 +649,7 @@
      /*
       * run a few consistency checks of the super block
       */
+hexdump("main SB",(char *)&sblock, (int)sblock.fs_sbsize);
  #ifdef	AFS_HPUX_ENV
  #if defined(FD_FSMAGIC)
      if ((sblock.fs_magic != FS_MAGIC) && (sblock.fs_magic != FS_MAGIC_LFN)
@@ -752,7 +770,9 @@
  #if     defined(AFS_HPUX110_ENV)
      UpdateAlternateSuper(&sblock, &altsblock);
  #endif /* AFS_HPUX110_ENV */
-    if (memcmp((char *)&sblock, (char *)&altsblock, (int)sblock.fs_sbsize)) {
+	{
+	int mismatch = 0;
+    if (mismatch = memcmp((char *)&sblock, (char *)&altsblock, (int)sblock.fs_sbsize)) {
  #ifdef	__alpha
  	if (memcmp
  	    ((char *)&sblock.fs_blank[0], (char *)&altsblock.fs_blank[0],
@@ -760,9 +780,13 @@
  	    memset((char *)sblock.fs_blank, 0, sizeof(sblock.fs_blank));
  	} else {
  #endif /* __alpha */
+		/* dump for debugging the two blocks */
+		fprintf(stderr,"SB dont match= %d\n",mismatch);
+		hexdump("Alternate SB",(char *)&altsblock, (int)sblock.fs_sbsize);
  	    badsb(listerr,
  		  "VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE");
  	    return (0);
+	}
  #ifdef	__alpha
  	}
  #endif /* __alpha */
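
When reading the output, note that hexdump() prints 32 bytes per row in
4-byte groups, so offset 0x052c is the fourth group on the row labeled
000520. If comparing two dumps by eye gets tedious, something like the
following sketch could report the differing offsets directly. It is not part
of the patch above, and report_diffs() is a hypothetical name.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: print each offset where two buffers differ
 * and return the number of differing bytes.
 */
static size_t report_diffs(const void *primary, const void *alternate,
                           size_t len)
{
    const unsigned char *a = primary;
    const unsigned char *b = alternate;
    size_t i, count = 0;

    for (i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            fprintf(stderr, "offset 0x%04lx: main %02x alternate %02x\n",
                    (unsigned long)i, a[i], b[i]);
            count++;
        }
    }
    return count;
}
```

In setup.c it could be called with &sblock, &altsblock and sblock.fs_sbsize,
the same arguments the memcmp uses.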



Andy Malato wrote:

> ! Date: Wed, 01 Dec 2004 08:06:05 -0600
> ! From: Douglas E. Engert <deengert@anl.gov>
> ! To: Andy Malato <andym@oak.njit.edu>
> ! Cc: openafs-info@openafs.org
> ! Subject: Re: [OpenAFS] OpenAFS 1.2.13 and fsck problems on Solaris 9
> !
> !
> !
> !
> !
> ! Andy Malato wrote:
> !
> ! > I have seen similar postings on this topic in the mail archives, however,
> ! > I don't know if this issue has been completely resolved.
> ! >
> ! > I am running OpenAFS 1.2.13 on Solaris 9 with kernel patch 117171-07.
> ! > According to what I have read in the archive postings it appears that Sun
> ! > has made some changes to the UFS data structures, which causes the OpenAFS
> ! > fsck to break.
> ! >
> ! > I get similar messages during each reboot :
> ! >
> ! >
> ! > checking ufs filesystems
> ! > ----Open AFS (R) openafs 1.2.13 fsck----
> ! > /dev/rdsk/c2t5d1s0: IMPOSSIBLE INTERLEAVE=0 IN SUPERBLOCK (FIXED)
> ! > /dev/rdsk/c2t5d1s0: is clean.
> ! > ----Open AFS (R) openafs 1.2.13 fsck----
> ! > /dev/rdsk/c3t5d0s0: IMPOSSIBLE INTERLEAVE=0 IN SUPERBLOCK (FIXED)
> ! > /dev/rdsk/c3t5d0s0: is clean.
> ! >
> !
> ! This is caused by the fsck checking the old interleave field
> ! that was replaced.
> !
> ! >
> ! > If I manually run /usr/lib/fs/afs/fsck -y against these devices the
> ! > problem appears to go away.  However, this still indicates that something
> ! > may be wrong and I can't help but have limited confidence in
> ! > /usr/lib/fs/afs/fsck should one of the vice partitions need to be
> ! > recovered via fsck after a system crash.
> ! >
> ! >
> ! > I discovered this posting :
> ! >
> ! > https://lists.openafs.org/pipermail/openafs-info/2004-November/015400.html
> ! >
> ! > After adding the required patches and rebuilding, I ran newfs on a few
> ! > vice partitions and rebooted, and got the following error message:
> ! >
> ! > checking ufs filesystems
> ! > ----Open AFS (R) openafs 1.2.13 fsck----
> ! > /dev/rdsk/c2t5d1s0: /dev/rdsk/c2t5d1s0: BAD SUPER BLOCK: VALUES IN SUPER
> ! > BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
> ! >
> ! > /dev/rdsk/c2t5d1s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> ! > ----Open AFS (R) openafs 1.2.13 fsck----
> ! > /dev/rdsk/c3t5d0s0: /dev/rdsk/c3t5d0s0: BAD SUPER BLOCK: VALUES IN SUPER
> ! > BLOCK DISAGREE WITH THOSE IN FIRST ALTERNATE
> ! >
> ! > /dev/rdsk/c3t5d0s0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.
> ! >
> !
> !
> !
> ! >
> ! > An attempt to run fsck manually produces this error :
> ! >
> ! > fsck -y /dev/rdsk/c2t5d1s0
> ! > ----Open AFS (R) openafs 1.2.13 fsck----
> ! > ** /dev/rdsk/c2t5d1s0
> ! > BAD SUPER BLOCK: VALUES IN SUPER BLOCK DISAGREE WITH THOSE IN FIRST
> ! > ALTERNATE
> ! > USE AN ALTERNATE SUPER-BLOCK TO SUPPLY NEEDED INFORMATION;
> ! > eg. fsck [-F ufs] -o b=# [special ...]
> ! > where # is the alternate super block. SEE fsck_ufs(1M).
> ! >
> ! >
> ! > I then found this posting :
> ! >
> ! > https://lists.openafs.org/pipermail/openafs-info/2004-November/015575.html
> ! >
> ! > I do have logging disabled on all the vice partitions by adding nologging
> ! > to the mount options in /etc/vfstab.
> ! >
> !
> ! Did the partition ever have logging turned on?
> 
> No, after turning it off, I had run newfs on the partition just to be
> sure.
> 
> !
> ! Can you debug or add some code to the  src/vfsck/setup.c
> !
> !   if (memcmp((char *)&sblock, (char *)&altsblock, (int)sblock.fs_sbsize)) {
> !
> ! to dump these two blocks to see where the difference is. It was speculated
> ! that the problem was with logging, but there might be some other error.
> ! I could send you some code later today.
> 
> 
> Yes, I'd be glad to.
> 
> !
> ! > I am at a loss here and perhaps I missed something.  I'd appreciate any
> ! > feedback that anyone can give regarding this matter.
> ! >
> !
> ! It was speculated that it was logging that caused the problem. There
> ! may be some other problem.
> 
> 
> Thanks for your prompt response and help on this matter.
> 
> 
> 
>         ---Andy
> 
> 
> 
> !
> ! >
> !
> ! > Thanks,
> ! >
> ! >
> ! >         ---Andy
> ! >
> ! >
> ! >
> ! >
> ! >
> ! > _______________________________________________
> ! > OpenAFS-info mailing list
> ! > OpenAFS-info@openafs.org
> ! > https://lists.openafs.org/mailman/listinfo/openafs-info
> ! >
> ! >
> ! >
> !
> ! --
> !
> !   Douglas E. Engert  <DEEngert@anl.gov>
> !   Argonne National Laboratory
> !   9700 South Cass Avenue
> !   Argonne, Illinois  60439
> !   (630) 252-5444
> !
> 
> 
> 
> 

-- 

  Douglas E. Engert  <DEEngert@anl.gov>
  Argonne National Laboratory
  9700 South Cass Avenue
  Argonne, Illinois  60439
  (630) 252-5444