[OpenAFS] solaris 10 versions supporting inode fileservers

Douglas E. Engert deengert@anl.gov
Thu, 14 May 2009 09:31:16 -0500


This sounds like a 1.4.8 vs 1.4.10 issue and may not be
Solaris related.

David R Boldt wrote:
> 
> We use Solaris 10 SPARC exclusively for our AFS servers.
> After upgrading to 1.4.10 from 1.4.8 we had a very few
> volumes that started spontaneously going off-line, recovering,
> and then going off-line again until they needed to be salvaged.

I am assuming you compile the inode versions yourself, since the OpenAFS
1.4.8 and 1.4.10 releases for Solaris 10 were all compiled with namei.

> 
> Hearing that this might be related to inode, we moved these
> volumes to a set of little use fileservers that were running
> namei at 1.4.10. It made no discernible difference.

So this may not be a namei vs inode issue.

> 
> Two volumes in particular accounted for >90% of our off-line
> volume issues.
> 
> FileLog:
> Mon Apr 27 10:56:09 2009 Volume 2023867468 now offline, must be salvaged.
> Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
> Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
> Mon Apr 27 10:56:22 2009 fssync: volume 2023867469 restored; breaking all call backs
> (restored vol above being R/O for R/W in need of salvage)
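
To confirm which volumes account for the off-line events, the FileLog
could be tallied per volume ID. A minimal sketch, assuming the message
format is exactly as in the excerpt above; the function name and the
sample list are made up for illustration:

```python
import re
from collections import Counter

# Pattern taken from the FileLog excerpt quoted above (an assumption that
# every offline message has this exact wording).
OFFLINE_RE = re.compile(r"Volume (\d+) now offline, must be salvaged")


def count_offline(lines):
    """Return a Counter mapping volume ID -> number of offline messages."""
    counts = Counter()
    for line in lines:
        match = OFFLINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the excerpt; the fssync line is ignored
# because it does not match the offline pattern.
SAMPLE = [
    "Mon Apr 27 10:56:09 2009 Volume 2023867468 now offline, must be salvaged.",
    "Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.",
    "Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.",
    "Mon Apr 27 10:56:22 2009 fssync: volume 2023867469 restored; breaking all call backs",
]

for volume_id, hits in count_offline(SAMPLE).most_common():
    print(volume_id, hits)
```

Passing an open file object for the real FileLog instead of SAMPLE
should work the same way, since the function only iterates over lines.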
> 
> Both of the volumes most frequently impacted have content
> completely rewritten roughly every 20 minutes while being on
> an automated replication schedule of 15 minutes. One of them
> 25MB, the other 95MB, both at about 80% quota.

How long does the replication take?
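
If it is not already being measured, something like the following could
time it. This is only a rough sketch: it assumes the replication is
driven by a command such as "vos release", the helper names are
hypothetical, and the command actually run here is a harmless stand-in
so the sketch works anywhere.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(argv, check=False)
    return result.returncode, time.monotonic() - start


def fits_in_window(elapsed_seconds, window_minutes):
    """True if a run of elapsed_seconds finishes inside the schedule window."""
    return elapsed_seconds < window_minutes * 60


# Stand-in command; on a real server this would be replaced with the
# actual release invocation, e.g. ["vos", "release", "-id", "<volume>"].
exit_code, elapsed = time_command([sys.executable, "-c", "pass"])

# 15 minutes is the replication schedule mentioned in the quoted message.
print("exit code:", exit_code)
print("finished within 15 minutes:", fits_in_window(elapsed, 15))
```

Comparing the elapsed time against both the 15 minute schedule and the
20 minute rewrite cycle would show whether a release can still be
running when the next rewrite starts.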

> 
> We downgraded just the fileserver binary to 1.4.8 on all of
> our servers and have not seen a single off-line message in
> 36 hours.


> 
> 
>                                         -- David Boldt
>                                         <dboldt@usgs.gov>

-- 

  Douglas E. Engert  <DEEngert@anl.gov>
  Argonne National Laboratory
  9700 South Cass Avenue
  Argonne, Illinois  60439
  (630) 252-5444