[OpenAFS] solaris 10 versions supporting inode fileservers
Douglas E. Engert
deengert@anl.gov
Thu, 14 May 2009 09:31:16 -0500
This sounds like a 1.4.8 vs 1.4.10 issue and may not be
Solaris related.
David R Boldt wrote:
>
> We use Solaris 10 SPARC exclusively for our AFS servers.
> After upgrading to 1.4.10 from 1.4.8 we had a very few
> volumes that started spontaneously going off-line, recovering,
> and then going off-line again until they needed to be salvaged.
I am assuming you compile the inode versions yourself as the OpenAFS
1.4.8 and 1.4.10 releases for Solaris 10 were all compiled with namei.
>
> Hearing that this might be related to inode, we moved these
> volumes to a set of little use fileservers that were running
> namei at 1.4.10. It made no discernible difference.
So this may not be a namei vs inode issue.
>
> Two volumes in particular accounted for >90% of our off-line
> volume issues.
>
> FileLog:
> Mon Apr 27 10:56:09 2009 Volume 2023867468 now offline, must be salvaged.
> Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
> Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
> Mon Apr 27 10:56:22 2009 fssync: volume 2023867469 restored; breaking all call backs
> (restored vol above being R/O for R/W in need of salvage)
>
> Both of the volumes most frequently impacted have content
> completely rewritten roughly every 20 minutes while being on
> an automated replication schedule of 15 minutes. One of them
> 25MB, the other 95MB, both at about 80% quota.
How long does the replication take?
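
One way to check whether the off-line events line up with the 15-minute
release schedule is to pull the timestamps out of FileLog. A rough sketch,
assuming the line format shown in the excerpt above; the function name
offline_events and the sample lines are only illustrative, not part of
OpenAFS:

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches FileLog lines of the form quoted above, e.g.
# "Mon Apr 27 10:56:09 2009 Volume 2023867468 now offline, must be salvaged."
OFFLINE_RE = re.compile(
    r"^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) Volume (\d+) now offline"
)

def offline_events(lines):
    """Return {volume_id: [datetime, ...]} for every 'now offline' line."""
    events = defaultdict(list)
    for line in lines:
        m = OFFLINE_RE.match(line.strip())
        if m:
            # Assumes English day/month names (C locale) in the log.
            stamp = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
            events[m.group(2)].append(stamp)
    return dict(events)

# Sample input copied from the FileLog excerpt in this thread.
sample = [
    "Mon Apr 27 10:56:09 2009 Volume 2023867468 now offline, must be salvaged.",
    "Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.",
    "Mon Apr 27 10:56:22 2009 fssync: volume 2023867469 restored; breaking all call backs",
]
for vol, stamps in offline_events(sample).items():
    print(vol, len(stamps), stamps[0].isoformat(), stamps[-1].isoformat())
```

Comparing those timestamps against the start and end times of each release
would show whether the volume goes off-line while a release is in progress.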
>
> We downgraded just the fileserver binary to 1.4.8 on all of
> our servers and have not seen a single off-line message in
> 36 hours.
>
>
> -- David Boldt
> <dboldt@usgs.gov>
--
Douglas E. Engert <DEEngert@anl.gov>
Argonne National Laboratory
9700 South Cass Avenue
Argonne, Illinois 60439
(630) 252-5444