[OpenAFS] solaris 10 versions supporting inode fileservers

David R Boldt dboldt@usgs.gov
Wed, 13 May 2009 12:19:01 -0400


We use Solaris 10 SPARC exclusively for our AFS servers.
After upgrading from 1.4.8 to 1.4.10 we had a small number of
volumes that started spontaneously going off-line, recovering,
and then going off-line again until they needed to be salvaged.

Hearing that this might be related to the inode fileserver
format, we moved these volumes to a set of little-used
fileservers running namei at 1.4.10. It made no discernible
difference.
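
For anyone wanting to repeat the test, moving a volume between
fileservers is a single vos move; a sketch, with placeholder
volume, server, and partition names:

    # move the volume off an inode server onto a namei server
    # (webdata, fs-inode1, fs-namei1 and the partitions are
    # placeholders, not our real names)
    vos move -id webdata -fromserver fs-inode1 -frompartition /vicepa \
             -toserver fs-namei1 -topartition /vicepb -verbose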

Two volumes in particular accounted for >90% of our off-line 
volume issues.

FileLog:
Mon Apr 27 10:56:09 2009 Volume 2023867468 now offline, must be salvaged.
Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
Mon Apr 27 10:56:15 2009 Volume 2023867468 now offline, must be salvaged.
Mon Apr 27 10:56:22 2009 fssync: volume 2023867469 restored; breaking all 
call backs 
(the volume restored above is the R/O counterpart of the R/W
volume in need of salvage)

Both of the volumes most frequently impacted have their content
completely rewritten roughly every 20 minutes while being on an
automated replication schedule of 15 minutes. One is 25MB, the
other 95MB, and both sit at about 80% of quota.
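
The replication schedule is nothing exotic, just vos release on
the quarter hour; a sketch of the kind of cron entry involved
(the volume name is a placeholder, and Solaris 10 cron wants the
minutes spelled out rather than */15):

    # crontab on the R/W volume's home server; "webdata" is a
    # placeholder. -localauth uses the server's own KeyFile, so
    # no token is needed.
    0,15,30,45 * * * * /usr/afs/bin/vos release webdata -localauth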

We downgraded just the fileserver binary to 1.4.8 on all of 
our servers and have not seen a single off-line message in 
36 hours.
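
For the record, downgrading a single server binary under
bosserver control is straightforward; a sketch, with a
placeholder hostname and staging path:

    # install the 1.4.8 fileserver binary (default destination is
    # /usr/afs/bin) and restart the fs instance; sv1.example.com
    # and /tmp/1.4.8 are placeholders
    bos install sv1.example.com /tmp/1.4.8/fileserver -localauth
    bos restart sv1.example.com -instance fs -localauth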


                                         -- David Boldt
                                         <dboldt@usgs.gov>
