[OpenAFS] AFS namei file servers, SAN, any issues elsewhere? We've had some.
Can AFS _cause_ SAN issues?
Kim Kimball
dhk@ccre.com
Thu, 13 Mar 2008 13:45:15 -0600
We're using Hitachi USP and Hitachi 9585 SAN devices, and have had a
series of incidents that, after two years of success, significantly
affected AFS reliability for a period of six months.
I'm wondering if anyone else has had any issues using SANs for vice
partitions.
Also, to make a long story short, I've been asked by my management to
determine if AFS itself can cause SANs to misbehave.
I can't see how, but I've committed to getting additional opinions.
Please opine!
Any experience, good or bad, with the impact on AFS of using SANs for namei
vicep's would be very helpful.
Any theories about how AFS could confuse a SAN would also be very helpful.
Thanks.
Kim
(Below I've added some detail about the SAN/AFS interaction I've seen,
for those who are interested.)
========================================
For the record, here's what I've been experiencing. The worst of it,
as detailed below, was the impact on the creation of move and release
clones, but not backup clones.
AFS IMPACT
We were running 1.4.1 with some patches. (Upgrading to 1.4.6 has been
part of a thus far definitive fix for the 9585 issues.)
The worst of the six-month stretch occurred when the primary and
secondary controller roles (9585 only thus far) were reversed as a
consequence of SAN fabric rebuilds. For whatever reason, the time
required to create volume clones for AFS 'vos release' and 'vos move'
(using 'vos status' to audit clone times) increased from a typical
several seconds to minutes, ten minutes, and in one case four hours.
The RW volume is of course unwritable during the clone operation.
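In case anyone wants to reproduce the measurement, this is roughly how I
audited the clone times; the server and volume names here are
placeholders, not our real ones:

    # Kick off a release in the background and poll the volserver
    # transactions on the source file server. The transaction timestamps
    # in 'vos status' show how long the clone has been in progress.
    vos release vol.project -verbose &
    RELPID=$!
    while kill -0 $RELPID 2>/dev/null; do
        date
        vos status fs1.example.com
        sleep 10
    done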
'vos remove' on afflicted partitions was also affected, taking
noticeably longer to remove a volume.
I don't know why the creation of .backup clones was not similarly
affected. For a given volume the create/refresh time for a move or
release clone might have been fifteen minutes, while the .backup clone
was still created quickly, taking only slightly longer than usual.
With 'vos move' out of the picture, I moved volumes by other means:
plain dump/restore for volumes not frequently or recently updated, and,
for the rest, dump/restore followed by a synchronization tool, Unison,
to bring a new RW volume up to date, then changing the mount point to
point at the new volume name, then waiting until the previous RW volume
showed no updates for a few days.
(If anyone is interested in Unison let me know. I'm thinking of talking
about it at Best Practices this year.)
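For anyone who wants the mechanics, the workaround looked roughly like
this; the server, partition, volume, and path names below are
placeholders, and this is a sketch rather than a recipe:

    # 1. Dump the RW volume and restore it under a new name on a healthy
    #    partition.
    vos dump -id vol.project -file /var/tmp/vol.project.dump
    vos restore fs2.example.com /vicepb vol.project.new \
        -file /var/tmp/vol.project.dump

    # 2. For volumes still receiving updates, catch the copy up with
    #    Unison through temporary RW mount points.
    fs mkmount -dir /afs/.example.com/tmp/old -vol vol.project
    fs mkmount -dir /afs/.example.com/tmp/new -vol vol.project.new
    unison /afs/.example.com/tmp/old /afs/.example.com/tmp/new -batch

    # 3. Repoint the real mount point at the new volume.
    fs rmmount -dir /afs/.example.com/project
    fs mkmount -dir /afs/.example.com/project -vol vol.project.new

    # 4. Once the old RW volume has shown no updates for a few days,
    #    remove it.
    vos remove -server fs1.example.com -partition /vicepa -id vol.project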
The USP continues to spew SCSI command timeouts.
I tried dump|restore -overwrite, which turned up interesting
behavior: the restore didn't update the VLDB entry until after the
removal of the 'overwritten' volume. Since deleteVolume was taking a
very long time on affected vice partitions, I stopped using
dump|restore -overwrite on frequently changed volumes and used
dump|restore-to-new-name|change mount points instead.
(This behavior of 'vos restore' may not be true of 1.4.6; I suspect it
was related to the single-threading of the volserver, which was fixed
in 1.4.6.)
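For reference, the -overwrite variant I gave up on was essentially this
(placeholder names again); the VLDB entry wasn't updated until the slow
deleteVolume of the old instance had completed:

    # Restore the dump over the existing volume name on a different
    # server; -overwrite full replaces the existing instance, and the
    # old copy is removed (slowly, on the afflicted partitions) before
    # the VLDB entry is updated.
    vos dump -id vol.project -file /var/tmp/vol.project.dump
    vos restore fs2.example.com /vicepb vol.project \
        -file /var/tmp/vol.project.dump -overwrite full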
I had always thought that the code to create volume clones was shared,
and don't have a good reason for the .backup creation differing from
the move and release clone creation. I haven't gone to look to see if
the .backup code is separate. Might it simply have been that the
creation of a .backup volume is likely to be an incremental update of
an existing clone, while move and release clones are more than likely
full clone operations?
SAN SYMPTOMS (for those interested)
I'm seeing SCSI command timeouts and UFS log timeouts (on vice
partitions using the SAN for storage) on LUNs used for vicep's on the
Hitachi USP, and was seeing them on the 9585 as well until a recent
configuration change.
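I was spotting these from the Solaris side with nothing more
sophisticated than the system log and the per-device error counters;
device names will of course differ:

    # Watch for SCSI command timeouts and UFS logging complaints as
    # they arrive, and check the per-device error counts.
    tail -f /var/adm/messages | egrep -i 'timeout|ufs'
    iostat -En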
At first I thought this was load related, so I wrote scripts to
generate a goodly load. It turns out that even with a one-second sleep
between file create/write/close operations and between rm operations
the SCSI command timeouts still occur; it's not load but simply
activity that turns up the timeouts.
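The scripts were nothing fancy; something along these lines, run
against a directory on a volume living on a SAN-backed vicep (the path
is a placeholder), was enough to turn up the timeouts even with the
sleeps in place:

    #!/bin/sh
    # Gentle create/write/close loop followed by a remove loop, with a
    # one second sleep between operations.
    DIR=/afs/example.com/scratch/santest
    mkdir -p $DIR
    i=0
    while [ $i -lt 1000 ]; do
        dd if=/dev/zero of=$DIR/file.$i bs=1024k count=10 2>/dev/null
        sleep 1
        i=`expr $i + 1`
    done
    i=0
    while [ $i -lt 1000 ]; do
        rm $DIR/file.$i
        sleep 1
        i=`expr $i + 1`
    done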
AFS is an excellent diagnostic for storage and network burps, and we've
unsurprisingly seen more of the SCSI command timeouts and UFS log
timeouts (Solaris) on the AFS file servers than anywhere else, but have
seen some occurrences elsewhere.
The impact of the Solaris UFS log timeout is confined to the vicep which
is, in response to the log timeout, unmounted by Solaris. It must be
fsck'd and remounted. Not great with several hundred GB out of service
for the duration of the fsck. One UFS log timeout resulted in the loss
of ~ 200GB of data. (More accurately, fsck ran for more than five days,
I'd already restored the data from tape, and chose to 'lose' the data
after fsck completed since I couldn't figure out what the heck fsck had
been doing for five days and didn't trust the results. Not to mention
five days of updates to the restored volumes, and no requirement to
merge the recovered with the restored.)
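For completeness, recovery after a log timeout was the usual Solaris
drill; the device and partition names below are placeholders, and this
assumes the vicep is in vfstab:

    # Solaris has already unmounted the partition in response to the
    # UFS log timeout. It has to be fsck'd and remounted, and the
    # fileserver restarted so the vice partition is re-attached
    # (instance name assumed to be the usual 'fs').
    fsck -y /dev/rdsk/c2t0d0s6
    mount /vicepa
    bos restart fs1.example.com fs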
The HBAs on the 9585 were apparently configured as active/passive
rather than active/active (or the reverse), and I've not seen SCSI
command timeouts on any of the 9585 LUNs since the configuration was
changed.
IN CLOSING
I realize this isn't a SAN forum and my inquiry isn't about SANs per
se; I provide the information above just to share my experiences with
the SANs over the past six months. We ran successfully for two years
prior to the onset of these issues, and if anyone wants to discuss the
SAN issues offline my email address is below. I can tell you what we
saw and what we've done to correct the issues, but I'm not a SAN expert
by any means.
TIA
Kim Kimball
dhk at ccre period com