[OpenAFS] AFS namei file servers, SAN, any issues elsewhere? We've had some. Can AFS _cause_ SAN issues?

Kim Kimball dhk@ccre.com
Mon, 17 Mar 2008 15:03:08 -0600


Hi Jeff,


Jeffrey Altman wrote:
>
> AFS is a very stressful application for a file system.  If there are
> bugs in the SAN AFS would be more likely to find them than other
> applications.
>
*grin*  Try telling that to my management!  I just sent an email calling 
AFS an excellent network and storage diagnostic.  We used to track 
reports of AFS issues versus resolved issues that turned out to be 
infrastructure-related.  I forget the exact figures, but in the large 
majority of cases AFS was not the cause, only the one complaining.  The 
effort to get away from AFS is renewed!

>> ========================================
>> For the record, here's what I've been experiencing.  The worst of the 
>> experience, as detailed below, was the impact on creation of move and 
>> release clones but not backup clones.
>>
>> AFS IMPACT
>>
>> We were running 1.4.1 with some patches.  (Upgrading to 1.4.6 has 
>> been part of a thus-far-definitive fix for the 9585 issues.)
>
> The primary difference between 1.4.1 and 1.4.6 is the bundling of
> FSync calls which would significantly reduce the load on the
> underlying file system.  (Robert Banz gave a good description of
> the impact.)  If this change is permitting the SAN to perform its
> operations with a reduced incident rate, that would imply that
> there is still a problem in the SAN (or the connections between the
> host machine and the SAN) but it is not being tickled (as often).
>
Agreed.
>> The worst of the six-month stretch occurred when the primary and 
>> secondary controller roles (9585 only thus far) were reversed as a 
>> consequence of SAN fabric rebuilds.  For whatever reason, the time 
>> required to create volume clones for AFS 'vos release' and 'vos move' 
>> (using 'vos status' to audit clone time) increased from a typical 
>> several seconds to minutes, ten minutes, and in one case four hours.  
>> The RW volume is of course unwritable during the clone operation.
>
> My conclusion:
> The secondary controller, the cabling, or something else along
> that data path is defective.
>
Thanks for the confirmation.

That's the growing conclusion of various vendors as well.  We appear to 
be replacing the SAN fabric piece by piece, sucked along in the 
slipstream of "maybe this will work."  Which is fine, but it's taken a 
month thus far, and I'm refusing to use the SAN until the timeout errors 
stop.
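
(A footnote on how I audited the clone times: nothing fancier than a 
timestamped 'vos status' loop running while a release or move was in 
flight, something along these lines, with the server name made up here:

    # poll the volume server for long-lived clone transactions
    while true; do
        date
        vos status -server fs1.example.com
        sleep 10
    done

A healthy release shows the clone transaction for a few seconds; on the 
afflicted partitions it sat there for minutes.)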

>> 'vos remove' times on afflicted partitions were also affected, with 
>> increased time required to remove a volume.
>>
>> I don't know why the creation of .backup clones was not similarly 
>> affected.  For a given volume the create time/refresh time for a move 
>> clone or release clone might have been fifteen minutes, while the 
>> .backup clone created quickly and took only slightly longer than usual.
>
> The data is not copied for a .backup until the data actually changes.
>
So I should have seen the same cloning behavior if I'd used 'backup 
-force' (or whatever it is) or removed the .backup and then run vos 
backup.  I'll check my notes but don't recall documenting this.  Pretty 
sure I tried, pretty sure I saw what you'd expect.
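
(If I get a window to re-test, the recipe is simple enough.  Server, 
partition, and volume names below are made up:

    # throw away the existing .backup so the next one is a full clone
    vos remove -server fs1.example.com -partition /vicepa -id myvol.backup
    # recreate it and time how long the clone takes
    time vos backup myvol

The expectation, per the above, is that a from-scratch .backup clone 
behaves just like the move and release clones did.)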

Is the code base for cloning then shared, as I speculated?   (If you 
know offhand.  I believe it is but haven't checked.)

>> With 'vos move' out of the picture I moved volumes by hand: plain 
>> dump/restore for volumes not frequently or recently updated, and, for 
>> the rest, dump/restore plus a synchronization tool, Unison, to bring 
>> the new RW volume up to date, then a change of the mount point to the 
>> new volume's name, then a wait until the previous RW volume showed no 
>> updates for a few days.
>>
>> (If anyone is interested in Unison let me know.  I'm thinking of 
>> talking about it at Best Practices this year.)
>
> The deadline for submissions is approaching fast.  Please submit your
> talk.
>
Blzorp!  Thanks.  Completely forgot.
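
(For the archives, the dump/restore workaround looks roughly like the 
following.  Server names, paths, and volume names are placeholders, and 
the flags are from memory, so double-check them:

    # copy the volume the slow-but-safe way, avoiding the clone path
    vos dump -id projvol -file /tmp/projvol.dump
    vos restore -server fs2.example.com -partition /vicepb \
        -name projvol.new -file /tmp/projvol.dump

    # temporary mount point so Unison can reach the new copy
    fs mkmount /afs/.example.com/tmp/projvol.new projvol.new

    # pick up anything written since the dump was taken
    unison /afs/.example.com/proj/projdir /afs/.example.com/tmp/projvol.new -batch

    # repoint the real mount point at the new volume
    fs rmmount /afs/.example.com/proj/projdir
    fs mkmount /afs/.example.com/proj/projdir projvol.new

Then I leave the old RW alone until it shows no updates for a few days 
and only then remove it.  If the parent directory lives in a replicated 
volume, the mount point change also needs a 'vos release' of that 
parent.)
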
>> The USP continues to spew SCSI command timeouts.
>
> Bad controller?  Bad cable?  Bad disk?
>
> SCSI command timeouts are at a level far below AFS.  If an AFS service
> requests a disk operation and that operation results in SCSI command
> timeouts, there is something seriously wrong somewhere between the
> SCSI controller and the disk.
>
> No wonder you are getting lousy performance.
>
No kidding.  It's been miserable trying to support AFS with unstable 
storage.
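
(For anyone following along on Solaris, the timeouts are easy enough to 
tally with AFS nowhere in the picture, something like:

    # per-device error counters, including transport errors
    iostat -En | grep "Transport Errors"
    # and the raw complaints in the system log
    grep -i timeout /var/adm/messages | tail

Nothing in that path involves AFS, which is rather the point.)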

>> I'm seeing SCSI command timeouts and UFS log timeouts (on vice 
>> partitions using the SAN for storage) on LUNs used for vicep's on the 
>> Hitachi USP, and was seeing them also on the 9585 until a recent 
>> configuration change.
>
> UFS log timeouts are more evidence that the problem is somewhere
> between UFS and the disk.
>
>> At first I thought this was load-related, so I wrote scripts to 
>> generate a goodly load.  It turns out that even with a one-second 
>> sleep between file create/write/close operations and between rm 
>> operations the SCSI command timeouts still occur, and that it's not 
>> load but simply activity that turns up the timeouts.
>
> And I bet the SAN admins are telling you that there is nothing wrong.
> They are badly mistaken.
>
LOL!  They're telling me that they're only seeing these issues on AFS 
file servers.  (Except for a 'few instances' elsewhere, so "AFS 
obviously has a problem.")
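
(The load script was nothing exotic, in case anyone wants to reproduce 
it.  Roughly the following, pointed at a scratch filesystem on a LUN 
from the same fabric rather than at a live vice partition; the path is 
made up here:

    #!/bin/sh
    # one small create/write/close per second, then one rm per second;
    # even this gentle pace turns up the SCSI command timeouts
    DIR=/sanscratch/loadtest
    mkdir -p "$DIR"
    i=0
    while [ "$i" -lt 600 ]; do
        dd if=/dev/zero of="$DIR/f$i" bs=64k count=16 2>/dev/null
        sleep 1
        i=`expr $i + 1`
    done
    for f in "$DIR"/f*; do
        rm "$f"
        sleep 1
    done

Even at that rate the timeouts show up, so it's activity, not load, that 
tickles them.)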

Thanks Jeff.

More fuel for looking at the SAN.

Kim

>
> Jeffrey Altman
>