[OpenAFS] OpenAFS benchmark improvements
Wed, 12 Dec 2007 17:42:48 -0500
I'm going to second a big chunk of what Jerry wrote. About five years
ago I inherited an AFS cell that had been through some rough times and
spent more than a little time cleaning it up. The end result was much
faster service. Our performance was never as bad as Jerry's, but it
was still nothing to write home about.
I did some of the same things, didn't have to do others. Of the
things done, two surprised me in that they made a difference. One
was the same as Jerry's - getting rid of all the bogus values
returned by vos listaddr. It didn't seem to make much difference to
the users, but by god, anything that tried to look at all the servers
got orders of magnitude better.
The other was to salvage every single volume in the cell, attaching
orphans:
    bos salvage <volume> -orphans attach
Sonofagun if my salvages on reboot didn't stop peppering me with
complaints. We deleted all the dead files it found, thus reclaiming
some of the 'missing' disk space and giving it back to the users. As
a side effect, now if I get a salvage message, I know it's something
to look at. On the other hand, note that if you restore a volume from
before you forced the attach, it will need a salvage.
Another thing which helped (and unlike the other two, I expected this
to help) was to get the vldb and the on-server volumes back into
sync. I don't recall the precise steps I had to go through, but it
took more than just doing a set of vos syncvldb/vos syncserv commands
on the various machines involved. I do recall generating a vldb list
and comparing that to the output of vos listvol from all the servers,
and that had to be followed by some vos zap and vos remove and vos
remsite commands. The whole cell got a lot snappier after that.
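For anyone who wants to try the same comparison, here is a rough sketch of the idea, not the exact procedure I used. It assumes you have already boiled the vldb listing and the per-server vos listvol output down to (volume, server) pairs; the function name compare_sites and the sample volume and server names are made up for illustration, and turning the real command output into pairs is left out.

```python
from typing import Dict, Iterable, List, Tuple

# A "site" is a (volume name, server name) pair. Reducing the actual
# vldb listing and vos listvol output to pairs is assumed to be done
# elsewhere; this sketch only does the comparison.
Site = Tuple[str, str]


def compare_sites(vldb_sites: Iterable[Site],
                  server_sites: Iterable[Site]) -> Dict[str, List[Site]]:
    """Return the sites that appear on only one side of the comparison."""
    vldb = set(vldb_sites)
    on_disk = set(server_sites)
    return {
        # Listed in the vldb, but the server does not hold the volume.
        "vldb_only": sorted(vldb - on_disk),
        # Present on the server, but the vldb does not list it there.
        "server_only": sorted(on_disk - vldb),
    }


if __name__ == "__main__":
    # Hypothetical sample data, not from a real cell.
    vldb = [("user.alice", "fs1"), ("user.bob", "fs1"), ("proj.old", "fs2")]
    disk = [("user.alice", "fs1"), ("user.bob", "fs2"), ("proj.old", "fs2")]
    for kind, sites in compare_sites(vldb, disk).items():
        for volume, server in sites:
            print(f"{kind}: {volume} on {server}")
```

Each mismatch still has to be looked at by hand before deciding which of the vos commands above applies; the script only tells you where the two views disagree.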
Once we were confident the cell was stable, we upgraded from Transarc
to OpenAFS. That made a big difference too, but it wasn't done until
after the cleanup.
On Dec 12, 2007, at 9:52 AM, Jerry Normandin wrote:
> So.. any of you out there that are experiencing AFS slowness, do a
> sanity check to see what you come up with. You might just say, WTF!