[OpenAFS] Performance issue with "many" volumes in a single /vicep?

Steve Simmons scs@umich.edu
Thu, 25 Mar 2010 12:05:03 -0400

On Mar 24, 2010, at 11:43 PM, Tom Keiser wrote:

>> Our estimate too. But before drilling down, it seemed worth checking
>> if anyone else has a similar server - ext3 with 14,000 or more volumes
>> in a single vice partition - and has seen a difference. Note, tho, that
>> it's not #inodes or total disk usage in the partition. The servers that
>> exhibited the problem had a large number of mostly empty volumes.
> Sure.  Makes sense.   The one thing that does come to mind is that
> regardless of the number of inodes, ISTR some people were having
> trouble with ext performance when htree indices were turned on because
> spatial locality of reference against the inode tables goes way down
> when you process files in the order returned by readdir(), since
> readdir() in htree mode returns files in hash chain order rather than
> more-or-less inode order.  This could definitely have a huge impact on
> the salvager [especially GetVolumeSummary(), and to a lesser extent
> ListViceInodes() and friends].  I'm less certain how it would affect
> things in the volserver, but it would certainly have an effect on
> operations which delete clones, since the nuke code also calls
> ListViceInodes().
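
To make the readdir() ordering point concrete, here is a minimal sketch of the usual workaround: read the whole directory first, sort the entries by inode number, and only then stat them. It is illustrative Python, not anything taken from the salvager or volserver code; the helper names entries_in_inode_order and stat_all are made up for this example.

```python
import os

def entries_in_inode_order(path):
    """Return the entries of `path` sorted by inode number.

    On an htree directory, readdir() hands entries back in hash order;
    sorting by inode first should restore more-or-less sequential access
    to the inode tables when each entry is stat'ed afterwards.
    """
    with os.scandir(path) as it:
        entries = list(it)
    # DirEntry.inode() reads the inode number carried in the directory entry.
    entries.sort(key=lambda e: e.inode())
    return entries

def stat_all(path):
    """Stat every entry in inode order and return (name, size) pairs."""
    return [(e.name, os.stat(e.path).st_size)
            for e in entries_in_inode_order(path)]
```

Whether this helps in practice depends on how many entries a directory holds and how cold the cache is, so it would need measuring on the affected servers.
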
> In addition, with regard to ext htree indices I'll pose the
> (completely untested) hypothesis that htree indices aren't necessarily
> a net win for the namei workload.  Given that namei goes to great lengths
> to avoid large directories (with the notable exception of the /vicepXX
> root dir itself), it is conceivable that htree overhead is actually a
> net loss.  I don't know for sure, but I'd say it's worth doing further
> study.  In a volume with files>>dirs you're going to see on the order
> of ~256 files per namei directory.  Certainly a linear search of on
> average 128 entries is expensive, but it may be worth verifying this
> empirically because we don't know how much overhead htree and its
> side-effects produce.  Regrettably, there don't seem to be any
> published results on the threshold above which htree becomes a net
> win...
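
One rough way to get an empirical number is a timing loop like the sketch below, run once on a file system with dir_index and once on one without. It is only a sketch under assumptions: the function name time_lookups and its parameters are hypothetical, the 256-file default just mirrors the estimate above, and a flat scratch directory does not reproduce the real namei layout. Lookups are also likely to be served from the kernel's cache once the files have been created, so a cold-cache comparison would need the caches dropped between runs.

```python
import os
import random
import tempfile
import time

def time_lookups(parent, n_files=256, rounds=20):
    """Return the mean seconds per name lookup in a scratch directory.

    Creates `n_files` empty files in a temporary directory under `parent`,
    then stats each name in random order `rounds` times.
    """
    with tempfile.TemporaryDirectory(dir=parent) as scratch:
        names = [f"f{i:04d}" for i in range(n_files)]
        for name in names:
            open(os.path.join(scratch, name), "w").close()
        random.shuffle(names)  # avoid looking names up in creation order
        start = time.perf_counter()
        for _ in range(rounds):
            for name in names:
                os.stat(os.path.join(scratch, name))
        elapsed = time.perf_counter() - start
    return elapsed / (rounds * n_files)
```

Calling time_lookups("/vicepa") on a test server (the path is only an example) would give one figure per file system configuration to compare.
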
> Finally, you did tune2fs -O dir_index <dev> before populating the file
> system, right?

Didn't try that one, no.
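
For checking which servers already have the feature, a small sketch that scans the text printed by `tune2fs -l <dev>` for dir_index on its "Filesystem features:" line. The function name has_dir_index is made up for this example, and it assumes the listing has been captured as a string beforehand. Note also that turning dir_index on for an already populated file system only affects directories created afterwards unless `e2fsck -D` is run to rebuild the existing ones.

```python
def has_dir_index(tune2fs_listing):
    """Return True if the `tune2fs -l` text lists the dir_index feature."""
    for line in tune2fs_listing.splitlines():
        if line.startswith("Filesystem features:"):
            # Features are whitespace-separated after the colon.
            features = line.split(":", 1)[1].split()
            return "dir_index" in features
    # No features line found: treat as not enabled.
    return False
```
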

Good suggestions, Tom. We're working on duplicating this in one of our
test cells; assuming we can, we'll try these out and see what actually