[OpenAFS] Performance issue with "many" volumes in a single /vicep?

Tom Keiser tkeiser@sinenomine.net
Thu, 18 Mar 2010 02:37:07 -0400


On Wed, Mar 17, 2010 at 7:41 PM, Derrick Brashear <shadow@gmail.com> wrote:
> On Wed, Mar 17, 2010 at 12:50 PM, Steve Simmons <scs@umich.edu> wrote:
>> We've been seeing issues for a while that seem to relate to the
>> number of volumes in a single vice partition. The numbers and data
>> are inexact because there are so many damned possible parameters
>> that affect performance, but it appears that somewhere between
>> 10,000 and 14,000 volumes performance falls off significantly. That
>> 40% difference in volume count results in 2x to 3x falloffs for
>> performance in issues that affect the /vicep as a whole - backupsys,
>> nightly dumps, vos listvol, etc.
>>

First off, could you describe how you're measuring the performance
drop-off?

The fact that this relationship between volume count and performance
is superlinear makes me think you're exceeding a magic boundary (e.g.,
you're now causing eviction pressure on some cache where you weren't
previously...).
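
To make the "magic boundary" idea concrete, here is a toy sketch in
Python. It is not OpenAFS code: the LRU policy, the repeated sequential
scan, and the cache size of 12,000 entries are all assumptions chosen
only to show how a modest increase in working set can turn a cache
from nearly all hits to nearly all misses.

```python
from collections import OrderedDict


def lru_hit_rate(num_items, cache_size, passes=5):
    """Hit rate of an LRU cache under repeated sequential scans.

    Hypothetical model: each pass touches keys 0..num_items-1 in order,
    as a partition-wide operation might touch every volume once.
    The first (cold) pass is excluded from the statistics.
    """
    cache = OrderedDict()
    hits = 0
    lookups = 0
    for p in range(passes):
        for key in range(num_items):
            warm = p > 0
            if warm:
                lookups += 1
            if key in cache:
                cache.move_to_end(key)  # mark as most recently used
                if warm:
                    hits += 1
            else:
                cache[key] = True
                if len(cache) > cache_size:
                    cache.popitem(last=False)  # evict least recently used
    return hits / lookups


# Assumed cache size of 12,000 entries, purely for illustration.
below = lru_hit_rate(10_000, 12_000)
above = lru_hit_rate(14_000, 12_000)
print(f"10,000 items: hit rate {below:.2f}")
print(f"14,000 items: hit rate {above:.2f}")
```

In this model the working set either fits or it doesn't: once the scan
is longer than the cache, every entry is evicted before it is touched
again. Real caches and real access patterns are less extreme, but the
cliff shape is the point.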

Another possibility accounting for the superlinearity, which would
very much depend upon your workload, is that by virtue of increased
volume count you're now experiencing higher volume operation
concurrency, thus causing higher rates of partition lock contention.
However, this would be very specific to the volume server and
salvager--it should not have any substantial effect on the file
server, aside from some increased VOL_LOCK contention...


>> My initial inclination is to say it's a linux issue with directory
>> searches, but before pursuing this much further I'd be interested in
>> hearing from anyone who's running 14,000 or more volumes in a single
>> vicep. No, I'm not counting .backup volumes in there, so 14,000
>> volumes means 28,000 entries in the directory.
>
> Another possibility: there's a hash table which is taking the bulk of
> that that you then search linearly.

Hmm.  That does sound plausible.  That said, a linear search of hash
chains generally shouldn't result in superlinear performance changes
(ignoring interaction effects between the data structure and the
memory hierarchy), since chain length grows only in proportion to
volume count; it would almost have to imply that the additional
4,000 volumes have special properties with respect to the hash
function.
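
A rough sketch of that argument in Python follows. Everything here is
hypothetical: the bucket count of 128, the `id % buckets` hash, and the
volume IDs are stand-ins for illustration, not the volume package's
actual hash table. It compares the average chain-search cost for
well-spread IDs against a case where the extra 4,000 IDs all collide.

```python
def mean_probes(volume_ids, num_buckets):
    """Average chain entries examined to find a volume that is present.

    Hypothetical model: a fixed-size hash table with chaining, using
    id % num_buckets as the hash and a linear walk of each chain.
    """
    chain_len = [0] * num_buckets
    for vid in volume_ids:
        chain_len[vid % num_buckets] += 1
    # Finding the k-th entry of a chain costs k probes, so a chain of
    # length n contributes 1 + 2 + ... + n = n * (n + 1) / 2 probes.
    total = sum(n * (n + 1) // 2 for n in chain_len)
    return total / len(volume_ids)


BUCKETS = 128  # assumed table size, for illustration only

spread_10k = mean_probes(range(10_000), BUCKETS)
spread_14k = mean_probes(range(14_000), BUCKETS)

# Extra 4,000 IDs that are all multiples of BUCKETS: one shared chain.
clustered = list(range(10_000)) + [
    (10_000 + i) * BUCKETS for i in range(4_000)
]
skewed_14k = mean_probes(clustered, BUCKETS)

print(f"well spread: cost grows {spread_14k / spread_10k:.2f}x")
print(f"clustered:   cost grows {skewed_14k / spread_10k:.2f}x")
```

With well-spread IDs the cost grows roughly in step with the 40%
increase in volume count, which would not explain a 2x to 3x falloff
by itself. Only the clustered case grows much faster, which is the
"special properties with respect to the hash function" scenario.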

-Tom