[OpenAFS] Performance issue with "many" volumes in a single /vicep?
Wed, 24 Mar 2010 16:32:20 -0400
On Mar 18, 2010, at 2:37 AM, Tom Keiser wrote:
> On Wed, Mar 17, 2010 at 7:41 PM, Derrick Brashear <firstname.lastname@example.org> wrote:
>> On Wed, Mar 17, 2010 at 12:50 PM, Steve Simmons <email@example.com> wrote:
>>> We've been seeing issues for a while that seem to relate to the
>>> number of volumes in a single vice partition. The numbers and data
>>> are inexact because there are so many damned possible parameters
>>> that affect performance, but it appears that somewhere between
>>> 10,000 and 14,000 volumes performance falls off significantly. That
>>> 40% difference in volume count results in a 2x to 3x performance
>>> falloff for operations that affect the /vicep as a whole -
>>> backupsys, nightly dumps, vos listvol, ...
> First off, could you describe how you're measuring the performance ...
Wall clock, mostly. Operations which touch all the volumes on a server
take disproportionately longer on servers with 14,000 volumes than on
servers with 10,000. The operations that best show this are vos
backupsys and our nightly dumps, which call vos dump with various
parameters on every volume on the server.
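For reference, the wall-clock measurement described above can be sketched roughly as follows. This is a hypothetical sketch, not our actual script: the server name, partition, and dump directory are placeholders, and it assumes the standard vos tooling (backupsys, listvol -fast, dump).

```shell
#!/bin/sh
# Placeholder server/partition names - substitute your own.
server=afs1.example.edu
part=/vicepa

# Time a full backupsys pass over one partition.
time vos backupsys -server "$server" -partition "$part" -localauth

# Or time each vos dump individually to see per-volume cost.
# "vos listvol -fast" prints just the volume IDs, one per line.
vos listvol -server "$server" -partition "$part" -fast |
while read -r volid; do
    [ -n "$volid" ] || continue
    start=$(date +%s)
    vos dump -id "$volid" -file "/dumps/$volid.dump" -localauth >/dev/null
    echo "$volid took $(( $(date +%s) - start ))s"
done
```

Comparing the per-volume times between a 10,000-volume and a 14,000-volume server is what makes the superlinearity visible, rather than just the total elapsed time.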
> The fact that this relationship between volumes and performance is
> superlinear makes me think you're exceeding a magic boundary (e.g.,
> you're now causing eviction pressure on some cache where you weren't
> before).
Our estimate too. But before drilling down, it seemed worth checking
whether anyone else has a similar server - ext3 with 14,000 or more
volumes in a single vice partition - and has seen a difference. Note,
though, that it's not #inodes or total disk usage in the partition: the
servers that exhibited the problem had a large number of mostly empty
volumes.
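For anyone wanting to compare their own servers, a rough way to line up volume count against inode usage per partition is sketched below. It assumes a namei fileserver, where each volume has one V<id>.vol header file at the partition root; the /vicep? glob and df output format are assumptions about a typical Linux setup.

```shell
#!/bin/sh
# For each vice partition, report volume count vs. inodes in use, to
# check whether slowdown tracks volume count rather than inode usage.
for part in /vicep?; do
    # Count volume header files (V<numeric-id>.vol) at the partition root.
    nvols=$(ls "$part" 2>/dev/null | grep -c '^V[0-9]*\.vol$')
    # Column 3 of "df -i" on Linux is IUsed (inodes in use).
    inodes=$(df -i "$part" | awk 'NR==2 {print $3}')
    echo "$part: $nvols volumes, $inodes inodes in use"
done
```

If the slow partitions cluster around high volume counts while inode counts vary freely, that supports the volume-count theory.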
> Another possibility accounting for the superlinearity, which would
> very much depend upon your workload, is that by virtue of increased
> volume count you're now experiencing higher volume operation
> concurrency, thus causing higher rates of partition lock contention.
> However, this would be very specific to the volume server and
> salvager--it should not have any substantial effect on the file
> server, aside from some increased VOL_LOCK contention...
The salvager is not involved - or at least, hasn't been involved yet.
It's vos backupsys and vos dump where we mostly see it.