[OpenAFS] openafs fileservers in VMware ESX

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 21 Apr 2005 13:03:36 -0400


On Thursday, April 21, 2005 02:35:01 PM +1200 Matthew Cocker 
<matt@cs.auckland.ac.nz> wrote:

> Hi
>
> We have just invested in a Fibre Channel SAN and several FC-attached ESX
> servers (brilliant product, just love VMotion and VirtualCenter) and are
> playing with virtualised OpenAFS fileservers. All is working very well,
> except if we put too many volumes on a server, at which point "vos
> listvol" takes a very long time to return.
>
> If we have, say, 5000-7000 volumes (about 50GB) on a vice partition,
> performance is equivalent to a hardware server. At 10k to 40k volumes
> (100-300GB) we have problems with vos listvol.
>
> This is not a huge problem for us, as we wanted to move to more, smaller
> machines anyway to take advantage of the VM environment, but it does
> make me wonder why this occurs.
>
> What exactly does vos listvol do? Does it scan the vice partitions and
> return all the volumes it finds (du -sh /vicepa takes a huge amount of
> time too, so maybe this is a VM issue)? Is any network traffic exchanged
> with the DBs?
>
> When we start vos listvol on the virtualised server with lots of
> volumes, it just seems to stop working, with the CPU usage for the AFS
> process not jumping above 1-2%. An strace (available if anyone is
> interested) shows the vos listvol is doing something (although very
> slowly).
>
> If the virtualised server has fewer volumes, CPU usage jumps up to
> 30-50% and everything works.
>
> The only thing affected seems to be vos listvol, as accessing a volume
> stored on the server is quick (from the user's point of view). The vos
> backup stuff all seems to work.
>
> A hardware server with the same number of volumes works OK.
>
> SAN monitoring suggests there is not a data access issue on that side.
>
> Not sure this is an AFS issue, but any suggestions to help me understand
> why vos listvol is affected so badly would be appreciated.



You haven't told us what kernel version or architecture is involved, or 
what OpenAFS versions your servers and vos client are running.  That makes 
it hard to tell which known-and-fixed bugs you might be running into.

Note that 'vos' doesn't do anything other than talk to servers.  So if you 
run strace on vos, what you're going to see is that it just sits there 
waiting for a response from the server it's talking to.

The RPC that 'vos listvol' makes returns an array of volIntInfo structures, 
one per volume.  Each structure is about 115 bytes on the wire (slightly 
more in memory), and they _all_ need to be allocated and filled in before 
the server will start returning any data.  For 40K volumes, that's about 
5MB of memory, allocated roughly 128 bytes at a time.  That's not too 
excessive, but it's worth noting that that much data will take some time 
to allocate, marshal, transfer, and unmarshal.

Perhaps more importantly, that same RPC needs to attach each volume in 
order to read its header, and depending on the version of OpenAFS you're 
running, that operation involves a buffer sync on every attach.  That 
means running listvol against a partition with 40000 volumes is 
equivalent to running 'sync' 40000 times.  Running a new enough version 
will fix that, but at the expense of the details reported by 'vos examine' 
and 'vos listvol -long' being out of date by as much as 25 minutes.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA