[OpenAFS] openafs fileservers in VMware ESX

Matthew Cocker matt@cs.auckland.ac.nz
Thu, 21 Apr 2005 14:35:01 +1200


Hi

We have just invested in a Fibre Channel SAN and several FC-attached
ESX servers (brilliant product, just love VMotion and VirtualCenter) and
are playing with virtualised OpenAFS fileservers. All is working very
well except if we put too many volumes on a server, at which point "vos
listvol" takes a very long time to return.

If we have, say, 5000-7000 volumes (about 50GB) on a vice partition,
performance is equivalent to a hardware server. At 10k to 40k volumes
(100-300GB) we have problems with vos listvol.

This is not a huge problem for us, as we wanted to run more, smaller
machines anyway to take advantage of the VM environment, but it does
make me wonder why this occurs.

What exactly does vos listvol do? Does it scan the vice partitions and
return all the volumes it finds (du -sh /vicepa takes a huge amount of
time too, so maybe this is a VM issue)? Is any network traffic exchanged
with the DB servers?
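
Since du -sh /vicepa is slow as well, a rough way to check whether the
slowness lies in plain filesystem metadata access rather than in AFS
itself would be to time a stat walk over the partition on both the
virtualised and the hardware server. The sketch below is only an
illustration: the function name time_stat_walk and the default path
/vicepa are assumptions, and it makes no claim about what vos listvol
does internally.

```python
import os
import sys
import time


def time_stat_walk(root):
    """Walk root, lstat every entry, and return (entries, seconds)."""
    count = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                # lstat touches the inode without reading file data
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                # entry vanished or is unreadable; skip it
                continue
            count += 1
    return count, time.perf_counter() - start


if __name__ == "__main__":
    # /vicepa is an assumed default; pass another path as the first argument
    root = sys.argv[1] if len(sys.argv) > 1 else "/vicepa"
    entries, seconds = time_stat_walk(root)
    rate = entries / seconds if seconds > 0 else float("inf")
    print(f"{root}: {entries} entries in {seconds:.2f}s ({rate:.0f} entries/s)")
```

Running it on both kinds of server against partitions with a similar
number of volumes would show whether the entries-per-second figure drops
on the VM. A second run shortly after the first will mostly measure the
cache, so the first (cold) run is the more interesting number.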

When we start vos listvol on the virtualised server with lots of volumes
it just seems to stop working, with the CPU usage for the AFS process not
rising above 1-2%. An strace (available if anyone is interested) shows
that vos listvol is doing something (although very slowly).

If the virtualised server has fewer volumes, CPU usage jumps up to 30-50%
and everything works.

The only thing affected seems to be vos listvol, as accessing a volume
stored on the server is quick (from the user's point of view). The vos
backup operations all seem to work.

A hardware server with the same number of volumes works OK.

SAN monitoring suggests there is no data access issue on that side.

Not sure this is an AFS issue, but any suggestions to help me understand
why vos listvol is affected so badly would be appreciated.

Cheers

Matt