[OpenAFS] OpenAFS and Xen wierdnesses: regular loss of afs server connectivity

Chris Kurtz chk@mars.asu.edu
Sun, 01 Feb 2009 13:33:55 -0700


Both the guests and hosts are 64-bit, and the guests are paravirtualized.
The hosts are running ntp from a central clock source -- I haven't seen
any clock drift.

We've seen the I/O problems, and made changes to eliminate them as much
as possible (everything but the bare OS is NFS-mounted, and we use NFSv3
over TCP).

Nothing is wrong with the afs servers: they haven't crashed, and they
continue to serve other non-Xen clients with no issues.

The afs servers were tuned using that guide and tons of suggestions made
at last year's AFS Best Practices Workshop.

I'll grab the server config when I get a chance.

...Chris
--
Chris Kurtz, chk@mars.asu.edu
Systems Manager
Mars Space Flight Facility
Arizona State University


Steven Jenkins wrote:
> On Fri, Jan 30, 2009 at 12:35 PM, Chris Kurtz <chk@mars.asu.edu> wrote:
>> Specs:
>>
> ...
>> Jan 30 10:28:48 www4 kernel: afs: Lost contact with volume location
>> server 149.169.146.57 in cell mars.asu.edu
>> Jan 30 10:30:03 www4 kernel: afs: volume location server 149.169.146.57
>> in cell mars.asu.edu is back up
>>
> 
> This isn't the fileserver at all (so the documentation I pointed you
> to is interesting, but not necessarily relevant), but is the vlserver.
>  Some of the same suggestions apply, though:
> 
> 1- are the processes crashing? (bos status, logs, etc)
> 2- are you having quorum issues? (udebug, logs, etc)
> 3- can you give more resources to the vlservers?  For example, are
> they running on their own VMs?
> 4- how is your cell configured?  (e.g., do you have 3 vlservers)
>
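
To see how regular the dropouts are, the "Lost contact" / "is back up"
messages quoted above can be paired up and timed. What follows is only a
rough sketch, not something from the thread: the function and variable names
are made up for illustration, it assumes each syslog entry has been joined
back onto a single line (the mail client wrapped them), and it assumes the
year is supplied by the caller, since syslog timestamps omit it.

```python
import re
from datetime import datetime

# Patterns for the two client-side kernel messages quoted in this thread.
LOST = re.compile(r"afs: Lost contact with volume location server (\S+) in cell (\S+)")
BACK = re.compile(r"afs: volume location server (\S+) in cell (\S+) is back up")
# Classic syslog timestamp at the start of the line, e.g. "Jan 30 10:28:48".
STAMP = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)")


def parse_stamp(line, year):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = STAMP.match(line)
    if m is None:
        return None
    return datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")


def vlserver_outages(lines, year=2009):
    """Return (server, cell, seconds) for each lost/back-up pair, in log order."""
    pending = {}   # (server, cell) -> time contact was first reported lost
    outages = []
    for line in lines:
        when = parse_stamp(line, year)
        if when is None:
            continue
        lost = LOST.search(line)
        if lost:
            # Keep the earliest "lost" time if the message repeats.
            pending.setdefault((lost.group(1), lost.group(2)), when)
            continue
        back = BACK.search(line)
        if back:
            key = (back.group(1), back.group(2))
            start = pending.pop(key, None)
            if start is not None:
                outages.append((key[0], key[1], (when - start).total_seconds()))
    return outages


# The two entries from the quoted log, unwrapped onto one line each.
sample = [
    "Jan 30 10:28:48 www4 kernel: afs: Lost contact with volume location "
    "server 149.169.146.57 in cell mars.asu.edu",
    "Jan 30 10:30:03 www4 kernel: afs: volume location server 149.169.146.57 "
    "in cell mars.asu.edu is back up",
]

for server, cell, seconds in vlserver_outages(sample):
    print(f"{server} ({cell}): unreachable for {seconds:.0f}s")
```

Run over a full day of /var/log/messages from each Xen guest, the gaps and
their spacing should show whether the losses line up across guests or follow
some fixed interval.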