[OpenAFS] AFS lag

Pesce, Nicholas npesce@qualcomm.com
Wed, 18 Mar 2009 09:30:43 -0700


We just experienced significant lag at our AFS site with vos exam and
vos release.  This seemed to be caused by a bug with Ubik callbacks
(version 1.4.7).  One of our database servers was restarted, and
afterwards none of the database servers synced properly with the sync
site (only the sync site was working).  I got all but one of the
vlservers to run, but until I had all 6 servers functioning properly
(after patching) we still saw this issue.

Have you checked udebug to ensure that all of your database server
processes are current, up and giving a beacon?
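
A minimal sketch of that check, in case it helps. It assumes udebug is on
the PATH, that the vlserver and ptserver listen on their usual ports (7003
and 7002), and that udebug prints "I am sync site" / "I am not sync site"
and a "Local db version is ..." line; adjust the string matching if your
build words these differently. The helper names and the host names in the
usage note are hypothetical.

```python
import subprocess

# Usual Ubik ports for the two database services discussed in this thread.
DB_PORTS = {"vlserver": 7003, "ptserver": 7002}


def udebug_command(host, port):
    """Build the udebug invocation for one database server process."""
    return ["udebug", "-server", host, "-port", str(port)]


def summarize(output):
    """Pull the sync-site flag and local db version out of udebug output.

    Assumption: the output contains 'I am sync site' on the sync site
    ('I am not sync site' elsewhere) and a 'Local db version is X' line.
    """
    prefix = "Local db version is "
    version = None
    for line in output.splitlines():
        if line.startswith(prefix):
            version = line[len(prefix):].strip()
    return {"sync_site": "I am sync site" in output, "db_version": version}


def check(hosts):
    """Run udebug against every host and service and print a summary."""
    for host in hosts:
        for name, port in DB_PORTS.items():
            result = subprocess.run(
                udebug_command(host, port),
                capture_output=True, text=True, timeout=30,
            )
            print(host, name, summarize(result.stdout))
```

Calling check(["db1.example.com", "db2.example.com"]) should show exactly
one sync site per service and the same db version on every server; a
server that reports a different version or times out is the one to look at.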


I agree with Abdelkader and would recommend having at least 3 database
servers.  You could be walking on very thin ice with just 2.

Sincerely,

--
Nicholas Pesce
npesce@qualcomm.com


-----Original Message-----
From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org] On Behalf Of Felix Frank
Sent: Wednesday, March 18, 2009 4:15 AM
To: Abdelkader El mastour
Cc: openafs-info@openafs.org
Subject: Re: [OpenAFS] AFS lag

On Wed, 18 Mar 2009, Abdelkader El mastour wrote:

> Configuration
> Netbsd4
> heimdal1.1
> arla

You have Arla clients?

> Openafs 1.4.5 via pkgsrc
> replicated root.afs & root.cell RO
> 1000 user per server
>
> 10 servers for fileserver.
>
> 2 servers for vlserver and ptserver

This is not good. I've recently run some tests with 2 DB-servers, and
operation is not optimal. It can take them longer than necessary to
determine the sync site. 3 servers is pretty much ideal, but even a single
server works smoother than 2 IMHO.
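
To illustrate why 2 is the awkward number, here is a rough sketch of the
quorum arithmetic. It assumes the commonly described Ubik rule that a sync
site needs a majority of the database servers, with an exact tie going to
the side that holds the server with the lowest IP address. The function
name and the addresses are made up for the example; this is a model of the
rule, not the actual Ubik code.

```python
import ipaddress


def has_quorum(servers, up):
    """Return True if the reachable servers can elect a sync site.

    servers: all configured database server addresses (strings).
    up:      the subset of those addresses that is currently reachable.
    Assumed rule: strict majority wins; on an exact tie, the side that
    includes the lowest-addressed server wins.
    """
    lowest = min(servers, key=ipaddress.ip_address)
    votes = sum(1 for s in servers if s in up)
    if 2 * votes == len(servers):
        # Exactly half are up: the lowest address breaks the tie.
        return lowest in up
    return 2 * votes > len(servers)


two = ["192.0.2.1", "192.0.2.2"]
three = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]

print(has_quorum(two, {"192.0.2.1"}))    # higher-addressed server down
print(has_quorum(two, {"192.0.2.2"}))    # lowest-addressed server down
print(has_quorum(three, {"192.0.2.2", "192.0.2.3"}))  # any one of three down
```

Under that rule a 2-server cell survives the loss of only one specific
machine, while a 3-server cell survives the loss of any one.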

> Our users have been experiencing some major lag accessing AFS.
> It all began when we had a hardware problem with one of our AFS servers
> (afs-1); accessing AFS was laggy for every user on that server,
> so we decided to move every one of them from this server to one of the
> nine others.
> We shut down the broken server, took it off the listaddrs list and
> restarted the vlserver instance.
> The slowdown continued.
>
> We turned the afs-1 server on again, but without launching any AFS
> services, and then there was no more lag accessing AFS.
> Since then we've had to shut down afs-1 and take it off the listaddrs
> list, and the lag is back.
> Note #1: the AFS servers have been up for a year and we've never
> experienced any issue before.
> Note #2: bos status and sysstat don't reveal any issue.
> Any guess about the reasons for the lag?

I presume afs-1 was NOT one of your DB servers. If it is,
CellServDB would be the place to start.

There may be problems with replicated volumes. root.cell should be cached
at all times (are there frequent vos releases?) but who knows...

On afflicted clients, try fs checkv.

HTH
Felix