[OpenAFS] AFS lag

Felix Frank Felix.Frank@Desy.de
Wed, 18 Mar 2009 12:15:11 +0100 (CET)

On Wed, 18 Mar 2009, Abdelkader El mastour wrote:

> Configuration
> Netbsd4
> heimdal1.1
> arla

You have Arla clients?

> OpenAFS 1.4.5 via pkgsrc
> replicated root.afs & root.cell RO
> 1000 users per server
> 10 servers for fileserver
> 2 servers for vlserver and ptserver

This is not good. I recently ran some tests with 2 DB servers, and
operation was not optimal: it can take them longer than necessary to
determine the sync site. 3 servers is pretty much ideal, but even a single
server works more smoothly than 2, IMHO.
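
To illustrate why an even number of DB servers is awkward, here is a
minimal Python sketch of the Ubik quorum rule as I understand it: a
candidate needs votes from more than half of the servers, and the
server with the lowest IP address acts as a tie-breaker. The addresses
and the function name has_quorum are made up for illustration, and the
model ignores timing entirely, so it only shows who can hold quorum,
not how long an election takes.

```python
import ipaddress


def has_quorum(yes_voters, all_servers):
    """Simplified model of a Ubik-style election (an assumption, not OpenAFS code).

    Each yes vote scores 2, the lowest-addressed server's vote scores one
    extra point, and the total must exceed the number of configured servers.
    """
    lowest = min(all_servers, key=ipaddress.ip_address)
    score = sum(3 if voter == lowest else 2 for voter in yes_voters)
    return score > len(all_servers)


# Hypothetical DB server addresses, for illustration only.
two = ["10.0.0.1", "10.0.0.2"]
three = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# With 2 servers, only the lowest-addressed one can carry on alone.
print(has_quorum({"10.0.0.1"}, two))   # lowest address survives
print(has_quorum({"10.0.0.2"}, two))   # higher address survives

# With 3 servers, any two of them form a quorum.
print(has_quorum({"10.0.0.2", "10.0.0.3"}, three))
```

In this model a 2-server cell stays writable only if the
lowest-addressed server is the one that stays up, whereas a 3-server
cell tolerates the loss of any one server.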

> Our users have been experiencing major lag when accessing AFS.
> It all began when we had a hardware problem with one of our AFS servers
> (afs-1). Accessing AFS was laggy for every user on that server,
> so we decided to move all of them from this server to one of the nine
> others.
> We shut down the broken server, took it off the listaddrs list, and
> restarted the vlserver instance.
> The slowdown continued.
> We turned the afs-1 server on again, but without launching any AFS
> services, and then there were no more lags accessing AFS.
> Since then we have had to shut down afs-1 again and take it off the
> listaddrs list, and the lags are back.
> Note #1: the AFS servers have been up for a year and we have never
> experienced any issue before.
> Note #2: bos status and sysstat don't reveal any issue.
> Any guess about the reasons for the lags?

I presume afs-1 was NOT one of your DB servers. If it was,
CellServDB would be the place to start.

There may be problems with replicated volumes. root.cell should be cached at
all times (are there frequent vos releases?) but who knows...

On affected clients, try fs checkvolumes (fs checkv for short), so that the
Cache Manager refreshes its volume information.