[OpenAFS] AFS lag
Felix Frank
Felix.Frank@Desy.de
Wed, 18 Mar 2009 12:15:11 +0100 (CET)
On Wed, 18 Mar 2009, Abdelkader El mastour wrote:
> Configuration
> NetBSD 4
> Heimdal 1.1
> arla
You have Arla clients?
> OpenAFS 1.4.5 via pkgsrc
> replicated root.afs & root.cell RO
> 1000 users per server
>
> 10 servers for fileserver.
>
> 2 servers for vlserver and ptserver
This is not good. I've recently run some tests with 2 DB servers, and
operation is not optimal. It can take them longer than necessary to
determine the sync site. 3 servers is pretty much ideal, but even a single
server works more smoothly than 2 IMHO.
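
For illustration only, here is a rough sketch of why an even number of DB
servers tends to buy little. It assumes a simplified majority rule (one vote
per reachable server, plus a tie-breaking half vote for the server with the
lowest IP address); that rule and the function names are assumptions of this
sketch, not something taken from the setup described here or from the OpenAFS
source.

```python
def can_elect_sync_site(total_servers, reachable, lowest_ip_reachable):
    """Return True if the reachable DB servers can agree on a sync site.

    Simplified, assumed model: every reachable server contributes one vote,
    the lowest-IP server contributes an extra half vote to break ties, and a
    sync site needs strictly more than half of the configured servers.
    """
    if total_servers < 1 or not 0 <= reachable <= total_servers:
        raise ValueError("reachable must be between 0 and total_servers")
    if lowest_ip_reachable and reachable == 0:
        raise ValueError("lowest-IP server cannot be reachable if none are")
    votes = reachable + (0.5 if lowest_ip_reachable else 0.0)
    return votes > total_servers / 2


def survives_any_single_failure(total_servers):
    """Return True if a sync site can still be elected after losing any one server."""
    # Case 1: the lowest-IP server is the one that fails.
    lose_lowest = can_elect_sync_site(total_servers, total_servers - 1, False)
    # Case 2: some other server fails (only possible with more than one server).
    lose_other = total_servers == 1 or can_elect_sync_site(
        total_servers, total_servers - 1, True
    )
    return lose_lowest and lose_other


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "DB server(s): survives any single failure =",
              survives_any_single_failure(n))
```

Under this model a 2-server cell only keeps a sync site if the lowest-IP
server is the one that stays up, so it tolerates an arbitrary single failure
no better than a 1-server cell, while 3 servers do.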
> Our users have been experiencing some major lag accessing AFS.
> It all began when we had a hardware problem with one of our AFS servers
> (afs-1); accessing AFS was laggy for every user on that server,
> so we decided to move every one of them from this server to one of the nine
> others.
> We shut down the broken server, took it off the listaddrs list, and restarted
> the vlserver instance.
> The slowdown continues.
>
> We turned on the afs-1 server again, but without launching any AFS services,
> and then there were no more lags accessing AFS.
> Since then we've had to shut down afs-1 and take it off the listaddrs list,
> and the lags are back.
> Note #1: the AFS servers have been up for a year and we've never experienced
> any issue before.
> Note #2: bos status and sysstat don't reveal any issue.
> Any guess about the reasons for the lags?
I presume afs-1 was NOT one of your DB servers. If it was,
CellServDB would be the place to start.
There may be problems with replicated volumes. root.cell should be cached at
all times (are there frequent vos releases?) but who knows...
On afflicted clients, try fs checkv.
HTH
Felix