[OpenAFS] AFS lag
Abdelkader El mastour
a.elmastour@gmail.com
Wed, 18 Mar 2009 19:28:49 +0100
On Wed, Mar 18, 2009 at 5:30 PM, Pesce, Nicholas <npesce@qualcomm.com> wrote:
> We just experienced significant lag at our AFS site with vos examine and
> vos release operations. This seemed to be caused by a bug in Ubik callbacks
> (version 1.4.7). One of our database servers was restarted, and afterwards
> none of the other database servers synced properly with the sync site (only
> the sync site was working). I got all but one of the vlservers running, but
> until all 6 servers were functioning properly (after patching) we still saw
> this issue.
>
> Have you checked udebug to ensure that all of your database server
> processes are current, up and giving a beacon?
>
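
For anyone following along, here is a minimal sketch of how that udebug check could be scripted. It assumes udebug is on the PATH and that the database processes use the standard Ubik ports (7003 for vlserver, 7002 for ptserver); the function names and host names are hypothetical placeholders, not anything from our cell.

```python
import subprocess

# Standard Ubik ports: vlserver listens on 7003, ptserver on 7002.
UBIK_PORTS = {"vlserver": 7003, "ptserver": 7002}


def udebug_commands(hosts):
    """Build one udebug invocation per (host, database process) pair."""
    return [
        ["udebug", "-server", host, "-port", str(port)]
        for host in hosts
        for port in UBIK_PORTS.values()
    ]


def run_checks(hosts):
    """Run each command and print its raw output for manual inspection."""
    for cmd in udebug_commands(hosts):
        result = subprocess.run(cmd, capture_output=True, text=True)
        print(" ".join(cmd))
        print(result.stdout or result.stderr)


# Example call (hypothetical host names; use the entries from CellServDB):
# run_checks(["afsdb1.example.com", "afsdb2.example.com"])
```

The output is left unparsed on purpose: the point is only to see, per server, whether it answers at all and which host it reports as the sync site.
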
>
> I agree with Abdelkader and would recommend having at least 3 database
> servers. You could be walking on very thin ice with just 2.
>
> Sincerely,
>
> --
> Nicholas Pesce
> npesce@qualcomm.com
>
>
> -----Original Message-----
> From: openafs-info-admin@openafs.org [mailto:
> openafs-info-admin@openafs.org] On Behalf Of Felix Frank
> Sent: Wednesday, March 18, 2009 4:15 AM
> To: Abdelkader El mastour
> Cc: openafs-info@openafs.org
> Subject: Re: [OpenAFS] AFS lag
>
> On Wed, 18 Mar 2009, Abdelkader El mastour wrote:
>
> > Configuration
> > Netbsd4
> > heimdal1.1
> > arla
>
> You have Arla clients?
>
> > Openafs 1.4.5 via pkgsrc
> > replicated root.afs & root.cell RO
> > 1000 users per server
> >
> > 10 servers for fileserver.
> >
> > 2 servers for vlserver and ptserver
>
> This is not good. I've recently run some tests with 2 DB-servers, and
> operation is not optimal. It can take them longer than necessary to
> determine the sync site. 3 servers is pretty much ideal, but even a single
> server works smoother than 2 IMHO.
>
> > Our users have been experiencing major lag when accessing AFS.
> > It all began when we had a hardware problem with one of our AFS servers
> > (afs-1). Accessing AFS was laggy for every user on that server,
> > so we decided to move all of them from this server to one of the nine
> > others.
> > We shut down the broken server, took it off the listaddrs list, and
> > restarted the vlserver instance.
> > The slowdown continued.
> >
> > We turned the afs-1 server on again, but without launching any AFS
> > services, and then there was no more lag accessing AFS.
> > Since then we have had to shut down afs-1 again and take it off the
> > listaddrs list, and the lag is back.
> > Note #1: the AFS servers have been up for a year and we have never
> > experienced any issue before.
> > Note #2: bos status and sysstat don't reveal any issue.
> > Any guess about the reasons for the lag?
>
> I presume afs-1 was NOT one of your DB servers. If it is,
> CellServDB would be the place to start.
>
> There may be problems with replicated volumes. root.cell should be cached
> at all times (are there frequent vos releases?) but who knows...
>
> On afflicted clients, try fs checkv.
>
> HTH
> Felix
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>
> I agree with Abdelkader and would recommend having at least 3 database
> servers. You could be walking on very thin ice with just 2.

What's the reason for this?
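
My rough understanding of the vote arithmetic is sketched below; please correct me if this is not the actual reason. It assumes Ubik needs a strict majority of the configured database servers to elect a sync site, and that an exact half only counts if it includes the server with the lowest IP address. The function name and the addresses are made up for illustration.

```python
import ipaddress


def can_elect_sync_site(all_servers, reachable):
    """Return True if the reachable servers can hold quorum.

    Assumed rule: a strict majority of all configured servers wins;
    an exact half wins only if it includes the lowest-addressed server.
    """
    total = len(all_servers)
    up = [s for s in all_servers if s in reachable]
    if 2 * len(up) > total:
        return True
    lowest = min(all_servers, key=ipaddress.ip_address)
    return 2 * len(up) == total and lowest in up


# Hypothetical addresses from the documentation range.
two = ["192.0.2.10", "192.0.2.11"]
three = ["192.0.2.10", "192.0.2.11", "192.0.2.12"]

print(can_elect_sync_site(two, {"192.0.2.11"}))    # lowest server is down
print(can_elect_sync_site(two, {"192.0.2.10"}))    # lowest server is up
print(can_elect_sync_site(three, {"192.0.2.11", "192.0.2.12"}))
```

If that rule is right, a 2-server cell loses its sync site whenever the lower-addressed server is unreachable, while a 3-server cell survives the loss of any single server.
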
--
Abdelkader El mastour
0620477723