[OpenAFS] AFS lag

Abdelkader El mastour a.elmastour@gmail.com
Wed, 18 Mar 2009 19:28:49 +0100


On Wed, Mar 18, 2009 at 5:30 PM, Pesce, Nicholas <npesce@qualcomm.com> wrote:

> We just experienced significant lag at our AFS site with vos exam and
> vos release.  This seemed to be caused by a bug with Ubik callbacks
> (version 1.4.7).  One of our database servers was restarted, and then the
> database servers did not sync properly with the sync site (only the sync
> site was working). I got all but one of the vlservers to run, but until I
> got all 6 servers functioning properly (after patching) we still saw this
> issue.
>
> Have you checked udebug to ensure that all of your database server
> processes are current, up and giving a beacon?
>
>
> I agree with Abdelkader and would recommend having at least 3 database
> servers.  You could be walking on very thin ice with just 2.
>
> Sincerely,
>
> --
> Nicholas Pesce
> npesce@qualcomm.com
>
>
> -----Original Message-----
> From: openafs-info-admin@openafs.org [mailto:openafs-info-admin@openafs.org]
> On Behalf Of Felix Frank
> Sent: Wednesday, March 18, 2009 4:15 AM
> To: Abdelkader El mastour
> Cc: openafs-info@openafs.org
> Subject: Re: [OpenAFS] AFS lag
>
> On Wed, 18 Mar 2009, Abdelkader El mastour wrote:
>
> > Configuration
> > NetBSD 4
> > Heimdal 1.1
> > Arla
>
> You have Arla clients?
>
> > OpenAFS 1.4.5 via pkgsrc
> > replicated root.afs & root.cell RO
> > 1000 users per server
> >
> > 10 servers for fileserver.
> >
> > 2 servers for vlserver and ptserver
>
> This is not good. I've recently run some tests with 2 DB-servers, and
> operation is not optimal. It can take them longer than necessary to
> determine the sync site. 3 servers is pretty much ideal, but even a single
> server works smoother than 2 IMHO.
>
> > Our users have been experiencing major lag when accessing AFS.
> > It all began when we had a hardware problem with one of our AFS servers
> > (afs-1); accessing AFS was laggy for every user on that server,
> > so we decided to move all of them from this server to one of the nine
> > others.
> > We shut down the broken server, took it off the listaddrs list, and
> > restarted the vlserver instance.
> > The slowdown continued.
> >
> > We turned the afs-1 server on again, but without launching any AFS
> > services, and then there was no more lag accessing AFS.
> > Since then we've had to shut down afs-1 and take it off the listaddrs
> > list, and the lag is back.
> > Note #1: the AFS servers have been up for a year and we've never
> > experienced any issue before.
> > Note #2: bos status and sysstat don't reveal any issue.
> > Any guess about the reasons for the lag?
>
> I presume afs-1 was NOT one of your DB servers. If it is,
> CellServDB would be the place to start.
>
> There may be problems with replicated volumes. root.cell should be cached
> at all times (are there frequent vos releases?) but who knows...
>
> On afflicted clients, try fs checkv.
>
> HTH
> Felix



> I agree with Abdelkader and would recommend having at least 3 database
> servers.  You could be walking on very thin ice with just 2.

What's the reason for this?
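
To make the question concrete, here is how I understand the Ubik quorum rule that I think is being referred to. This is only a sketch under my own assumptions, not OpenAFS code: I assume every database server's vote counts 2 points, the server with the lowest IP address gets 3 points as a tie-breaker, and a sync site can only be elected with strictly more points than there are servers. The function name has_quorum and the 192.0.2.x addresses are made up for the example. Corrections are welcome.

```python
import ipaddress


def has_quorum(db_servers, reachable):
    """Return True if the reachable servers could elect a sync site.

    Assumed rule (my reading, not taken from the OpenAFS source):
      - each reachable server contributes 2 points,
      - the lowest-address server contributes 3 points (tie-breaker),
      - quorum needs strictly more points than len(db_servers).
    """
    lowest = min(db_servers, key=ipaddress.ip_address)
    points = sum(3 if s == lowest else 2
                 for s in db_servers if s in reachable)
    return points > len(db_servers)


two = ["192.0.2.1", "192.0.2.2"]
three = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]

# With 2 servers, losing the lowest-address one leaves no quorum.
print(has_quorum(two, {"192.0.2.2"}))                # False
print(has_quorum(two, {"192.0.2.1"}))                # True

# With 3 servers, any single server can be lost.
print(has_quorum(three, {"192.0.2.2", "192.0.2.3"}))  # True
print(has_quorum(three, {"192.0.2.1"}))               # False
```

If this reading is right, a 2-server cell only tolerates the loss of the higher-address server, while a 3-server cell tolerates the loss of any one server. Is that the whole reason, or is there more to it?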

-- 
Abdelkader El mastour
0620477723
