[OpenAFS] replication issues 1.3.85

Jacob Liff jacobl@ccbill.com
Fri, 22 Jul 2005 15:14:54 -0700


Hello,

We are running 1.3.85 here on a 2.6.11.8 kernel. I originally set up two
AFS servers and was able to get them talking properly until I tried
replication. After having no success (same problems laid out below), I
hit Google for a few hours and found out it's best to have at least
three AFS servers. Based on this advice I rebuilt the cluster, adding a
third server.

All these machines see each other, and I can run commands from one to
the others. I have a volume I am trying to replicate onto the other two
servers. To do this I run the following from my first server
(server1.cluster.com). This is the machine with the original RW
volume/mount I would like to replicate.

/usr/afs/bin/vos addsite server2.cluster.com /vicepa cache
/usr/afs/bin/vos addsite server3.cluster.com /vicepa cache

/usr/afs/bin/vos release cache

I then issue the following to be sure this worked properly. The output
is the same on all three machines.

/usr/afs/bin/vos examine cache

cache                             536870930 RW        445 K  On-line
    server1.cluster.com /vicepa
    RWrite  536870930 ROnly  536870931 Backup          0
    MaxQuota       5000 K
    Creation    Thu Jul 21 15:52:17 2005
    Copy        Thu Jul 21 15:52:17 2005
    Backup      Never
    Last Update Fri Jul 22 14:27:03 2005
    415 accesses in the past day (i.e., vnode references)

    RWrite: 536870930     ROnly: 536870931
    number of sites -> 3
       server server1.cluster.com partition /vicepa RW Site
       server server2.cluster.com partition /vicepa RO Site
       server server3.cluster.com partition /vicepa RO Site
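
A check like the one above could also be scripted. The following is a minimal Python sketch, not part of any OpenAFS tooling, that reads the text of a `vos examine` listing and counts the RW and RO sites it reports. The function names and the regular expression are assumptions based only on the output shown here; other OpenAFS versions may format these lines differently.

```python
import re
from collections import Counter

# Matches site lines such as:
#   "server server2.cluster.com partition /vicepa RO Site"
# The pattern is an assumption derived from the listing shown in this post.
SITE_LINE = re.compile(r"^\s*server\s+(\S+)\s+partition\s+(\S+)\s+(RW|RO)\s+Site")


def parse_sites(vos_examine_output):
    """Return a list of (server, partition, site_type) tuples."""
    sites = []
    for line in vos_examine_output.splitlines():
        match = SITE_LINE.match(line)
        if match:
            sites.append(match.groups())
    return sites


def count_site_types(vos_examine_output):
    """Count how many RW and RO sites the listing contains."""
    return Counter(site_type for _, _, site_type in parse_sites(vos_examine_output))


# Site section copied from the `vos examine cache` output in this post.
SAMPLE = """\
    RWrite: 536870930     ROnly: 536870931
    number of sites -> 3
       server server1.cluster.com partition /vicepa RW Site
       server server2.cluster.com partition /vicepa RO Site
       server server3.cluster.com partition /vicepa RO Site
"""

if __name__ == "__main__":
    for server, partition, site_type in parse_sites(SAMPLE):
        print(site_type, server, partition)
    print(dict(count_site_types(SAMPLE)))
```

This only confirms which sites the listing reports; it says nothing about whether clients actually fail over to the RO sites.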

I then check the file log on server2 and server3. It would appear the
volume was exported to them successfully.

Fri Jul 22 14:24:11 2005 fssync: volume 536870931 restored; breaking all call backs

Just for kicks I run the following on server2 and server3 to make sure
everything is in sync:

/usr/afs/bin/vos syncvldb server1.cluster.com -cell afscluster.cluster.com -verbose
/usr/afs/bin/vos syncserv server1.cluster.com -cell afscluster.cluster.com -verbose

I waited about 15 minutes to be sure everything had replicated; the
volume is small for testing purposes. I then simulate a server failure
on server1 by downing its network interface. All the connected clients
then hang and stop working. After about 5 minutes, server2 and server3
log that server1 is down. The cluster folder, which I could previously
browse, now shows up as a regular file (Linux client) and is useless.

What am I doing wrong with the replication? According to all the
documentation, the cache manager should see that the RW volume is no
longer there and seamlessly pull the files from the two RO replica
sites. So far I can't reproduce anything close to this behavior.

Jacob L.

