[OpenAFS] replication issues 1.3.85
Jacob Liff
jacobl@ccbill.com
Fri, 22 Jul 2005 15:14:54 -0700
Hello,

We are running OpenAFS 1.3.85 on a 2.6.11.8 Linux kernel. I originally
set up two AFS servers and was able to get them talking properly until I
tried replication. After having no success (same problems as laid out
below), I searched Google for a few hours and found that it's best to
have at least three AFS servers. Based on this advice I rebuilt the
cluster, adding a third server.

All these machines see each other, and I can run commands from one
against the others. I have a volume I am trying to replicate onto the
other two servers. To do this I run the following from my first server
(server1.cluster.com). This is the machine with the original RW
volume/mount I would like to replicate.

/usr/afs/bin/vos addsite server2.cluster.com /vicepa cache
/usr/afs/bin/vos addsite server3.cluster.com /vicepa cache
/usr/afs/bin/vos release cache
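
The steps above can be sketched as a small dry-run helper. This is only a sketch: `replication_plan` is a hypothetical name, it assumes every replica goes on /vicepa as in the commands above, and it only prints the commands rather than running them.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the vos commands used above to
# add read-only sites for a volume and then release it.
# Usage: replication_plan VOLUME SERVER...
replication_plan() {
    volume=$1
    shift
    for server in "$@"; do
        # One read-only site per server; /vicepa is assumed, as in this post.
        echo "/usr/afs/bin/vos addsite $server /vicepa $volume"
    done
    # A single release pushes the RW contents to every RO site.
    echo "/usr/afs/bin/vos release $volume"
}
```

Calling `replication_plan cache server2.cluster.com server3.cluster.com` prints the same three commands listed above, which makes it easy to review them before running anything.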

I then run the following to confirm that this worked. The output is the
same on all three machines.

/usr/afs/bin/vos examine cache

cache 536870930 RW 445 K On-line
    server1.cluster.com /vicepa
    RWrite 536870930 ROnly 536870931 Backup 0
    MaxQuota 5000 K
    Creation Thu Jul 21 15:52:17 2005
    Copy Thu Jul 21 15:52:17 2005
    Backup Never
    Last Update Fri Jul 22 14:27:03 2005
    415 accesses in the past day (i.e., vnode references)

    RWrite: 536870930 ROnly: 536870931
    number of sites -> 3
       server server1.cluster.com partition /vicepa RW Site
       server server2.cluster.com partition /vicepa RO Site
       server server3.cluster.com partition /vicepa RO Site
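
To double-check the site list without reading it by eye, the output above can be piped through a small filter. A minimal sketch; `count_ro_sites` is a hypothetical name, and it assumes the "RO Site" wording shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: count the read-only sites in `vos examine` output
# read from stdin, by matching the "RO Site" lines shown above.
count_ro_sites() {
    grep -c 'RO Site'
}
```

Used as `/usr/afs/bin/vos examine cache | count_ro_sites`, this should print 2 for the output above, one for each of server2 and server3.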

I then check the file server log on server2 and server3. It appears the
volume was transferred to them successfully.

Fri Jul 22 14:24:11 2005 fssync: volume 536870931 restored; breaking all call backs

Just for kicks, I run the following on server2 and server3 to make sure
everything is in sync:

/usr/afs/bin/vos syncvldb server1.cluster.com -cell afscluster.cluster.com -verbose
/usr/afs/bin/vos syncserv server1.cluster.com -cell afscluster.cluster.com -verbose

I waited about 15 minutes to be sure everything had replicated; the
volume is small for testing purposes. I then simulated a server failure
on server1 by taking down its network interface. All the connected
clients then hang and stop working. After about 5 minutes, server2 and
server3 log that server1 is down. The cluster directory, which I could
previously browse, now shows up as a regular file (on a Linux client)
and is unusable.

What am I doing wrong with the replication? According to all the
documentation, the cache manager should notice that the RW volume is no
longer reachable and seamlessly pull the files from the two RO sites.
So far I can't reproduce anything near this behavior.

Jacob L.