[OpenAFS] resurrecting dead server
Neil Davies
semanticphilosopher@gmail.com
Sun, 16 Jan 2011 19:25:14 +0000
I'd follow Jason's suggestion.
I've had to do this twice in the 8 or 9 years we've been running AFS. In
our case we would (quickly) rebuild a physical server, get the RAID
attached, mount its filesystems as /vicep(x) partitions, then perform the
VLDB sync ritual. There may be some old addresses and references in the
VLDB that need tidying up later.
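As a rough sketch of that sync ritual, assuming the partitions are already mounted, the sequence might look like the script below. The host name fs2.example.com, the "fs" instance name and the dry-run wrapper are illustrative assumptions, not details from this thread. The syncvldb-then-syncserv order is the one the OpenAFS documentation usually suggests. By default the script only prints the commands; set RUN=1 to execute them on a machine with admin tokens.

```shell
#!/bin/sh
# Sketch only: fs2.example.com is a hypothetical host name and "fs" is the
# conventional (assumed) bos instance name for the fileserver.
# Each command is printed unless RUN=1 is set in the environment.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

resync_server() {
    server=$1
    # Restart the fileserver so it attaches the newly mounted /vicepX partitions.
    run bos restart "$server" -instance fs
    # Confirm the recovered volumes are visible on the server.
    run vos listvol "$server"
    # Create or correct VLDB entries for the volumes found on the server.
    run vos syncvldb -server "$server" -verbose
    # Check that VLDB entries mentioning this server match what is on disk.
    run vos syncserv -server "$server" -verbose
    # List registered server addresses to spot stale ones for later tidying.
    run vos listaddrs
}

resync_server "${1:-fs2.example.com}"
```

Reading the printed commands before setting RUN=1 is a cheap sanity check when the cell is already in a fragile state.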
Don't move the databases - moving the /usr/afs directories will also move
the sysid file (in /usr/afs/local), which could cause more pain.
You can run with two out of three database servers for a little while,
though I've found that there is the occasional delay in some
operations in this mode, as the distributed leadership election between
the remaining two servers seems to kick in more often.
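One way to keep an eye on that election is to ask each remaining database server which of them currently believes it is the sync site. The following is a minimal sketch, assuming two hypothetical hosts db1.example.com and db2.example.com and the usual vlserver port 7003. It only prints the udebug commands to run, and the helper expects the "I am sync site" wording that udebug normally reports.

```shell
#!/bin/sh
# Sketch only: host names are hypothetical. 7003 is the usual vlserver port
# (7002 would be the ptserver).

# Reads udebug output on stdin and succeeds if that server reports itself
# as the sync site. Assumes the usual "I am sync site ..." wording; a
# non-sync server prints "I am not sync site" instead, which does not match.
is_sync_site() {
    grep -q "I am sync site"
}

# Prints one udebug command per database server given as an argument.
quorum_commands() {
    for host in "$@"; do
        echo "udebug $host 7003"
    done
}

quorum_commands db1.example.com db2.example.com
```

On a live cell, something like `udebug db1.example.com 7003 | is_sync_site && echo "db1 is sync site"` would show where the election has settled.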
Neil
On 16 Jan 2011, at 18:48, Ted Creedon wrote:
> How about moving the /usr/afs directories from a known good server
> to the RW server and doing the same?
>
> ted
>
> On Sun, Jan 16, 2011 at 7:10 AM, Jason Edgecombe <jason@rampaginggeek.com> wrote:
> On 01/15/2011 01:05 PM, Ted Creedon wrote:
> my R/W server died but /vicepa and /vicepb were saved on their raid
> drives.
> this server also ran the krb5kdc...
>
> there are 2 other RO servers still OK but 2 out of 20 volumes are
> not up to date. both were running upclient etc
>
> the data on the RO servers is:
> /usr/afs/db:
> total 316
> drwx------ 2 root root 4096 May 5 2009 ./
> drwxr-xr-x 7 root root 4096 May 6 2009 ../
> -rw------- 1 root root 1088 May 2 2009 bdb.DB0
> -rw------- 1 root root 64 Jan 7 12:18 bdb.DBSYS1
> -rw------- 1 root root 68672 May 5 2009 prdb.DB0
> -rw------- 1 root root 68672 May 5 2009 prdb.DB0.bak
> -rw------- 1 root root 64 Jan 7 12:18 prdb.DBSYS1
> -rw------- 1 root root 64 May 5 2009 prdb.DBSYS1.bak
> -rw------- 1 root root 144448 Jan 7 12:18 vldb.DB0
> -rw------- 1 root root 64 Jan 7 13:50 vldb.DBSYS1
>
> /usr/afs/etc:
> total 80
> drwxr-xr-x 2 root root 4096 Jun 16 2010 ./
> drwxr-xr-x 7 root root 4096 May 6 2009 ../
> -rw-r--r-- 1 root root 162 Jan 7 12:17 CellServDB
> -rw------- 1 root root 100 May 6 2009 KeyFile
> -rw-r--r-- 1 root root 10 May 2 2009 NetRestrict
> -rw-r--r-- 1 root root 11 May 6 2009 ThisCell
> -rw-r--r-- 1 root root 39 Nov 15 2008 UserList
>
> /usr/afs/local:
> total 24
> drwx------ 2 root root 4096 Jan 7 13:53 ./
> drwxr-xr-x 7 root root 4096 May 6 2009 ../
> -rw-r--r-- 1 root root 313 Apr 30 2009 BosConfig
> -rw-r--r-- 1 root root 10 May 6 2009 NetRestrict
> -rw-r--r-- 1 root root 0 Jan 7 13:53 SALVAGE.fs
> srwxr-xr-x 1 root root 0 Jan 7 12:22 fssync.sock=
> -rw-r--r-- 1 root root 0 May 2 2009 salvage.lock
> -rw-r--r-- 1 root root 32 Jan 7 11:55 sysid
> -rw-r--r-- 1 root root 32 Jan 7 11:00 sysid.old
>
>
> what's the best way to proceed after I rekey
>
> PS I also have vos dump files but the 2 aforementioned volumes are
> not up to date.
>
> thanks
>
> tedc
>
> I suggest connecting the raid drives to another server, possibly one
> of your R/O servers. Mount the /vicepX partitions, restart the
> fileserver, then run "vos syncserv" and "vos syncvldb" to have the
> recovered volumes be remapped to the different server.
>
> Jason
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>