[OpenAFS] resurrecting dead server

Neil Davies semanticphilosopher@gmail.com
Sun, 16 Jan 2011 19:25:14 +0000



I'd follow Jason's suggestion.

I've had to do this twice in the 8 or 9 years we've been running AFS. In
our case we would (quickly) rebuild a physical server, attach the RAID,
mount its filesystems as /vicep(x) partitions, then perform the VLDB
sync ritual. There may be some old addresses and references in the
VLDB that need tidying up later.
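
As a rough sketch of that "sync ritual", the Python below only builds the
command lines and does not run anything. The host name and partition list
are placeholders, the bos instance name "fs" is merely the conventional
one, and -localauth assumes you are root on a server machine with a
working KeyFile. Running syncvldb before syncserv is an assumption based
on the usual admin-guide order, not something stated in this thread.

```python
from typing import List


def recovery_commands(server: str, partitions: List[str]) -> List[List[str]]:
    """Build (but do not run) the commands for re-registering recovered volumes.

    `server` and `partitions` are placeholders supplied by the caller.
    """
    cmds = [
        # Restart the fileserver instance so it attaches the newly mounted
        # /vicepX partitions. "fs" is the conventional instance name (assumed).
        ["bos", "restart", "-server", server, "-instance", "fs", "-localauth"],
    ]
    for part in partitions:
        # Bring the VLDB into line with the volumes found on this partition.
        cmds.append(["vos", "syncvldb", "-server", server,
                     "-partition", part, "-localauth", "-verbose"])
    # Then check the VLDB entries that mention this server against what is
    # actually on disk there.
    cmds.append(["vos", "syncserv", "-server", server, "-localauth", "-verbose"])
    return cmds


if __name__ == "__main__":
    # Hypothetical host name; print the commands for review before running them.
    for cmd in recovery_commands("fs2.example.com", ["/vicepa", "/vicepb"]):
        print(" ".join(cmd))
```

Printing the commands first, rather than executing them, leaves room to
check each one by hand against the state of your own cell.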

Don't move the databases - that will move the sysid file which could  
cause more pain.

You can run with two out of three database servers for a little while,
though I've found that there is the occasional delay in some
operations in this mode, as the distributed leadership election between
the remaining two servers seems to kick in more often.
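
To show why two out of three still works, but only just, here is a small
sketch of a plain strict-majority check. It is a simplification: the real
election rules the database servers use (timeouts, tie-breaking for
even-sized sets) are not modelled, and the function name is made up for
illustration.

```python
def has_quorum(configured: int, reachable: int) -> bool:
    """True if the reachable servers are a strict majority of those configured."""
    return 2 * reachable > configured


# Three database servers configured, one dead: the remaining two are a majority.
print(has_quorum(3, 2))
# Lose one more and the last server on its own cannot form a majority.
print(has_quorum(3, 1))
```

In other words, with one of three servers gone there is no slack left,
which is why this mode is only something to live with for a short time.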

Neil


On 16 Jan 2011, at 18:48, Ted Creedon wrote:

> How about moving the /usr/afs directories from a known good server  
> to the RW server and doing the same?
>
> ted
>
> On Sun, Jan 16, 2011 at 7:10 AM, Jason Edgecombe
> <jason@rampaginggeek.com> wrote:
> On 01/15/2011 01:05 PM, Ted Creedon wrote:
> my R/W server died but /vicepa and /vicepb were saved on their raid  
> drives.
> this server also ran the krb5kdc...
>
> there are 2 other RO servers still OK but 2 out of 20 volumes are  
> not up to
> date. both were running upclient etc
>
> the data on the RO servers is:
> /usr/afs/db:
> total 316
> drwx------ 2 root root   4096 May  5  2009 ./
> drwxr-xr-x 7 root root   4096 May  6  2009 ../
> -rw------- 1 root root   1088 May  2  2009 bdb.DB0
> -rw------- 1 root root     64 Jan  7 12:18 bdb.DBSYS1
> -rw------- 1 root root  68672 May  5  2009 prdb.DB0
> -rw------- 1 root root  68672 May  5  2009 prdb.DB0.bak
> -rw------- 1 root root     64 Jan  7 12:18 prdb.DBSYS1
> -rw------- 1 root root     64 May  5  2009 prdb.DBSYS1.bak
> -rw------- 1 root root 144448 Jan  7 12:18 vldb.DB0
> -rw------- 1 root root     64 Jan  7 13:50 vldb.DBSYS1
>
> /usr/afs/etc:
> total 80
> drwxr-xr-x 2 root root  4096 Jun 16  2010 ./
> drwxr-xr-x 7 root root  4096 May  6  2009 ../
> -rw-r--r-- 1 root root   162 Jan  7 12:17 CellServDB
> -rw------- 1 root root   100 May  6  2009 KeyFile
> -rw-r--r-- 1 root root    10 May  2  2009 NetRestrict
> -rw-r--r-- 1 root root    11 May  6  2009 ThisCell
> -rw-r--r-- 1 root root    39 Nov 15  2008 UserList
>
> /usr/afs/local:
> total 24
> drwx------ 2 root root 4096 Jan  7 13:53 ./
> drwxr-xr-x 7 root root 4096 May  6  2009 ../
> -rw-r--r-- 1 root root  313 Apr 30  2009 BosConfig
> -rw-r--r-- 1 root root   10 May  6  2009 NetRestrict
> -rw-r--r-- 1 root root    0 Jan  7 13:53 SALVAGE.fs
> srwxr-xr-x 1 root root    0 Jan  7 12:22 fssync.sock=
> -rw-r--r-- 1 root root    0 May  2  2009 salvage.lock
> -rw-r--r-- 1 root root   32 Jan  7 11:55 sysid
> -rw-r--r-- 1 root root   32 Jan  7 11:00 sysid.old
>
>
> what's the best way to proceed after I rekey
>
> PS I also have vos dump files but the 2 aforementioned volumes are
> not up to date.
>
> thanks
>
> tedc
>
> I suggest connecting the raid drives to another server, possibly one  
> of your R/O servers. Mount the /vicepX partitions, restart the  
> fileserver, then run "vos syncserv" and "vos syncvldb" to have the  
> recovered volumes be remapped to the different server.
>
> Jason
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>

