[OpenAFS] Lost RW Volume Recovery?

Hartmut Reuter reuter@rzg.mpg.de
Wed, 20 Feb 2008 09:47:15 +0100


Robert Sturrock wrote:
> Hi all.
> 
> I'm not sure how, but we have lost the RW volume for our cell.user (a
> structural volume under which the user home areas live).  After a bit of
> searching, I found this thread that describes a possible recovery
> method involving dump/restoring from an RO and then salvaging:
> 
>     http://www.openafs.org/pipermail/openafs-info/2002-December/007228.html
> 
> I tried this method and it _seemed_ to work, but I'm still having
> problems accessing the volume after remounting it.  A quick rundown on
> what I did:
> 
> 
>     $ vos dump cell.user.readonly > cell.user.dump
> 
>     $ vos restore hermes2 a cell.user -verbose < cell.user.dump 
>     Restoring volume cell.user Id 536870918 on server hermes2.its.unimelb.edu.au partition /vicepa .. done
>     Updating the existing VLDB entry
>     ------- Old entry -------
> 
>     cell.user 
> 	ROnly: 536870919 
> 	number of sites -> 2
> 	   server hermes1.its.unimelb.edu.au partition /vicepa RO Site 
> 	   server telos.its.unimelb.edu.au partition /vicepa RO Site 
>     ------- New entry -------
> 
>     cell.user 
> 	RWrite: 536870918     ROnly: 536870919 
> 	number of sites -> 3
> 	   server hermes1.its.unimelb.edu.au partition /vicepa RO Site 
> 	   server telos.its.unimelb.edu.au partition /vicepa RO Site 
> 	   server hermes2.its.unimelb.edu.au partition /vicepa RW Site 
>     Restored volume cell.user on hermes2 /vicepa
> 
>     $ bos salvage hermes2 a cell.user
>     Starting salvage.
>     bos: salvage completed
> 
>     $ bos salvage hermes2 a cell.user
>     Starting salvage.
>     bos: salvage completed
> 
> .. but now the problem is as follows:
> 
>    $ fs mkmount /afs/.athena.unimelb.edu.au/user cell.user
> 
>    [ so far, so good .. but .. ]
> 
>    $ ls -ld user
>    ls: user: Connection timed out
> 
>    $ ls -l
>    total 14
>    drwxrwxrwx 5 root root 2048 Nov 14 15:36 arch
>    drwxrwxrwx 5 root root 2048 Feb 19 09:33 devlp
>    drwxrwxrwx 2 root root 2048 Jan 23 14:44 group
>    drwxrwxrwx 3 root root 2048 Oct 25 12:15 project
>    drwxrwxrwx 4 root root 2048 Oct 15 11:13 pub
>    drwxrwxrwx 2 root root 2048 Feb 20 12:38 tmp
>    ?--------- ? ?    ?       ?            ? user
>    drwxrwxrwx 2 root root 2048 Oct  4 21:06 www  
> 
> Any pointers as to where I go from here?
> 
> The only thing I can think of is that there may be some caching going on
> which in some way is still looking for the old RW volume.
> 
> One alternative might be to "vos convertROtoRW", but I suspect that would
> leave me with the same problem to solve.
> 
> Regards,
> 
> Robert.

You need to run "fs checkvol" on the client, because the disappearance of the
old volume didn't trigger the callbacks needed to provoke a new VLDB lookup on
the clients. You would have the same problem after a "vos convertROtoRW ...".
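
As a rough sketch of what that looks like on an affected client (not a
definitive procedure): "fs checkvolumes" is the full name of the subcommand
abbreviated above, and the mount point path is the one from the original
post. The function name refresh_volume_mappings is made up for illustration.

```shell
#!/bin/sh
# Sketch only: run on each AFS client that still times out on the mount point.
# MOUNT is taken from the original post; adjust it for your cell.
MOUNT=/afs/.athena.unimelb.edu.au/user

# Hypothetical helper: drop the client's cached volume mappings, then retry.
refresh_volume_mappings() {
    # Stop early on machines without the OpenAFS client tools.
    if ! command -v fs >/dev/null 2>&1; then
        echo "fs command not found; run this on an AFS client" >&2
        return 1
    fi
    # Make the cache manager discard its cached volume name/ID mappings,
    # so the next access does a fresh VLDB lookup.
    fs checkvolumes || return 1
    # Retry the access that timed out before.
    ls -ld "$1"
}

refresh_volume_mappings "$MOUNT" || echo "volume mappings not refreshed"
```

If the listing still fails afterwards, comparing "vos examine cell.user"
with what the client reports via "fs lsmount" on the mount point may help
narrow down whether the VLDB entry or the client cache is at fault.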

Hartmut

-- 
-----------------------------------------------------------------
Hartmut Reuter                           e-mail reuter@rzg.mpg.de
					   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------