[OpenAFS] Lost RW Volume Recovery?

Robert Sturrock rns@unimelb.edu.au
Wed, 20 Feb 2008 16:44:20 +1100


Hi all.

I'm not sure how, but we have lost the RW volume for cell.user (a
structural volume under which the user home areas live).  After a bit
of searching, I found this thread, which describes a possible recovery
method: dump the RO, restore it as the RW, and then salvage:

    http://www.openafs.org/pipermail/openafs-info/2002-December/007228.html

I tried this method and it _seemed_ to work, but I still can't access
the volume after remounting it.  A quick rundown of what I did:


    $ vos dump cell.user.readonly > cell.user.dump

    $ vos restore hermes2 a cell.user -verbose < cell.user.dump 
    Restoring volume cell.user Id 536870918 on server hermes2.its.unimelb.edu.au partition /vicepa .. done
    Updating the existing VLDB entry
    ------- Old entry -------

    cell.user 
	ROnly: 536870919 
	number of sites -> 2
	   server hermes1.its.unimelb.edu.au partition /vicepa RO Site 
	   server telos.its.unimelb.edu.au partition /vicepa RO Site 
    ------- New entry -------

    cell.user 
	RWrite: 536870918     ROnly: 536870919 
	number of sites -> 3
	   server hermes1.its.unimelb.edu.au partition /vicepa RO Site 
	   server telos.its.unimelb.edu.au partition /vicepa RO Site 
	   server hermes2.its.unimelb.edu.au partition /vicepa RW Site 
    Restored volume cell.user on hermes2 /vicepa

    $ bos salvage hermes2 a cell.user
    Starting salvage.
    bos: salvage completed

    $ bos salvage hermes2 a cell.user
    Starting salvage.
    bos: salvage completed

.. but now the problem is as follows:

   $ fs mkmount /afs/.athena.unimelb.edu.au/user cell.user

   [ so far, so good .. but .. ]

   $ ls -ld user
   ls: user: Connection timed out

   $ ls -l
   total 14
   drwxrwxrwx 5 root root 2048 Nov 14 15:36 arch
   drwxrwxrwx 5 root root 2048 Feb 19 09:33 devlp
   drwxrwxrwx 2 root root 2048 Jan 23 14:44 group
   drwxrwxrwx 3 root root 2048 Oct 25 12:15 project
   drwxrwxrwx 4 root root 2048 Oct 15 11:13 pub
   drwxrwxrwx 2 root root 2048 Feb 20 12:38 tmp
   ?--------- ? ?    ?       ?            ? user
   drwxrwxrwx 2 root root 2048 Oct  4 21:06 www  

Any pointers as to where I go from here?

The only thing I can think of is that something (perhaps the client
cache) is holding on to stale volume information and is still looking
for the old RW volume.
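
To test the caching theory, something like the following could be run on
the client.  This is only a sketch: the run and check_stale_volume_info
helpers and the DRY_RUN switch are made up for illustration, and by
default the script just prints the commands.  The vos and fs subcommands
themselves (examine, lsmount, checkvolumes, flushmount, checkservers)
are standard, though fs flushmount may be missing on older clients.

```shell
#!/bin/sh
# Sketch only: look for stale client-side volume information.
# DRY_RUN defaults to 1, so commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: echo a command, and run it unless DRY_RUN=1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = 1 ] || "$@"
}

# Hypothetical helper: the checks, in the order I would try them.
check_stale_volume_info() {
    parent=$1    # directory that contains the mount point
    name=$2      # name of the mount point
    volume=$3    # volume the mount point refers to

    run vos examine "$volume"           # VLDB entry plus volume header status
    run fs lsmount "$parent/$name"      # what the mount point really points at
    run fs checkvolumes                 # drop cached volume name/ID mappings
    run fs flushmount "$parent/$name"   # drop cached mount point data
    run fs checkservers                 # any fileservers marked down?
}

check_stale_volume_info /afs/.athena.unimelb.edu.au user cell.user
```

With DRY_RUN=0 the commands would actually run.  If "ls -ld user" still
times out after fs checkvolumes, the cache is probably not the culprit.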

One alternative might be to "vos convertROtoRW", but I suspect that would
leave me with the same problem to solve.
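
For reference, a sketch of what that alternative would look like against
one of the RO sites listed above.  The choice of hermes1 is arbitrary and
the variable names are mine.  The snippet only prints the command so it
can be reviewed first.  Since the VLDB now lists an RW site on hermes2,
I assume that entry would have to be dealt with before converting, but I
have not verified how vos behaves in that case.

```shell
#!/bin/sh
# Sketch only: build and print the conversion command, do not run it.
server=hermes1     # one of the RO sites from the VLDB entry (arbitrary pick)
partition=a        # /vicepa
volume=cell.user

convert_cmd="vos convertROtoRW -server $server -partition $partition -id $volume"
echo "$convert_cmd"
```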

Regards,

Robert.