[OpenAFS-devel] "Communication Failure" moving large volume

Chris Kuethe <chris.kuethe@gmail.com>
Tue, 17 May 2005 09:43:49 -0600


I've got a repeatable failure moving a volume to a new server: after
43.8GB has been moved, I get the error below. This has left me with a
volume that is pretty much inaccessible - it seems to be there, but
any attempt to access it either fails with an "operation not supported
by device" error, or the volume is just not visible. Prior to the move,
the volume was happy at about 60GB for six months.

I'm thinking that this is due to u_int32 * 1024B transfer blocks...
Has anyone got any suggestions for getting this to work again?
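
For reference, here is a rough check of the arithmetic behind that guess.
It is only a sketch: the 32-bit counter widths and the 1024-byte unit are
assumptions on my part, not something confirmed in the OpenAFS source, and
the helper name is made up for illustration. The volume size is the figure
reported by vos listvol further down.

```python
# Sanity check of the 32-bit counter theory.
# ASSUMPTIONS: counter widths and the 1024-byte unit are guesses,
# not values taken from the OpenAFS source.
KIB = 1024
GIB = 1024 ** 3


def wrap_point_bytes(bits, unit_bytes):
    """Bytes representable before a counter of `bits` bits,
    counting units of `unit_bytes` bytes, wraps around."""
    return (2 ** bits) * unit_bytes


volume_kib = 58094424  # size in K as reported by `vos listvol`
volume_gib = volume_kib * KIB / GIB

limits = {
    "signed 32-bit byte count": wrap_point_bytes(31, 1),
    "unsigned 32-bit byte count": wrap_point_bytes(32, 1),
    "signed 32-bit count of 1024B blocks": wrap_point_bytes(31, KIB),
    "unsigned 32-bit count of 1024B blocks": wrap_point_bytes(32, KIB),
}

print(f"volume size: {volume_gib:.1f} GiB")
for name, limit in limits.items():
    print(f"{name}: wraps at {limit / GIB:.0f} GiB")
```

If those assumptions hold, none of the obvious wrap points lands on
43.8GB, so the overflow may be somewhere less obvious, or the cause may
be something else entirely.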

I can reproduce this on i386 OpenBSD {3.6,3.7} with OpenAFS {1.3.74,1.3.82}.

I've already tried the various recovery/salvage suggestions in
http://www.openafs.org/pages/doc/AdminGuide/auagd010.htm

afstest# vos move csk-openafstest afstest.cns.ualberta.ca a oafs-test1.cns.ualberta.ca a -verbose
Starting transaction on source volume 1870461518 ... done
Allocating new volume id for clone of volume 1870461518 ... done
Cloning source volume 1870461518 ... done
Ending the transaction on the source volume 1870461518 ... done
Starting transaction on the cloned volume 1870487131 ... done
Setting flags on cloned volume 1870487131 ... done
Getting status of cloned volume 1870487131 ... done
Creating the destination volume 1870461518 ... done
Setting volume flags on destination volume 1870461518 ... done
Dumping from clone 1870487131 on source to volume 1870461518 on destination ...
Failed to move data for the volume 1870461518
   Possible communication failure
vos move: operation interrupted, cleanup in progress...
clear transaction contexts
Recovery: Releasing VLDB lock on volume 1870461518 ... done
Recovery: Ending transaction on clone volume ... done
Recovery: Ending transaction on destination volume ... done
Recovery: Accessing VLDB.
move incomplete - attempt cleanup of target partition - no guarantee
Recovery: Creating transaction for destination volume 1870461518 ...
Recovery: Unable to start transaction on destination volume 1870461518.
Recovery: Creating transaction on source volume 1870461518 ... done
Recovery: Setting flags on source volume 1870461518 ... done
Recovery: Ending transaction on source volume 1870461518 ... done
Recovery: Creating transaction on clone volume 1870487131 ... done
Recovery: Deleting clone volume 1870487131 ... done
Recovery: Ending transaction on clone volume 1870487131 ... done
Recovery: Releasing lock on VLDB entry for volume 1870461518 ... done
cleanup complete - user verify desired result
afstest# ls
adsm          dumps         gpu           ludeware      nrs           openafstest   patrian       registration  simon         suspend
bluejay       farm          graphics      msds          omr           ots           plotprev      research      software
afstest# cd openafstest
openafstest: Operation not supported by device.
afstest# ls -al
ls: openafstest: Operation not supported by device
total 82
drwxrwxrwx    3 root   wheel    2048 Dec  8 16:38 .
drwxr-xr-x    2 root   wheel    2048 Jul  5  2004 ..
...
afstest# vos listvol afstest
Total number of volumes on server afstest partition /vicepa: 2
csk-openafstest                  1870461518 RW   58094424 K On-line
csk-openafstest.backup           1870461520 BK   58094424 K On-line
afstest# vos examine csk-openafstest -extended
csk-openafstest                  1870461518 RW   58094424 K used 7934 files On-line
    afstest.cns.ualberta.ca /vicepa
    RWrite 1870461518 ROnly 1870487131 Backup 1870461520
    MaxQuota          0 K
    Creation    Wed Dec  8 16:21:00 2004
    Copy        Wed Dec  8 16:21:00 2004
    Backup      Sun May 15 17:00:44 2005
    Last Update Tue Apr 26 10:33:38 2005
    0 accesses in the past day (i.e., vnode references)

                      Raw Read/Write Stats
          |-------------------------------------------|
          |    Same Network     |    Diff Network     |
          |----------|----------|----------|----------|
          |  Total   |   Auth   |   Total  |   Auth   |
          |----------|----------|----------|----------|
Reads     |      232 |       36 |        1 |        0 |
Writes    |        0 |        0 |        0 |        0 |
          |-------------------------------------------|

                   Writes Affecting Authorship
          |-------------------------------------------|
          |   File Authorship   | Directory Authorship|
          |----------|----------|----------|----------|
          |   Same   |   Diff   |    Same  |   Diff   |
          |----------|----------|----------|----------|
0-60 sec  |        0 |        0 |        0 |        0 |
1-10 min  |        0 |        0 |        0 |        0 |
10min-1hr |        0 |        0 |        0 |        0 |
1hr-1day  |        0 |        0 |        0 |        0 |
1day-1wk  |        0 |        0 |        0 |        0 |
> 1wk     |        0 |        0 |        0 |        0 |
          |-------------------------------------------|

    RWrite: 1870461518    Backup: 1870461520
    number of sites -> 1
       server afstest.cns.ualberta.ca partition /vicepa RW Site  -- New release


-- 
GDB has a 'break' feature; why doesn't it have 'fix' too?