[OpenAFS-devel] "Communication Failure" moving large volume
Chris Kuethe <chris.kuethe@gmail.com>
Tue, 17 May 2005 09:43:49 -0600
I've got a repeatable failure moving a volume to a new server: after
43.8GB have been moved, I get the error below. This has left me with a
volume that is pretty much inaccessible: it seems to be there, but
any attempt to access it fails with an "Operation not supported by
device" error, or the volume is simply not visible. Before the move,
the volume had been happy at about 60GB for six months.
I suspect this is due to u_int32 * 1024B transfer blocks.
Has anyone got any suggestions for getting this to work again?
I can reproduce this on i386 OpenBSD {3.6, 3.7} with OpenAFS {1.3.74, 1.3.82}.
I've already tried the various recovery/salvage suggestions in
http://www.openafs.org/pages/doc/AdminGuide/auagd010.htm
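As a rough sanity check on the u_int32 theory, here is a sketch that computes where a 32-bit counter would overflow for a few plausible units (bytes and 1024-byte blocks, signed and unsigned) and compares that with the 43.8GB failure point. The units tried are assumptions for illustration only; which counter, if any, the dump path actually uses is not established here, and since "GB" could mean 10^9 or 2^30 bytes, both readings are shown.

```python
# Sketch: at what byte count would a 32-bit counter overflow, for a few ASSUMED units?
# The units are illustrative guesses; they are not taken from the OpenAFS source.
FAILURE_GB = 43.8  # amount moved before the failure, as reported above


def wrap_point_bytes(bits: int, unit_bytes: int, signed: bool) -> int:
    """Byte count at which an N-bit counter of `unit_bytes`-sized units overflows."""
    max_units = 2 ** (bits - 1) if signed else 2 ** bits
    return max_units * unit_bytes


cases = {
    "u_int32 count of bytes": wrap_point_bytes(32, 1, signed=False),
    "int32 count of bytes": wrap_point_bytes(32, 1, signed=True),
    "u_int32 count of 1024-byte blocks": wrap_point_bytes(32, 1024, signed=False),
    "int32 count of 1024-byte blocks": wrap_point_bytes(32, 1024, signed=True),
}

# The report doesn't say whether "GB" means 10^9 or 2^30 bytes, so try both.
failure_decimal = FAILURE_GB * 10**9
failure_binary = FAILURE_GB * 2**30

for name, limit in cases.items():
    print(f"{name:34s} overflows at {limit / 2**30:8.0f} GiB "
          f"= {limit / failure_decimal:7.2f}x (decimal) / "
          f"{limit / failure_binary:7.2f}x (binary) the failure point")
```

Under these assumptions, a 32-bit byte counter would overflow at 4 GiB (2 GiB if signed), well before 43.8GB, while a 32-bit count of 1024-byte blocks would not overflow until 4 TiB (2 TiB if signed), well after it. Neither simple case lines up with 43.8GB on its own, which may be worth factoring in when chasing the overflow theory.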
afstest# vos move csk-openafstest afstest.cns.ualberta.ca a oafs-test1.cns.ualberta.ca a -verbose
Starting transaction on source volume 1870461518 ... done
Allocating new volume id for clone of volume 1870461518 ... done
Cloning source volume 1870461518 ... done
Ending the transaction on the source volume 1870461518 ... done
Starting transaction on the cloned volume 1870487131 ... done
Setting flags on cloned volume 1870487131 ... done
Getting status of cloned volume 1870487131 ... done
Creating the destination volume 1870461518 ... done
Setting volume flags on destination volume 1870461518 ... done
Dumping from clone 1870487131 on source to volume 1870461518 on destination ...
Failed to move data for the volume 1870461518
Possible communication failure
vos move: operation interrupted, cleanup in progress...
clear transaction contexts
Recovery: Releasing VLDB lock on volume 1870461518 ... done
Recovery: Ending transaction on clone volume ... done
Recovery: Ending transaction on destination volume ... done
Recovery: Accessing VLDB.
move incomplete - attempt cleanup of target partition - no guarantee
Recovery: Creating transaction for destination volume 1870461518 ...
Recovery: Unable to start transaction on destination volume 1870461518.
Recovery: Creating transaction on source volume 1870461518 ... done
Recovery: Setting flags on source volume 1870461518 ... done
Recovery: Ending transaction on source volume 1870461518 ... done
Recovery: Creating transaction on clone volume 1870487131 ... done
Recovery: Deleting clone volume 1870487131 ... done
Recovery: Ending transaction on clone volume 1870487131 ... done
Recovery: Releasing lock on VLDB entry for volume 1870461518 ... done
cleanup complete - user verify desired result
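When a verbose transcript like this gets long, it can help to pull out just the steps that did not report "done". The helper below is a hypothetical convenience script, not part of OpenAFS; it assumes the " ... done" convention visible in the output above, and its list of failure keywords is made up for illustration. On the sample (lines copied from the transcript) it picks out the failed dump and the recovery step that could not start a transaction on the destination volume, i.e. the target-partition cleanup that the log itself says comes with no guarantee.

```python
# Sketch: scan `vos ... -verbose` output for lines worth a closer look.
# ASSUMPTION: completed steps end in "... done", as in the transcript above;
# a step with anything else after "..." (or nothing) is flagged, as is any
# line containing one of the hypothetical FAILURE_WORDS below.
import re

STEP = re.compile(r"^(?P<step>.*?)\s*\.\.\.\s*(?P<result>\S*)$")
FAILURE_WORDS = ("Unable", "Failed", "failure")


def flag_problem_lines(log: str) -> list[str]:
    """Return lines that are unfinished steps or mention a failure."""
    problems = []
    for raw in log.splitlines():
        text = raw.strip()
        m = STEP.match(text)
        unfinished = m is not None and m.group("result") != "done"
        if unfinished or any(word in text for word in FAILURE_WORDS):
            problems.append(text)
    return problems


sample = """\
Starting transaction on source volume 1870461518 ... done
Failed to move data for the volume 1870461518
Recovery: Creating transaction for destination volume 1870461518 ...
Recovery: Unable to start transaction on destination volume 1870461518.
Recovery: Deleting clone volume 1870487131 ... done"""

for line in flag_problem_lines(sample):
    print(line)
```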
afstest# ls
adsm dumps gpu ludeware nrs
openafstest patrian registration simon suspend
bluejay farm graphics msds omr
ots plotprev research software
afstest# cd openafstest
openafstest: Operation not supported by device.
afstest# ls -al
ls: openafstest: Operation not supported by device
total 82
drwxrwxrwx 3 root wheel 2048 Dec 8 16:38 .
drwxr-xr-x 2 root wheel 2048 Jul 5 2004 ..
...
afstest# vos listvol afstest
Total number of volumes on server afstest partition /vicepa: 2
csk-openafstest 1870461518 RW 58094424 K On-line
csk-openafstest.backup 1870461520 BK 58094424 K On-line
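To see how far through the volume the move got, the size column from this listing can be converted to bytes and compared with the 43.8GB failure point. This is a sketch with a hypothetical helper name; it assumes the whitespace-separated layout shown above (name, id, type, size, "K", status) and that "K" means 1024 bytes.

```python
# Sketch: convert the size column of a `vos listvol` line to bytes and see how
# far through the volume the move got before failing. ASSUMES the
# whitespace-separated layout shown above (name, id, type, size, "K", status)
# and that "K" means 1024 bytes.
FAILURE_GB = 43.8  # amount moved before the failure, as reported above


def size_bytes_from_listvol_line(line: str) -> int:
    """Return the volume size in bytes from one `vos listvol` volume line."""
    fields = line.split()
    size_k, unit = int(fields[3]), fields[4]
    if unit != "K":
        raise ValueError(f"unexpected size unit {unit!r} in {line!r}")
    return size_k * 1024


line = "csk-openafstest 1870461518 RW 58094424 K On-line"
size = size_bytes_from_listvol_line(line)
print(f"volume size: {size / 10**9:.1f} GB ({size / 2**30:.1f} GiB)")
print(f"failure point: {FAILURE_GB * 10**9 / size:.0%} of the volume if GB = 10^9 bytes, "
      f"{FAILURE_GB * 2**30 / size:.0%} if GB = 2^30 bytes")
```

With those assumptions the volume comes to about 59.5 GB (55.4 GiB), so the dump failed roughly 74% (decimal) or 79% (binary) of the way through.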
afstest# vos examine csk-openafstest -extended
csk-openafstest 1870461518 RW 58094424 K used 7934 files On-line
afstest.cns.ualberta.ca /vicepa
RWrite 1870461518 ROnly 1870487131 Backup 1870461520
MaxQuota 0 K
Creation Wed Dec 8 16:21:00 2004
Copy Wed Dec 8 16:21:00 2004
Backup Sun May 15 17:00:44 2005
Last Update Tue Apr 26 10:33:38 2005
0 accesses in the past day (i.e., vnode references)
                      Raw Read/Write Stats
          |-------------------------------------------|
          |    Same Network     |    Diff Network     |
          |----------|----------|----------|----------|
          |  Total   |   Auth   |  Total   |   Auth   |
          |----------|----------|----------|----------|
Reads     |      232 |       36 |        1 |        0 |
Writes    |        0 |        0 |        0 |        0 |
          |-------------------------------------------|

                   Writes Affecting Authorship
          |-------------------------------------------|
          |   File Authorship   | Directory Authorship|
          |----------|----------|----------|----------|
          |   Same   |   Diff   |   Same   |   Diff   |
          |----------|----------|----------|----------|
0-60 sec  |        0 |        0 |        0 |        0 |
1-10 min  |        0 |        0 |        0 |        0 |
10min-1hr |        0 |        0 |        0 |        0 |
1hr-1day  |        0 |        0 |        0 |        0 |
1day-1wk  |        0 |        0 |        0 |        0 |
> 1wk     |        0 |        0 |        0 |        0 |
          |-------------------------------------------|
RWrite: 1870461518 Backup: 1870461520
number of sites -> 1
server afstest.cns.ualberta.ca partition /vicepa RW Site -- New release
--
GDB has a 'break' feature; why doesn't it have 'fix' too?