[OpenAFS] largefile: a strange error in vos release

Peter Somogyi psomogyi@gamax.hu
Tue, 20 Dec 2005 17:15:22 +0100


While testing the largefile support in 1.4.0, we found a strange problem:
- created a 10 GB vicepb partition of type ext3 (using lvm2)
- we had a 3 GB volume dump file containing one large 3 GB file
- I ran "vos restore $HOST vicepb bigvol[1,2,3,4] /voldump/3gb_vol.dump" four times; the last one fails because of insufficient space (so far that's OK.)
- then I removed them with "vos remove -id[1,2,3,4]" (the last one fails because the failed volume never got into the VLDB; so far that's perhaps OK.)
- then I ran "vos restore $HOST vicepb bigvol1" again, and now it fails:
Restoring volume bigvol1 Id 536871014 on server stoneblade14v.mainz.de.ibm.com partition /vicepb ..Could not transmit data
Possible communication failure
Error in vos restore command.
Possible communication failure

In VolserLog:
Dec 20 16:46:58 stoneblade14 volserver[2406]: 1 Volser: CreateVolume: volume 536871014 (bigvol1) created
Dec 20 16:52:08 stoneblade14 volserver[2406]: 1 Volser: WriteFile: Error reading dump file 4 size=3072000000 nbytes=1297768448 (0 of 0); restore aborted
Dec 20 16:52:08 stoneblade14 volserver[2406]: 1 Volser: ReadVnodes: IDEC inode 12884901892
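
A quick arithmetic check on the numbers in these two log lines, as a sketch only. It assumes nothing about volserver internals, and the log does not show whether any 32-bit truncation actually happens in the server; it only shows which logged values cross the 32-bit boundary. The helper names to_int32 and split64 are made up for this illustration.

```python
INT32_MAX = 2**31 - 1


def to_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def split64(value):
    """Return the (high, low) 32-bit halves of a 64-bit value."""
    return value >> 32, value & 0xFFFFFFFF


size = 3072000000      # size= from the WriteFile line
nbytes = 1297768448    # nbytes= from the WriteFile line (varies per run)
inode = 12884901892    # inode number from the ReadVnodes line

print(size > INT32_MAX)   # does the file size exceed a signed 32-bit counter?
print(to_int32(size))     # what a signed 32-bit truncation would turn it into
print(size - nbytes)      # difference between the two logged values
print(split64(inode))     # 32-bit halves of the inode number
```

The file size is above 2^31 - 1 but below 2^32, and the inode number only fits in 64 bits. Its low half is 4, the same number that follows "dump file" in the WriteFile line. Whether any of this is related to the failure is an open question.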

Reproducibility: almost always.
I've reproduced the same thing with a 7 GB partition (less reliably), but couldn't reproduce it on a 5 GB one (or haven't tried enough times).
I also wasn't able to reproduce it with a 1 GB dump file.
It seems that a vos restore attempt that fails for lack of space is necessary before the error can be reproduced.
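
The reproduction sequence described above can be sketched as a small command generator. This is only an illustration: repro_commands is a hypothetical helper, "HOST" stands in for the real server name, nothing is executed, and two details are assumptions not stated in the steps: that the volumes are removed by name, and that the final restore uses the same dump file as the first four.

```python
def repro_commands(host, partition="vicepb",
                   dump="/voldump/3gb_vol.dump", count=4):
    """Build the vos command lines for the reproduction steps (not executed)."""
    names = [f"bigvol{i}" for i in range(1, count + 1)]
    cmds = []
    # Step 1: restore the same dump under several names until the
    # partition runs out of space (the last restore is expected to fail).
    for name in names:
        cmds.append(["vos", "restore", host, partition, name, dump])
    # Step 2: remove the volumes again. Assumption: removal is by volume
    # name; the one that never reached the VLDB is expected to fail.
    for name in names:
        cmds.append(["vos", "remove", "-id", name])
    # Step 3: restore the first volume once more. Assumption: same dump
    # file as in step 1. This is the restore that reported the error.
    cmds.append(["vos", "restore", host, partition, names[0], dump])
    return cmds


for cmd in repro_commands("HOST"):
    print(" ".join(cmd))
```

Printing the commands rather than running them keeps the sketch safe to try; the lines can be pasted into a shell on a test cell once the host and paths are filled in.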

The "nbytes=1297768448" value varies from one reproduction to the next.
Unfortunately, gdb shows meaningless values at the point where the error is logged.

Has anybody else seen this bug?
(Is ext3 allowed with a largefile server?...)