[OpenAFS] error in restoring volumes from backup when the size of at least one volume exceeds 450-500 GB, solved

Giovanni Bracco giovanni.bracco@enea.it
Sun, 18 Jul 2021 20:33:18 +0200


After a private communication from Jeffrey Altman and Marc Dionne, 
suggesting that the observed problem was not an OpenAFS problem but a 
problem in our script used to simplify (!?) restoring volumes from the 
backup, I checked the situation and indeed they were right:

It was just a stupid bug in the old script (written by myself, by the 
way) used for restoring volumes from backup.

The rather random correlation with volume size was due to the fact that 
the error showed up when more than one "tape" file is used during the 
volume backup.

Sorry for the wrong posting to the mailing list!

Giovanni

On 06/07/21 15:16, Giovanni Bracco wrote:
> The backup of our AFS cell is performed using the standard OpenAFS 
> backup command and we know that it works nicely if the maximum size of 
> volumes does not exceed 450-500 GB.
> 
> If at least one volume exceeds that size, the restore of any volume is 
> unpredictable: for some it works, for others it doesn't.
> 
> When it does not work this is the message we obtain in the restore phase:
> 
> 
> ...
> backup volrestore -server cresco-fs2.portici.enea.it -partition a 
> -volume user.user0 -extension .20210627 1 -usedump 1624745427
> 
> backup: waiting for job termination
> Starting restore
> 
> Full restore being processed on port 0
> 
> 
> Restore
> Restoring volume user.user0.20210627
> 1
> Prompt for tape mega2106d.f.5 (1624616735)
> Thanks, now proceeding with tape reading operation.
> Proceeding with tape operation
> Restoring volume user.user0.20210627
> 1 Id 0 on server cresco-fs2.portici.enea.it partition /vicepa ..
> Could not create new volume 0
>     : Argument list too long
> : Argument list too long
> Can't read EOF on tape
>       butm: unexpected tape datablock
> Can't restore volume user.user0.20210627
> 1
>       : Argument list too long
> Restore: Skipping volume user.user0.backup (536870932)
> Restore: Finished
> Job 1: Full Restore finished
> -------
> 
> 
> where the essential part I think is:
> 
> Could not create new volume 0
>     : Argument list too long
> : Argument list too long
> Can't read EOF on tape
>       butm: unexpected tape datablock
> 
> 
> The error is now obtained on a brand new test AFS cell, with OpenAFS 
> 1.8.7 everywhere, both on DB servers and fileservers (CentOS 7.x)
> 
> The backup is performed on 100 GB disk files (we have also tried with 
> 200 GB files but the result is the same error).
> 
> Any suggestion?
> 
> Giovanni
> 
> 

-- 
Giovanni Bracco
phone  +39 351 8804788
E-mail  giovanni.bracco@enea.it
WWW http://www.afs.enea.it/bracco