[OpenAFS] Interrupted AFS backups -- how to recover

Mike Polek mike@pictage.com
Thu, 20 Oct 2005 10:10:47 -0700


Hi, All,
   The other day, while I had a backup running, our tape library
faulted hard. It couldn't change a tape. The backup was appending
to a tape, and the tape happened to fill... so it wrote part of the
volume and hit the end of the tape.

--==BEGIN
Wed Oct 19 04:52:36 2005: Task 3: Can't write VolumeData on tape
      butm: tape I/O error: Input/output error
Wed Oct 19 04:52:36 2005: Task 3: Warning: Dump (dumps.Wed) hit end-of-tape inferred
      butm: tape I/O error: Input/output error
Wed Oct 19 04:52:38 2005: Task 3: Volume dumps.pgdbs.backup (537990504) hit end-of-tape inferred - will retry on next tape
      butm: tape I/O error
Wed Oct 19 04:52:40 2005: Task 3: Warning: Can't close tape
      butm: error during tape close: Invalid argument
Wed Oct 19 04:55:57 2005: Task 3: Prompt for tape dumps.week3.2 (1129722420)
--==END

However... we ended up having to reboot the machine while
it was waiting for the tape. The end result is that
file 17 on the tape is the partial volume, which runs to the
end of the tape.

--==BEGIN
Dump
----
id = 1129722420
Initial id = 1129463374
Appended id = 0
parent = 1129636068
level = 3
flags = 0x4: In progress
volumeSet = dumps
dump path = /week3/Mon/Tue/Wed
name = dumps.Wed
created = Wed Oct 19 04:47:00 2005
nVolumes = 1
Group id  = 0
tapeServer =
format = dumps.week3.%d
maxTapes = 1
Start Tape Seq = 1
name = admin
instance =
cell =

Tape
----
name = DMPS3001
flags = 0x20: Successful
written = Wed Oct 19 04:50:51 2005
expires = Sun Nov  6 03:50:51 2005
kBytes Tape Used = 24781483
nMBytes Data = 179
nBytes  Data = 934485
nFiles = 0
nVolumes = 1
seq = 1
labelPos = 16
useCount = 3
dump = 1129722420

Volume
------
name = dumps.mydbs.backup
flags = 0x18: First fragment: Last fragment
id = 537990507
server =
partition = 0
tapeSeq = 1
position = 17
clone = Wed Oct 19 04:34:19 2005
startByte = 0
nBytes = 188629589
seq = 0
dump = 1129722420
tape = DMPS3001
--==END

Now the tricky part. When I try to do the next backup, the
system complains that it can't position the tape correctly.

Wed Oct 19 10:51:21 2005: Task 1: Can't append: Can't position to end of dump on tape DMPS3001
      butm: unexpected tape datablock

OK... no problem... I'll just remove the entry from the backup database,
right?

--==BEGIN
# backup deletedump 1129722420 -dbonly
The following dumps were deleted:
backup: dump is not an initial dump ; Unable to delete dumpID 1129722420 from database
--==END

OK... so I can't just wipe out the last part of the dump on its own?

So my question is... what are my options at this point? Is there
an easy way to recover? The things I can think of are:

1) delete the whole dump from the database and rescan the tapes
2) Wipe out file 17 and put a smaller file there, faking things
    out so the backup system thinks it can reposition the tape.
3) Write a utility to manually edit the backup database (dangerous)
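
For option 1, the id to delete is presumably the "Initial id" shown in
the dump record rather than the id of the appended dump, since the error
above says the dump "is not an initial dump". A minimal Python sketch
that pulls both ids out of a record in the format pasted earlier. The
function name dump_ids and the SAMPLE text are hypothetical, the code
does not call any AFS command, and treating "Initial id = 0" as "this
dump is itself the initial dump" is an assumption:

```python
def dump_ids(record_text):
    """Extract ids and the in-progress flag from the 'Dump' section
    of a verbose dump record (format as pasted in this message)."""
    fields = {}
    section = None
    for line in record_text.splitlines():
        line = line.strip()
        # Section headers are the bare words Dump, Tape, Volume.
        if line in ("Dump", "Tape", "Volume"):
            section = line
            continue
        if section != "Dump" or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # 'name' appears twice in the Dump section; keep the first.
        fields.setdefault(key.strip(), value.strip())

    dump_id = int(fields["id"])
    initial_id = int(fields["Initial id"])
    return {
        "dump_id": dump_id,
        # Assumption: 0 means the dump is not appended to another one.
        "delete_id": initial_id if initial_id != 0 else dump_id,
        "in_progress": "In progress" in fields.get("flags", ""),
    }


# Trimmed copy of the record shown above, for illustration only.
SAMPLE = """\
Dump
----
id = 1129722420
Initial id = 1129463374
flags = 0x4: In progress
name = dumps.Wed

Tape
----
name = DMPS3001
dump = 1129722420
"""

info = dump_ids(SAMPLE)
```

Whether deleting the initial dump and rescanning the tape is actually
safe here is exactly the open question; the sketch only identifies which
id the database considers the initial one.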

Any other suggestions? It's not all that uncommon for a backup
to run into this kind of problem for one reason or another.
It seems that if things don't run perfectly, recovering from
an error is challenging at best. Am I missing something? Is
there already a utility for recovering from errors?

Thanks in advance...

-- 
Michael Polek
Manager of System Operations
Pictage, Inc.
1580 Francisco Street, Ste. 101
Torrance, CA 90501
(310) 525-1600 ext. 628
mike@pictage.com
Czar of all the Russias
--
Opinions are my own and do not necessarily reflect those
of the company. Viewer discretion is advised.
Please do not make any inferences about what is in this email
beyond what is stated. If there is any unclarity in this email,
please ask the author of the email for clarification. Any assumptions
about the content of this email or what it means are solely the
responsibility of the reader.
E Pluribus Unum. Annuit Coeptis. Novus Ordo Seclorum.