[OpenAFS] Re: Re: 'vos dump' destroys volumes?

Matthias Gerstner matthias.gerstner@esolutions.de
Tue, 27 Mar 2012 14:01:04 +0200


Hello Andrew,

thank you for your reply.

>> 'vos dump -server <server> -partition <partition> -clone -id <vol>'
>
> <vol> I presume is an rw volume?

Yes it is.

> Just so you know, a more common way of doing this is to use 'vos
> backupsys' and then backup the .backup volumes. Nothing 'wrong' with
> what you're doing, but it's a less common way.

I guess the result should technically be the same. It's just that I
don't want permanent backup versions of all volumes so the dump with
-clone is more convenient for me.

> What did volserver say in VolserLog when that happened? It should give
> a reason as to why it could not attach.

The previously attached log was all that was produced at the time of the
incident. Sadly, I have since lost the logs due to recent trouble on my
servers, so I cannot take a second look to see whether I missed something.

>> After running a salvage on the affected volume it was brought back
>> online but most of the contained data was deleted due to a supposed
>> corruption of the directory structure detected during salvage.
>
> SalvageLog will say specifically why. Or SalsrvLog if you are running
> DAFS; are you running DAFS?

No, I'm not running DAFS.

The situation with the salvage was as follows: The affected volume
was a pretty large volume containing about 160 gigabytes of data spread
across 3.5 million files. During the salvage I saw a *lot* of log lines
similar to this flying by:

'??/??/SomeFile' deleted.

After half an hour of seeing this, the volume was back online with less
than 10 gigabytes of data remaining. So I figured the top-level
directory structure somehow got lost. Sorry that I can't provide the
actual log any more.

> What was the volume id for the volume in question? Possibly 536879790
> or 536879793?

It was 536879793. It was the last volume for which a clone was created.
After the error the backup script stopped.

> 1.6.1 is not a version that exists yet (or at least, certainly did not
> exist on Friday). What version is the volserver, and what version is
> 'vos'? (Running `strings </path/to/bin> | grep built` is a sure way to
> tell.)

Seems I forgot to mention 'pre1':

# strings /usr/sbin/vos | grep built
@(#) OpenAFS 1.6.1pre1 built  2012-01-24
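
Since the question was about both the volserver and 'vos' versions, the
same check can be scripted so the two binaries are easy to compare. This
is only a minimal sketch: the function names are made up for
illustration, and the only thing taken from the output above is the
format of the "built" line.

```python
import re
import subprocess

# Matches the ident string embedded in OpenAFS binaries, e.g.
# "@(#) OpenAFS 1.6.1pre1 built  2012-01-24"
BUILT_RE = re.compile(r"OpenAFS\s+(\S+)\s+built\s+(\S+)")


def parse_built_line(text):
    """Return (version, build_date) found in `strings` output, or None."""
    match = BUILT_RE.search(text)
    return match.groups() if match else None


def binary_version(path):
    """Run `strings` on a binary and extract its OpenAFS version tuple."""
    result = subprocess.run(["strings", path], capture_output=True,
                            text=True, check=True)
    return parse_built_line(result.stdout)
```

Calling binary_version("/usr/sbin/vos") and then the same function on the
volserver binary (its path depends on the packaging) shows immediately
whether the two were built from the same release.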

Is it too risky to use the pre-release? I got used to running the
unstable OpenAFS packages in order to keep up with recent Linux
kernel versions.

> Hmm, are you sure 'vos dump' is the only thing you are running at the
> time? (You're running more than one in parallel... how many do you run
> at once?) This sequence of operations does not seem normal for just a
> 'vos dump'.

Now that you mention it, it really does look like two things are running
in parallel. But I can't think of how that could be happening. The backup
script is supposed to dump one volume after another, strictly serially.
On this specific server the backup script is the only administrative
AFS operation that is scheduled at all. Also, when I disable the backup
job for a night, nothing shows up in the log at all.
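
A minimal sketch of such a serial dump loop, to make the intended
behaviour concrete. It is an illustration, not the actual backup script:
the lock file path, the output file naming and the function names are
hypothetical, and authentication is left out. Only the 'vos dump' options
are real. The exclusive lock is one way to rule out two runs overlapping
on the same machine.

```python
import fcntl
import subprocess


def build_dump_command(volume, server, partition, outfile):
    """Build the argument list for one 'vos dump -clone' invocation."""
    return ["vos", "dump", "-id", str(volume), "-server", server,
            "-partition", partition, "-clone", "-file", outfile]


def dump_serially(volumes, server, partition, outdir,
                  lock_path="/var/lock/afs-backup.lock",  # hypothetical path
                  run=subprocess.run):
    """Dump volumes one after another while holding an exclusive lock."""
    with open(lock_path, "w") as lock:
        # Non-blocking exclusive lock: raises OSError if another run
        # of this script already holds it.
        fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        for vol in volumes:
            cmd = build_dump_command(vol, server, partition,
                                     f"{outdir}/{vol}.dump")
            # check=True raises on a non-zero exit status, so the loop
            # stops at the first failing dump.
            run(cmd, check=True)
```

With a loop like this, each 'vos dump' has returned before the next one
starts, so any overlapping operations in VolserLog would have to come
from somewhere else.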

Is it possible that asynchronous operations from previous dump commands
are still running in the volserver?

However, I'm running two pairs of file and volume servers. Each machine
performs a backup of its own volumes, and this happens in parallel. But
this shouldn't affect a single machine's log.

I keep seeing weird behaviour during my backups. Last night, for
example, a dump was aborted with the following error message:

'consealed data inconsistent'

However, the original volume in question remained intact this time. I'm
attaching the VolserLog of this incident.

This whole business has me a bit worried. I'd like to back up volumes
without having to fear for the state of the source data.

Best regards,

Matthias

-- 
Matthias Gerstner, Dipl.-Wirtsch.-Inf. (FH), Senior Software Engineer
e.solutions GmbH

Am Wolfsmantel 46, 91058 Erlangen, Germany

Registered Office:
Pascalstr. 5, 85057 Ingolstadt, Germany

Phone +49-8458-3332-672, mailto:Matthias.Gerstner@esolutions.de
Fax +49-8458-3332-20672

e.solutions GmbH
Managing Directors Uwe Reder, Dr. Riclef Schmidt-Clausen
Register Court Ingolstadt HRB 5221