[OpenAFS] Re: 'vos dump' destroys volumes?

Matthias Gerstner matthias.gerstner@esolutions.de
Wed, 28 Mar 2012 19:01:26 +0200


--9l24NVCWtSuIVIod
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: inline

Hello,

> Please save the log if it happens again. Just a directory object being
> corrupt will not delete its children unless you pass '-orphans remove'
> to the salvager. However, the default, '-orphans ignore' will keep
> orphaned data around but it will be effectively invisible until you
> salvage with '-orphans attach'.

yes I understand this mechanism regarding orphaned files. I was using
the default. However, attaching the orphaned files wouldn't have helped
me much in this particular case. As I mentioned the volume contained
about 3.5 millions files and the directory structure was a crucial part
of the data. I would never have been able to reconstruct the original
data.

> That version is known to have issues with data corruption/loss, which
> are fixed in pre4. I don't know if that's what you're hitting, though.
> (You can also run a newer client with older servers just fine.)

So it seems I'm better off falling back to 1.6.0 on the server side
then.

> I assume the volserver is running the same version? As Kim said,
> 'rxdebug <server> 7005 -version'

Yes it's the same version. I'm running 1.6.1_pre1 for about two months
now. Systems have also been rebooted back then. So all versions are
consistent. Problems only arised recently, though.

> If you turn on the volser audit log with
> '-auditlog /usr/afs/logs/VolserLog.audit' or something

I'll consider increasing the log level.

For now I try to calm down the systems a bit. Today I was able to
perform a complete backup without errors. I'm will continue to monitor
it closely.

> So, you just have two completely separate servers, and each one is
> running a fileserver/volserver? Yeah, that shouldn't matter.

Exactly.

> That's "sealed data inconsistent". You can get this if your tokens
> expired sometime during the process (I don't remember / just don't know
> what causes that vs an 'expired' message). Do you have the output of
> 'vos' running with '-verbose' by any chance? How long are the backups
> taking, and are you running this under a tool to refresh tokens?

Yes it probably had to do with expiring authentication. Only it's
confusing that after that specific error I was able to dump another
volume. Only then I received a token expiration error.

The tokens expired due to an error I introduced in the backup script. My
approach is to renew authentication after each dump. Token lifetime is
eight hours. And I haven't got any volumes that should take longer than
that for dumping.

What would a tool look like that refreshes tokens during a long dump?
Something like a background process in the same authentication group?

> Hmm, did you forget to attach this?

Sorry. I'm adding it now. It's the log for the "sealed data
inconsistent" error. As you can see in the log the backup continued
another ten minutes before it finally failed due to "token expired"
failure. But it surely was related to the expired token, too.

Best regards,

Matthias

-- 
Matthias Gerstner, Dipl.-Wirtsch.-Inf. (FH), Senior Software Engineer
e.solutions GmbH

Am Wolfsmantel 46, 91058 Erlangen, Germany

Registered Office:
Pascalstr. 5, 85057 Ingolstadt, Germany

Phone +49-8458-3332-672, mailto:Matthias.Gerstner@esolutions.de
Fax +49-8458-3332-20672

e.solutions GmbH
Managing Directors Uwe Reder, Dr. Riclef Schmidt-Clausen
Register Court Ingolstadt HRB 5221

--9l24NVCWtSuIVIod
Content-Type: text/plain; charset="us-ascii"
Content-Disposition: attachment; filename="VolserLog.sealed"

Tue Mar 27 03:07:49 2012 1 Volser: Clone: Cloning volume 536884635 to new volume 536889553
Tue Mar 27 03:13:00 2012 trans 112 on volume 536889553 is older than 300 seconds
Tue Mar 27 03:13:30 2012 trans 112 on volume 536889553 is older than 330 seconds
Tue Mar 27 03:14:00 2012 trans 112 on volume 536889553 is older than 360 seconds
Tue Mar 27 03:14:30 2012 trans 112 on volume 536889553 is older than 390 seconds
Tue Mar 27 03:15:00 2012 trans 112 on volume 536889553 is older than 420 seconds
Tue Mar 27 03:15:30 2012 trans 112 on volume 536889553 is older than 450 seconds
Tue Mar 27 03:16:01 2012 trans 112 on volume 536889553 is older than 480 seconds
Tue Mar 27 03:16:31 2012 trans 112 on volume 536889553 is older than 510 seconds
Tue Mar 27 03:17:01 2012 trans 112 on volume 536889553 is older than 540 seconds
Tue Mar 27 03:17:31 2012 trans 112 on volume 536889553 is older than 570 seconds
Tue Mar 27 03:18:01 2012 trans 112 on volume 536889553 is older than 600 seconds
Tue Mar 27 03:18:31 2012 trans 112 on volume 536889553 is older than 630 seconds
Tue Mar 27 03:19:01 2012 trans 112 on volume 536889553 is older than 660 seconds
Tue Mar 27 03:19:31 2012 trans 112 on volume 536889553 is older than 690 seconds
Tue Mar 27 03:20:01 2012 trans 112 on volume 536889553 is older than 720 seconds
Tue Mar 27 03:20:31 2012 trans 112 on volume 536889553 is older than 750 seconds
Tue Mar 27 03:21:01 2012 trans 112 on volume 536889553 is older than 780 seconds
Tue Mar 27 03:21:31 2012 trans 112 on volume 536889553 is older than 810 seconds
Tue Mar 27 03:22:01 2012 trans 112 on volume 536889553 is older than 840 seconds
Tue Mar 27 03:22:31 2012 trans 112 on volume 536889553 is older than 870 seconds
Tue Mar 27 03:23:01 2012 trans 112 on volume 536889553 is older than 900 seconds
Tue Mar 27 03:23:31 2012 trans 112 on volume 536889553 is older than 930 seconds
Tue Mar 27 03:24:01 2012 trans 112 on volume 536889553 is older than 960 seconds
Tue Mar 27 03:24:31 2012 trans 112 on volume 536889553 is older than 990 seconds
Tue Mar 27 03:25:01 2012 trans 112 on volume 536889553 is older than 1020 seconds
Tue Mar 27 03:25:31 2012 trans 112 on volume 536889553 is older than 1050 seconds
Tue Mar 27 03:26:01 2012 trans 112 on volume 536889553 is older than 1080 seconds
Tue Mar 27 03:26:31 2012 trans 112 on volume 536889553 is older than 1110 seconds
Tue Mar 27 03:27:01 2012 trans 112 on volume 536889553 is older than 1140 seconds
Tue Mar 27 03:27:31 2012 trans 112 on volume 536889553 is older than 1170 seconds
Tue Mar 27 03:28:01 2012 trans 112 on volume 536889553 is older than 1200 seconds
Tue Mar 27 03:28:31 2012 trans 112 on volume 536889553 is older than 1230 seconds
Tue Mar 27 03:29:01 2012 trans 112 on volume 536889553 is older than 1260 seconds
Tue Mar 27 03:29:31 2012 trans 112 on volume 536889553 is older than 1290 seconds
Tue Mar 27 03:30:01 2012 trans 112 on volume 536889553 is older than 1320 seconds
Tue Mar 27 03:30:31 2012 trans 112 on volume 536889553 is older than 1350 seconds
Tue Mar 27 03:31:01 2012 trans 112 on volume 536889553 is older than 1380 seconds
Tue Mar 27 03:31:31 2012 trans 112 on volume 536889553 is older than 1410 seconds
Tue Mar 27 03:32:01 2012 1 Volser: DumpVolume: Rx call failed during dump, error 19270410
Tue Mar 27 03:37:02 2012 trans 112 on volume 536889553 has been idle for more than 300 seconds
Tue Mar 27 03:37:32 2012 trans 112 on volume 536889553 has been idle for more than 330 seconds
Tue Mar 27 03:38:02 2012 trans 112 on volume 536889553 has been idle for more than 360 seconds
Tue Mar 27 03:38:32 2012 trans 112 on volume 536889553 has been idle for more than 390 seconds
Tue Mar 27 03:39:02 2012 trans 112 on volume 536889553 has been idle for more than 420 seconds
Tue Mar 27 03:39:32 2012 trans 112 on volume 536889553 has been idle for more than 450 seconds
Tue Mar 27 03:40:02 2012 trans 112 on volume 536889553 has been idle for more than 480 seconds
Tue Mar 27 03:40:32 2012 trans 112 on volume 536889553 has been idle for more than 510 seconds
Tue Mar 27 03:41:02 2012 trans 112 on volume 536889553 has been idle for more than 540 seconds
Tue Mar 27 03:41:32 2012 trans 112 on volume 536889553 has been idle for more than 570 seconds
Tue Mar 27 03:42:02 2012 trans 112 on volume 536889553 has been idle for more than 600 seconds
Tue Mar 27 03:42:02 2012 trans 112 on volume 536889553 has timed out
Tue Mar 27 03:42:02 2012 Deleting timed-out temporary volume 536889553

--9l24NVCWtSuIVIod--