[OpenAFS] Re: backup dump suddenly started failing - (failed - partially dumped, possible communication error)

Bastian dea1306@melvex.xs4all.nl
Fri, 08 Jan 2010 17:10:12 +0100


Andrew Deason wrote:
> On Fri, 08 Jan 2010 14:21:17 +0100
> Bastian <dea1306@melvex.xs4all.nl> wrote:
> 
>> Fri Jan  8 12:05:43 2010: Task 2: End of pass 1: Volumes remaining = 1
>> Fri Jan  8 12:05:43 2010: Task 2: Starting pass 2
>> Fri Jan  8 12:44:07 2010: Task 2: Volume <x> failed - partially dumped
>>      Possible communication failure
> 
> Anything in VolserLog on the server that volume <x> is on?
> 
> Try 'vos dump -verbose'ing the backup volume <x>; what does it say?
> 


Thanks, you were right: it hangs. vos dump stalls after 982M of the
volume (roughly one quarter of it) has been dumped.

VolserLog then shows:

Fri Jan  8 16:17:47 2010 trans 2 on volume 536871014 is older than 330 seconds
Fri Jan  8 16:18:17 2010 trans 2 on volume 536871014 is older than 360 seconds
<snip, more like this>
Fri Jan  8 16:26:49 2010 trans 2 on volume 536871014 is older than 870 seconds
Fri Jan  8 16:27:19 2010 1 Volser: DumpVolume: Rx call failed during dump, error -01
Fri Jan  8 16:32:49 2010 trans 2 on volume 536871014 has been idle for more than 330 seconds
Fri Jan  8 16:33:19 2010 trans 2 on volume 536871014 has been idle for more than 360 seconds
<snip, more like this>
Fri Jan  8 16:37:21 2010 trans 2 on volume 536871014 has been idle for more than 600 seconds
Fri Jan  8 16:37:21 2010 trans 2 on volume 536871014 has timed out


What could this mean? An unresponsive server? A corrupted volume?
All volumes with this problem are on this server. There are no problems
during everyday OpenAFS usage, though (even vos backup, vos shadow and
vos release work fine).

Bastian