[OpenAFS] backup dump suddenly started failing - (failed - partially dumped, possible communication error)

Bastian dea1306@melvex.xs4all.nl
Fri, 08 Jan 2010 14:21:17 +0100


Hello,

AFS backup suddenly started to produce errors. Dumps of some volumes
(usually the larger ones) fail consistently.

A command like

/usr/sbin/backup dump commonvols /monthfull/weekinc/dayinc -localauth -append

will produce output like this in /var/lib/openafs/TL_..._FILE:

Fri Jan  8 12:05:43 2010: Task 2: End of pass 1: Volumes remaining = 1
Fri Jan  8 12:05:43 2010: Task 2: Starting pass 2
Fri Jan  8 12:44:07 2010: Task 2: Volume <x> failed - partially dumped
     Possible communication failure
Fri Jan  8 12:44:07 2010: Task 2: Volume <x> omitted
Fri Jan  8 12:44:07 2010: Task 2: End of pass 2: Volumes remaining = 0
Fri Jan  8 12:44:09 2010: Task 2: commonvols.dayinc (DumpId 1262947221):
Finished. 1 volumes dumped, 1 failed, 2 unchanged
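
To put numbers on a log like this, here is a minimal Python sketch. It is
my own helper, not part of OpenAFS, and it assumes the TL_ file keeps
exactly the line layout shown above (timestamp, "Task N:", message). The
names summarize() and SAMPLE are made up for illustration. It lists the
volumes reported as failed and how long each pass ran.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the excerpt: "Fri Jan  8 12:05:43 2010: Task 2: <message>"
STAMP = re.compile(r"^(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}): Task (\d+): (.*)$")


def parse_stamp(text):
    # Collapse the double space before single-digit days, then parse.
    return datetime.strptime(" ".join(text.split()), "%a %b %d %H:%M:%S %Y")


def summarize(lines):
    """Return (failed volume names, seconds spent per pass) for one task's log."""
    failed = []
    pass_start = {}
    pass_seconds = {}
    for line in lines:
        m = STAMP.match(line)
        if not m:
            continue  # continuation lines, e.g. "Possible communication failure"
        when = parse_stamp(m.group(1))
        msg = m.group(3)
        started = re.match(r"Starting pass (\d+)", msg)
        ended = re.match(r"End of pass (\d+)", msg)
        bad = re.match(r"Volume (\S+) failed", msg)
        if started:
            pass_start[started.group(1)] = when
        elif ended and ended.group(1) in pass_start:
            begin = pass_start[ended.group(1)]
            pass_seconds[ended.group(1)] = (when - begin).total_seconds()
        elif bad:
            failed.append(bad.group(1))
    return failed, pass_seconds


# The excerpt from this mail, with the volume name still elided as <x>.
SAMPLE = """\
Fri Jan  8 12:05:43 2010: Task 2: End of pass 1: Volumes remaining = 1
Fri Jan  8 12:05:43 2010: Task 2: Starting pass 2
Fri Jan  8 12:44:07 2010: Task 2: Volume <x> failed - partially dumped
     Possible communication failure
Fri Jan  8 12:44:07 2010: Task 2: Volume <x> omitted
Fri Jan  8 12:44:07 2010: Task 2: End of pass 2: Volumes remaining = 0
""".splitlines()

if __name__ == "__main__":
    failed, pass_seconds = summarize(SAMPLE)
    print("failed volumes:", failed)
    print("seconds per pass:", pass_seconds)
```

For the excerpt above it reports volume <x> as failed and 2304 seconds
(about 38 minutes) for pass 2. Pass 1 is skipped because its start line is
not part of the excerpt. The sketch does not separate tasks, so it only
makes sense on a log that contains a single task.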

I don't know what this communication failure means in the context of
backup dump. The failed pass also takes far too long: pass 2 in the log
above ran from 12:05:43 to 12:44:07, about 38 minutes. The four other AFS
hosts and all other processes work fine. I have no problems with any
volumes, partitions, databases or clock settings either.

More info:
- The backup dump command runs from a nightly script, only on
  .backup volumes that are not touched during that run.
- backup dump ran like this without any problems from March 2008 to
  December 2009; then these errors appeared.
- I'm running version 1.4.7.dfsg1-6+lenny2 on Debian 5.0 (Lenny), kernel
  2.6.26-19lenny2. This has not changed since things started going wrong.


Any ideas/experiences regarding this problem?

Thanks in advance.

Bastian