[OpenAFS] 1.6.2 buserver + butc
Prasad Dharmasena
pkd@glue.umd.edu
Tue, 26 Mar 2013 20:04:15 -0400 (EDT)
Hello,
We recently upgraded our OpenAFS servers to 1.6.2, all running on
Solaris 10 (Generic_147440-27 sun4v sparc).
Since the buserver upgrade, backups have been failing on various
servers and various partitions.
Works:
    fileservers         = 1.4.14.1, 1.6.1, 1.6.2
    butc (client side)  = 1.6.1
    buserver            = 1.4.14.1
Fails:
    fileservers         = 1.4.14.1, 1.6.1, 1.6.2
    butc (client side)  = 1.6.1, 1.6.2
    buserver            = 1.6.2
For a partition (volset) that doesn't complete the 'backup dump',
the butc log /usr/afs/backup/TL_<port-offset> stops right after the
dump is named, so butc appears to be waiting for a DumpID from the
buserver.
---------------------
srv3:/usr/afs/backup:# cat TL_3106
Tue Mar 26 10:30:11 2013: Starting Tape Coordinator: Port offset 3106 Debug level 0
Tue Mar 26 10:30:11 2013: Token expires: Wed Dec 31 19:00:01 1969
Tue Mar 26 10:31:21 2013: Task 3106001: Dump TSM_srv3_f_135.04
---------------------
For those butc/dump processes that do proceed, the subsequent lines
show the DumpID and further progress.
---------------------
srv3:/usr/afs/backup:# head TL_3115
Tue Mar 26 10:30:17 2013: Starting Tape Coordinator: Port offset 3115 Debug level 0
Tue Mar 26 10:30:17 2013: Token expires: Wed Dec 31 19:00:01 1969
Tue Mar 26 10:31:40 2013: Task 3115001: Dump TSM_srv3_o_157.26
Tue Mar 26 10:31:42 2013: Task 3115001: Dump TSM_srv3_o_157.26 (DumpID 1364308301)
Tue Mar 26 10:31:42 2013: Task 3115001: Starting pass 1
Tue Mar 26 10:31:42 2013: Task 3115001: Volume h.abcd.jchen114.backup (1971521033) not dumped - has not been modified since last dump
...
---------------------
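To pick out the stalled coordinators across all of the TL_ files,
something like the following could be used. It is only a sketch: it
assumes the two line shapes shown in the logs above ("Task N: Dump
NAME" with and without a trailing "(DumpID N)"), and the function
name find_stalled_tasks is made up for illustration.

```python
import glob
import re

# Assumed line shapes, taken from the TL_ excerpts above:
#   ...: Task 3106001: Dump TSM_srv3_f_135.04
#   ...: Task 3115001: Dump TSM_srv3_o_157.26 (DumpID 1364308301)
TASK_DUMP = re.compile(r": Task (\d+): Dump (\S+)(?: \(DumpID (\d+)\))?\s*$")


def find_stalled_tasks(log_text):
    """Return {task_id: dump_name} for tasks that never logged a DumpID."""
    started = {}          # task id -> dump name from the "Dump <name>" line
    has_dump_id = set()   # task ids that later logged "(DumpID <n>)"
    for line in log_text.splitlines():
        match = TASK_DUMP.search(line)
        if match is None:
            continue
        task, name, dump_id = match.groups()
        started.setdefault(task, name)
        if dump_id is not None:
            has_dump_id.add(task)
    return {t: n for t, n in started.items() if t not in has_dump_id}


if __name__ == "__main__":
    # Path taken from the post; adjust if the logs live elsewhere.
    for path in sorted(glob.glob("/usr/afs/backup/TL_*")):
        with open(path, errors="replace") as fh:
            for task, name in find_stalled_tasks(fh.read()).items():
                print(f"{path}: task {task} ({name}) has no DumpID yet")
```

A task that is listed here may simply not have reached that point
yet, so the output is best read some time after the dumps were
started.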
The set of vicep* partitions (or volsets) for which the backup
dump/butc hangs is not consistent from run to run. If we kill and
restart the dump process, some of the previously hung volsets finish
while others hang.
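
One way to make that run-to-run inconsistency visible is to compare
the hung dump names from two attempts. This is a rough sketch only:
compare_runs is a hypothetical helper, and the dump names in the
example other than TSM_srv3_f_135.04 are invented placeholders.

```python
def compare_runs(hung_before, hung_after):
    """Classify dump names by how their state changed between two runs."""
    before, after = set(hung_before), set(hung_after)
    return {
        "recovered": sorted(before - after),    # hung first, finished on retry
        "still_hung": sorted(before & after),   # hung in both runs
        "newly_hung": sorted(after - before),   # fine first, hung on retry
    }


if __name__ == "__main__":
    # Placeholder input: only TSM_srv3_f_135.04 comes from the logs above.
    first = ["TSM_srv3_f_135.04", "example_volset_a"]
    second = ["example_volset_a", "example_volset_b"]
    for state, names in compare_runs(first, second).items():
        print(state, names)
```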
What information do we need to collect from butc and buserver in
order to track down the problem?
Thanks.
-pkd
--
Prasad Dharmasena
University of Maryland, College Park