[OpenAFS] Dafileserver does not manage to slavage volume
Harald Barth
haba@kth.se
Wed, 08 Jul 2015 22:58:16 +0200 (CEST)
> Was there anything in the 'SalsrvLog' that indicates why the volume was
> not salvaged?
No, the last entry about that volume is
07/07/2015 06:08:17 dispatching child to salvage volume 536870963...
>> # bos salvage mount-kilimanjaro.stacken.kth.se a 536870963 -showlog
>> This is a demand attach fileserver. Are you sure you want to proceed with a manual salvage?
>> must specify -forceDAFS flag in order to proceed.
>
> Yes, that command will schedule a salvage of the volume with the salvage
> server.
If I only give the command without -forceDAFS nothing happens (as the text suggests).
> If you give -forceDAFS -all to do a complete salvage of all the partitions, or
> specify a -partition to salvage, the bosserver will shutdown the file server
> and salvage server, and run the salvager process, like pre-dafs. This means
> the file server is offline while the partitions are salvaged.
Ok, seems I need to run a short interrupt in the fileserver as I don't seem to
be able to get a salvage of the volume in some other way. Lets go....
That seems broken. After issuing the bos salvage command, I get a
salvageserver -client which does _nothing_...
# ps augxww | grep salvage
root 3507 0.0 0.0 41932 1264 ? Sl Jul07 0:01 /usr/lib/openafs/salvageserver -datelogs -parallel all8 -orphans attach
root 5720 0.0 0.0 41936 628 ? S Jul07 0:00 /usr/lib/openafs/salvageserver -datelogs -parallel all8 -orphans attach
root 9381 0.0 0.0 28152 3376 pts/0 S+ 22:38 0:00 bos salvage mount-kilimanjaro.stacken.kth.se a -volume 536870963 -showlog -local -forceDAFS
root 9382 0.0 0.0 9012 884 ? S 22:38 0:00 /usr/lib/openafs/salvageserver -client /vicepa 536870963
root 9458 0.0 0.0 8948 812 pts/1 S+ 22:41 0:00 grep salvage
# strace -p 9382
Process 9382 attached - interrupt to quit
restart_syscall(<... resuming interrupted call ...>) = 0
time(NULL) = 1436388191
write(3, "\3\0\0\0\223\0\0\0\223\0\0\0\0\0\0\0\246$\0\0\0\207\255g\4\0\1\0\0\0\1\0"..., 96) = 96
readv(3, [{"\3\0\0\0\203\3\0\0\223\0\0\0\203\3\0\0@\0\0\0\0\0\0\0\0\0\0\0\2\1\0\0", 32}, {"\2\0\0\0\213\0\0\0\0\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 32}], 2) = 64
rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
nanosleep({2, 0}, 0x7fff926ee230) = 0
time(NULL) = 1436388193
write(3, "\3\0\0\0\224\0\0\0\224\0\0\0\0\0\0\0\246$\0\0\0\207\255g\4\0\1\0\0\0\1\0"..., 96) = 96
And so on over and over.
Ok, kill everything off and restart....
07/08/2015 22:48:47 dispatching child to salvage volume 536870963...
07/08/2015 22:48:48 2 nVolumesInInodeFile 64
07/08/2015 22:48:48 waiting for fileserver to finish scanning partition /vicepa...
07/08/2015 22:48:54 CHECKING CLONED VOLUME 536870965.
07/08/2015 22:48:54 stackware.backup (536870965) updated 07/01/1998 18:19
07/08/2015 22:48:54 Vnode 1: length incorrect; (is 2048 should be 18432)
07/08/2015 22:48:54 SALVAGING VOLUME 536870963.
07/08/2015 22:48:54 stackware (536870963) updated 07/01/1998 18:19
07/08/2015 22:48:54 Vnode 1: length incorrect; changed from 2048 to 18432
07/08/2015 22:48:54 totalInodes 3408
07/08/2015 22:48:54 Salvaged stackware (536870963): 3401 files, 71692 blocks
07/08/2015 22:49:55 dispatching child to salvage volume 536870965...
07/08/2015 22:49:55 dispatching child to salvage volume 536870963...
07/08/2015 22:49:55 namei_ListAFSSubDirs: warning: VG 536870965 does not have a link table; salvager will recreate it.
07/08/2015 22:49:55 fileserver requested salvage of clone 536870965; scheduling salvage of volume group 536870963...
07/08/2015 22:49:55 1 nVolumesInInodeFile 32
07/08/2015 22:49:55 SALVAGING VOLUME 536870963.
07/08/2015 22:49:55 stackware (536870963) updated 07/01/1998 18:19
07/08/2015 22:49:55 totalInodes 3405
07/08/2015 22:49:56 Salvaged stackware (536870963): 3401 files, 71692 blocks
07/08/2015 22:49:56 The volume header file /vicepa/V0536870965.vol is not associated with any actual data (deleted)
So that time it went well. Wonder what blocked the salvage the first
and the second time. Looks to me that a salvageserver process can get
in some non-responsive state that then blocks all further salvage
attempts.
Harald.