[OpenAFS-devel] Salvager (from openafs-server-1.6.5-1.el6.x86_64) segmentation fault
Harald Barth
haba@kth.se
Tue, 13 Aug 2013 11:23:07 +0200 (CEST)
After our fileserver fell over, the salvager had to run and it fell over with:
Program terminated with signal 6, Aborted.
#0 0x00007f2b3fe328a5 in raise () from /lib64/libc.so.6
gdb on the core gives:
(gdb) where
#0 0x00007f2b3fe328a5 in raise () from /lib64/libc.so.6
#1 0x00007f2b3fe34085 in abort () from /lib64/libc.so.6
#2 0x0000000000424851 in osi_Panic (
msg=0x43ef88 "assertion failed: %s, file: %s, line: %d\n") at rx_user.c:251
#3 0x000000000042486e in osi_AssertFailU (
expr=0xed1a <Address 0xed1a out of bounds>,
file=0x6 <Address 0x6 out of bounds>, line=-1) at rx_user.c:261
#4 0x000000000040a29b in SalvageVolume (salvinfo=0x7fffd0c150b0,
rwIsp=<value optimized out>, alinkH=0x17125b0) at vol-salvage.c:3986
#5 0x000000000040cb2d in DoSalvageVolumeGroup (
salvinfo=<value optimized out>, isp=0x1710450, nVols=1)
at vol-salvage.c:2092
#6 0x000000000040db85 in SalvageFileSys1 (partP=<value optimized out>,
singleVolumeNumber=0) at vol-salvage.c:937
#7 0x000000000040e1c5 in SalvageFileSysParallel (partP=0x16ebbe0)
at vol-salvage.c:667
#8 0x000000000040ee2f in handleit (as=<value optimized out>,
arock=<value optimized out>) at ./salvager.c:375
#9 0x0000000000410687 in cmd_Dispatch (argc=7, argv=0x16e74b0) at cmd.c:905
#10 0x000000000040e9ce in main (argc=6, argv=0x7fffd0c15cc8)
at ./salvager.c:534
(gdb) up
#1 0x00007f2b3fe34085 in abort () from /lib64/libc.so.6
(gdb) up
#2 0x0000000000424851 in osi_Panic (
msg=0x43ef88 "assertion failed: %s, file: %s, line: %d\n") at rx_user.c:251
251 afs_abort();
(gdb) up
#3 0x000000000042486e in osi_AssertFailU (
expr=0xed1a <Address 0xed1a out of bounds>,
file=0x6 <Address 0x6 out of bounds>, line=-1) at rx_user.c:261
261 osi_Panic("assertion failed: %s, file: %s, line: %d\n", expr,
(gdb) up
#4 0x000000000040a29b in SalvageVolume (salvinfo=0x7fffd0c150b0,
rwIsp=<value optimized out>, alinkH=0x17125b0) at vol-salvage.c:3986
3986 osi_Assert(Delete(&dh, "..") == 0);
(gdb) list
3981 SetSalvageDirHandle(&dh, vid, salvinfo->fileSysDevice,
3982 salvinfo->vnodeInfo[class].inodes[v],
3983 &salvinfo->VolumeChanged);
3984 pa.Vnode = LFVnode;
3985 pa.Unique = LFUnique;
3986 osi_Assert(Delete(&dh, "..") == 0);
3987 osi_Assert(Create(&dh, "..", &pa) == 0);
3988
3989 /* The original parent's link count was decremented above.
3990 * Here we increment the new parent's link count.
(gdb)
I assume the salvager is trying to delete the directory entry ".." and then create it anew.
It looks to me like FindItem(), called from Delete() in dir.c, came up empty-handed, so
Delete() returned ENOENT and the osi_Assert() around it aborted.
Do you think it is safe to change line 3986 to something less dramatic than an abort, or
do you have a better suggestion?
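To make the question concrete, here is a minimal sketch of what "less dramatic" could mean: treat ENOENT from the delete as a warning and still create the new ".." entry, while any other error is passed up to the caller. This is not the OpenAFS code. The struct and the demo_* functions are hypothetical stand-ins for the real directory handle, Delete() and Create(), and the ENOENT return is only what I suspect from reading the backtrace.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a directory: a tiny fixed-size name table.
 * This is NOT the real OpenAFS DirHandle. */
#define DEMO_MAX_ENTRIES 8
#define DEMO_NAME_LEN 16

struct demo_dir {
    char names[DEMO_MAX_ENTRIES][DEMO_NAME_LEN];
    int vnode[DEMO_MAX_ENTRIES];
    int used[DEMO_MAX_ENTRIES];
};

/* Stand-in for Delete(): 0 on success, ENOENT if the name is missing. */
static int demo_delete(struct demo_dir *d, const char *name)
{
    for (int i = 0; i < DEMO_MAX_ENTRIES; i++) {
        if (d->used[i] && strcmp(d->names[i], name) == 0) {
            d->used[i] = 0;
            return 0;
        }
    }
    return ENOENT;
}

/* Stand-in for Create(): 0 on success, EEXIST if the name is already
 * present, ENOSPC if the table is full. */
static int demo_create(struct demo_dir *d, const char *name, int vnode)
{
    int free_slot = -1;
    for (int i = 0; i < DEMO_MAX_ENTRIES; i++) {
        if (d->used[i]) {
            if (strcmp(d->names[i], name) == 0)
                return EEXIST;
        } else if (free_slot < 0) {
            free_slot = i;
        }
    }
    if (free_slot < 0)
        return ENOSPC;
    strncpy(d->names[free_slot], name, DEMO_NAME_LEN - 1);
    d->names[free_slot][DEMO_NAME_LEN - 1] = '\0';
    d->vnode[free_slot] = vnode;
    d->used[free_slot] = 1;
    return 0;
}

/* Look up a name; returns its vnode number, or -1 if it is missing. */
static int demo_lookup(const struct demo_dir *d, const char *name)
{
    for (int i = 0; i < DEMO_MAX_ENTRIES; i++) {
        if (d->used[i] && strcmp(d->names[i], name) == 0)
            return d->vnode[i];
    }
    return -1;
}

/* Re-point ".." at new_parent. A missing ".." is logged and tolerated;
 * any other delete error is returned to the caller instead of aborting. */
static int demo_reparent(struct demo_dir *d, int new_parent)
{
    int code = demo_delete(d, "..");
    if (code == ENOENT) {
        fprintf(stderr, "warning: no \"..\" entry to delete, creating one\n");
    } else if (code != 0) {
        return code;
    }
    return demo_create(d, "..", new_parent);
}
```

With a zeroed struct demo_dir, the first demo_reparent() call prints the warning and still leaves a ".." entry behind; a second call replaces it silently. Whether the real salvager can safely continue after a missing ".." is exactly what I am unsure about.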
Harald.