[OpenAFS] fileserver crashes
Marcus Watts
mdw@umich.edu
Wed, 13 Oct 2004 03:43:38 -0400
I don't know if our fileserver crashes are related to what Matthew Cocker
and others are seeing, but we are indeed seeing problems here at umich.
Background information:
The machines in question are dual pentium 4 machines with
hyperthreading enabled running linux 2.4.26 (SMP) and glibc 2.3.2. The
actual file storage is on "cheap" raid devices that use multiple IDE
drives but talk SCSI to the rest of the world. These raids have their
own set of problems, so I would not count them as super-reliable file
storage. We're running the "pthreads" version of the fileserver.
I think we're seeing at least 3 distinct problems with openafs 1.2.11.
The first may actually be networking. We get these with varying
frequency in VolserLog:
Sun Oct 10 22:05:09 2004 1 Volser: DumpVolume: Rx call failed during dump, error -1
Tue Oct 12 11:38:07 2004 1 Volser: DumpVolume: Rx call failed during dump, error -1
Tue Oct 12 13:39:23 2004 1 Volser: DumpVolume: Rx call failed during dump, error -1
Tue Oct 12 15:06:46 2004 1 Volser: DumpVolume: Rx call failed during dump, error -1
Helpful message, eh? Can't tell what volume was being dumped,
or where it was going.
Most of these probably occur while backing things up to TSM.
If I understand it right, we run buta there and there are
apparently issues with the amount of CPU it eats. I've
wondered what % of our backups are failing, but I haven't
heard of any negative consequences here, so far.
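For what it's worth, here's the sort of context I wish that VolserLog
entry carried. This is a standalone illustration only; the function,
volume ID, and destination below are made up, not anything recovered
from our logs or taken from the actual volser code:

#include <stdio.h>

/* Hypothetical illustration only: the kind of detail I'd like to see in
 * the DumpVolume failure message.  The volume ID and destination are
 * placeholders; this is not the actual volser logging code. */
static void
log_dump_failure(unsigned long volid, const char *dest, int code)
{
    fprintf(stderr,
            "1 Volser: DumpVolume: Rx call failed during dump of "
            "volume %lu to %s, error %d\n", volid, dest, code);
}

int
main(void)
{
    log_dump_failure(536870918UL, "backup-host.example.edu", -1);
    return 0;
}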
We have at various times gotten problems with read-only replicas that
are oddly truncated. This might or might not be the consequence of the
previous problem.
Another probably completely different problem we have concerns volumes
with really small volume IDs. Modern AFS software creates large 10-digit
volume IDs, but we have volumes that were created long before AFS 3.1,
with small 3-digit volume IDs. Those volumes are rapidly disappearing:
one by one, during various restarts, the fileserver and salvager discard
all the data, then the volume header.
The latest problem is of course "signal 6". That's generally from an
assertion error, which probably writes a nice message out to stderr
(probably out to /dev/null), and might leave an equally useful core
dump, except linux defaults to no core dumps. Oops. We've now managed
to harvest one core dump, so here's interesting data from it:
[ in the below, I've obscured the following for privacy reasons:
@IPADDR@ is a hexadecimal umich IP address belonging to
some random machine about which we know little (though
a transient portable or lab machine running windows or
MacOS is quite possible.)
@VICEID@ is a vice ID of some random and likely innocent person
who might logically be using @IPADDR@.
]
(gdb) where
#0 0x400a6281 in __kill () at __kill:-1
#1 0x40021811 in pthread_kill (thread=17978, signo=0) at signals.c:65
#2 0x40021b1b in __pthread_raise (sig=1073904656) at signals.c:187
#3 0x400a5ec4 in *__GI_raise (sig=17978)
at ../linuxthreads/sysdeps/unix/sysv/linux/raise.c:34
#4 0x400a75ed in *__GI_abort () at ../sysdeps/generic/abort.c:117
#5 0x08096603 in osi_Panic (msg=0x0, a1=0, a2=0, a3=0) at ../rx/rx_user.c:199
#6 0x08096637 in osi_AssertFailU (expr=0x0, file=0x0, line=0) at ../rx/rx_user.c:208
#7 0x080a47fc in rx_SetSpecific (conn=0x4002a510, key=1, ptr=0x0) at ../rx/rx.c:6632
#8 0x0805defc in h_TossStuff_r (host=@IPADDR@) at ../viced/host.c:765
#9 0x0805cb6d in h_Release_r (host=@IPADDR@) at ../viced/host.c:280
#10 0x0805ca71 in h_Release (host=0x0) at ../viced/host.c:258
#11 0x0805e14e in h_Enumerate (proc=0x8060fa0 <CheckHost>, param=0x0)
at ../viced/host.c:913
#12 0x080613cd in h_CheckHosts () at ../viced/host.c:2080
#13 0x0804b6c7 in HostCheckLWP () at ../viced/viced.c:731
#14 0x4001ed03 in pthread_start_thread (arg=0xb31ffbe0) at manager.c:300
In case that's not completely obvious,
0x80a47e8 <rx_SetSpecific+312>: movl $0x80c8a00,0x4(%esp,1)
0x80a47f0 <rx_SetSpecific+320>: movl $0x80c7480,(%esp,1)
0x80a47f7 <rx_SetSpecific+327>: call 0x8096610 <osi_AssertFailU>
(gdb) x/8x $ebp
0xb31ffa14: 0xb31ffa44 0x080a47fc 0x080c7480 0x080c8a00
0xb31ffa24: 0x000019e5 0x400f0ce6 0x40a00010 0x08f3d84c
(gdb) x/i 0x080a47fc
0x80a47fc <rx_SetSpecific+332>: jmp 0x80a46db <rx_SetSpecific+43>
(gdb) x/s 0x080c7480
0x80c7480 <rcsid+3972>: "pthread_mutex_lock(&conn->conn_data_lock) == 0"
(gdb) x/s 0x080c8a00
0x80c8a00 <rcsid+80>: "../rx/rx.c"
(gdb) print/d 0x000019e5
$8 = 6629
Basically, it's dying at line 6629 in rx/rx.c, which reads thusly:
MUTEX_ENTER(&conn->conn_data_lock);
The return code from pthread_mutex_lock would be useful, but
the assertion logic in openafs doesn't save that.
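Incidentally, the expression string, file name, and line number are
sitting in that raw frame because of the usual assert pattern
(stringified expression, __FILE__, __LINE__ passed to the failure
routine). If anyone wants the pthread_mutex_lock return code preserved
for the next core, a wrapper along these lines would do it. This is my
own sketch with made-up names, not existing OpenAFS code:

#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Sketch only (my own wrapper and variable names, not OpenAFS code):
 * stash the pthread_mutex_lock() return value somewhere that survives
 * into a core dump before asserting on it, instead of asserting on the
 * call expression directly and losing the error code. */
static int last_mutex_lock_error;       /* inspectable from gdb in a core */

static void
checked_mutex_enter(pthread_mutex_t *m)
{
    last_mutex_lock_error = pthread_mutex_lock(m);
    if (last_mutex_lock_error != 0)
        fprintf(stderr, "pthread_mutex_lock failed: %d\n",
                last_mutex_lock_error);
    assert(last_mutex_lock_error == 0);
}

int
main(void)
{
    pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

    checked_mutex_enter(&m);            /* succeeds; the saved error stays 0 */
    pthread_mutex_unlock(&m);
    return 0;
}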
Here's that conn structure, as gdb sees it:
(gdb) print *conn
$7 = {next = 0x0, peer = 0x0, conn_call_lock = {__m_reserved = -1289749536,
__m_count = -1291841536, __m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0,
__spinlock = 0}}, conn_call_cv = {__c_lock = {__status = 0, __spinlock = 0},
__c_waiting = 0x0}, conn_data_lock = {__m_reserved = 0, __m_count = 0,
__m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}},
epoch = 0, cid = 0, error = 0, call = {0x0, 0x0, 0x0, 0x0}, callNumber = {0, 0, 0, 0},
serial = 0, lastSerial = 0, maxSerial = 0, challengeEvent = 0x0,
delayedAbortEvent = 0x0, checkReachEvent = 0x0, abortCount = 0, service = 0x0,
serviceId = 0, refCount = 0, flags = 0 '\0', type = 0 '\0', secondsUntilPing = 0 '\0',
securityIndex = 0 '\0', securityObject = 0x0, securityData = 0x0,
securityHeaderSize = 0, securityMaxTrailerSize = 0, timeout = 0, lastSendTime = 0,
secondsUntilDead = 0, hardDeadTime = 0, ackRate = 0 '\0', makeCallWaiters = 0 '\0',
nSpecific = 0, specific = 0x0}
(gdb)
Unfortunately, that's not really the data structure in question.
The next stack frame up looks like this:
(gdb) x/16x $ebp
0xb31ffa44: 0xb31ffa74 0x0805defc 0x08f3d820 0x00000001
0xb31ffa54: 0x00000000 0x40022050 0x00000000 0x40027c10
0xb31ffa64: 0xb31ffa74 0x00000003 0x00000008 0x08132ba0
0xb31ffa74: 0xb31ffa94 0x0805cb6d 0x08132ba0 0x00000000
The first 3 words here are the stack frame back link, the return address,
and the first parameter, 'conn'. Gdb thinks that's
(gdb) print/x conn
$9 = 0x4002a510
So, yet again, source-level debuggers lose. (Admittedly, juggling -g
and -O is a hard problem.) This is probably the real data:
(gdb) print *((struct rx_connection *) 0x08f3d820)
$10 = {next = 0x0, peer = 0x0, conn_call_lock = {__m_reserved = 0, __m_count = 0,
__m_owner = 0x0, __m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}},
conn_call_cv = {__c_lock = {__status = 0, __spinlock = 105},
__c_waiting = 0xffffb923}, conn_data_lock = {__m_reserved = -15030,
__m_count = -14533, __m_owner = 0xffffc74e, __m_kind = -13409, __m_lock = {
__status = -9283, __spinlock = -9021}}, epoch = 4294958286, cid = 4294960122,
error = -7173, call = {0xffffe70b, 0xffffe75f, 0xfffff36f, 0xfffff371}, callNumber = {
4294964109, 4294964111, 4294965251, 4294965253}, serial = 4294966684,
lastSerial = 4294966684, maxSerial = -612, challengeEvent = 0xffffff9a,
delayedAbortEvent = 0xffffff9b, checkReachEvent = 0x22e35, abortCount = 1145384520,
service = 0x39, serviceId = 45936, refCount = 2308, flags = 16 '\020', type = 116 't',
secondsUntilPing = 158 '\236', securityIndex = 8 '\b', securityObject = 0x0,
securityData = 0x0, securityHeaderSize = 0, securityMaxTrailerSize = 0, timeout = 0,
lastSendTime = 0, secondsUntilDead = 0, hardDeadTime = 0, ackRate = 0 '\0',
makeCallWaiters = 0 '\0', nSpecific = 0, specific = 0x0}
(gdb)
Hmm. Looks kinda junky too. But that's apparently what's really
there. The caller of rx_SetSpecific() is h_TossStuff_r, and here's the
client structure it was probably working on:
(gdb) print *host->FirstClient->next->next->next
$16 = {next = 0x4242d5e4, host = @IPADDR@, sid = -562995324, tcon = 0x8f3d820, CPS = {
prlist_len = 7, prlist_val = 0x0}, ViceId = @VICEID@, expTime = 1097781124,
LastCall = 1097601371, VenusEpoch = 2334476595, refCount = 0, deleted = 1 '\001',
authClass = 2 '\002', prfail = 0 '\0', InSameNetwork = 0 '\0', lock = {
wait_states = 0 '\0', excl_locked = 0 '\0', readers_reading = 0 '\0',
num_waiting = 0 '\0', mutex = {__m_reserved = 0, __m_count = 0, __m_owner = 0x0,
__m_kind = 0, __m_lock = {__status = 0, __spinlock = 0}}, read_cv = {__c_lock = {
__status = 0, __spinlock = 0}, __c_waiting = 0x0}, write_cv = {__c_lock = {
__status = 0, __spinlock = 0}, __c_waiting = 0x0}}}
There's the same connection structure, here called "tcon",
and it looks like the logic should be calling rx_SetSpecific() on that.
I don't see anything further in that client chain that has a non-zero tcon.
That doesn't mean there wasn't one; I kinda lost interest in looking for it
after I got to pasting this:
print *host->FirstClient->next->next->next->next->next->next->next->next
    ->next->next->next->next->next->next->next->next->next->next
    ->next->next->next->next->next->next->next->next->next->next
    ->next->next->next->next->next->next->next->next->next->next
    ->next->next->next->next->next->next->next->next->next->next
    ->next->next->next->next->next->next
(all on one line, not wrapped like this)
All but the first 3 were marked "deleted", and nearly all of them belonged
to the same person; the 2 exceptions near the start were for ViceId 32766,
"nobody".
Anyways, I expect we'll get another core dump in a day or so.
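In case it saves someone else the same wait: Linux's no-core-dump default
can be overridden with "ulimit -c unlimited" in whatever starts the
fileserver, or from inside the process itself. Here's a minimal
standalone sketch of the latter; it's an illustration, not a patch to the
actual viced startup code:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Minimal standalone sketch: raise RLIMIT_CORE to "unlimited" the way a
 * daemon could at startup, since Linux defaults the limit to 0 and a
 * "signal 6" then leaves nothing behind to debug.  This is an
 * illustration, not the actual viced.c startup code. */
int
main(void)
{
    struct rlimit rl;

    rl.rlim_cur = RLIM_INFINITY;
    rl.rlim_max = RLIM_INFINITY;
    if (setrlimit(RLIMIT_CORE, &rl) < 0) {
        fprintf(stderr, "setrlimit(RLIMIT_CORE): %s\n", strerror(errno));
        return 1;
    }
    printf("core dumps enabled for this process and its children\n");
    return 0;
}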
-Marcus Watts
UM ITCS Umich Systems Group