[OpenAFS] 1.4.2 fileserver keep getting large number of blocked connections

Hartmut Reuter reuter@rzg.mpg.de
Mon, 30 Jul 2007 12:40:33 +0200


What does "rxdebug <name> -nodally" show?

If there is a routing problem in the sense that messages can reach the 
server but the responses cannot be sent back to the client, you will see 
lots of connections in precall state. This happens because the threads 
stay blocked until the full timeout for the replies has expired. RPCs 
from the same network then also have to wait until a thread becomes free.

Typically you will also find messages of the following form in the FileLog:


Mon Jul 30 10:23:48 2007 [srv_97] BreakDelayedCallbacks FAILED for host 
84.151.36.14:64457 which IS UP.  Connection from 84.151.36.14:64457. 
Possible network or routing failure.
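
To see whether such failures cluster on particular clients or networks, the 
FileLog can be scanned for this message. The following is only a rough 
sketch in Python: the regular expression is modeled on the single message 
quoted above (the wording may differ in other OpenAFS versions), and the 
function names count_failed_callbacks and by_prefix are made up for this 
illustration.

```python
import re
from collections import Counter

# Pattern modeled on the FileLog message quoted above. Assumption: other
# OpenAFS versions may word this message differently.
FAILED_CB = re.compile(
    r"BreakDelayedCallbacks FAILED for host\s+"
    r"(\d{1,3}(?:\.\d{1,3}){3}):(\d+)"
)


def count_failed_callbacks(log_text):
    """Return a Counter mapping client IP -> number of failed callback breaks."""
    return Counter(m.group(1) for m in FAILED_CB.finditer(log_text))


def by_prefix(counts, octets=3):
    """Group per-host counts by a naive dotted prefix (first `octets` octets).

    This is only a crude stand-in for a real subnet; it ignores netmasks.
    """
    grouped = Counter()
    for ip, n in counts.items():
        grouped[".".join(ip.split(".")[:octets])] += n
    return grouped
```

To use it, read the log into a string (for example with 
open(path).read(), where path is wherever your fileserver writes its 
FileLog) and pass it to count_failed_callbacks. If most hits fall under one 
prefix in by_prefix, that points towards a routing problem for that network 
rather than a problem in the fileserver itself.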

-Hartmut

Matthew Cocker wrote:
> Tonight we had 10 of our AFS fileservers lock up. I had upgraded some to 
> 1.4.4 but they died as well. All run on redhat 3 up6. Only one process 
> shows in the ps listing, and gcores on this process seem to give nothing. A 
> pstack dump is below. Is it any good? This is now a real disaster and 
> very weird. I have other fileservers that are set up identically which are 
> not dying. The only difference is that these are on a different subnet 
> and in a different server room.
> 
> Thread 22 (Thread -1218524240 (LWP 27894)):
> #0  0x0044cc84 in sigwait () from /lib/tls/libpthread.so.0
> #1  0x08073a32 in ?? ()
> #2  0xb75ec9f0 in ?? ()
> #3  0xb75ec96c in ?? ()
> #4  0x00000000 in ?? ()
> Thread 21 (Thread -1229263952 (LWP 27895)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x080b3c0e in ?? ()
> #2  0x08646828 in ?? ()
> #3  0x086467dc in ?? ()
> #4  0xb6bae328 in ?? ()
> #5  0x080af859 in ?? ()
> #6  0x00000001 in ?? ()
> #7  0x00000000 in ?? ()
> Thread 20 (Thread -1240114256 (LWP 27896)):
> #0  0x0044959b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> #1  0x0808e5c0 in ?? ()
> #2  0x080f9540 in stderr ()
> #3  0x080f94c0 in stderr ()
> #4  0xb6155a58 in ?? ()
> #5  0x01cfe9b8 in ?? ()
> #6  0x00000000 in ?? ()
> Thread 19 (Thread -1254962256 (LWP 27897)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xb532c428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0xab1da358 in ?? ()
> #7  0x00000000 in ?? ()
> Thread 18 (Thread -1265587280 (LWP 27898)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xb490a428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0xab1c93a0 in ?? ()
> #7  0x00000000 in ?? ()
> Thread 17 (Thread -1276077136 (LWP 27899)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0x00000001 in ?? ()
> #5  0xb3f093d8 in ?? ()
> #6  0x00c650fd in malloc () from /lib/tls/libc.so.6
> #7  0x0805ea56 in ?? ()
> #8  0xb490bb5c in ?? ()
> #9  0x00000002 in ?? ()
> #10 0x00000001 in ?? ()
> #11 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
> #12 0x0805f182 in ?? ()
> #13 0xb490bb08 in ?? ()
> #14 0x00448b20 in pthread_mutex_unlock () from /lib/tls/libpthread.so.0
> #15 0x080604de in ?? ()
> #16 0x07ead882 in ?? ()
> #17 0x0000591b in ?? ()
> #18 0xb3f09474 in ?? ()
> #19 0x0000591b in ?? ()
> #20 0x0854a3f8 in ?? ()
> #21 0x591b5f70 in ?? ()
> #22 0x00000000 in ?? ()
> Thread 16 (Thread -1286566992 (LWP 27900)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x080a3a82 in ?? ()
> #2  0x087244cc in ?? ()
> #3  0x080f9758 in stderr ()
> #4  0xb3508a18 in ?? ()
> #5  0x080a3e85 in ?? ()
> #6  0x00000000 in ?? ()
> Thread 15 (Thread -1297056848 (LWP 27901)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xb2b07428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0xaabb1b90 in ?? ()
> #7  0x00000002 in ?? ()
> #8  0xb490bb08 in ?? ()
> #9  0xb490bb08 in ?? ()
> #10 0xb490bb60 in ?? ()
> #11 0x03ead882 in ?? ()
> #12 0xb2b073f8 in ?? ()
> #13 0x0805ea56 in ?? ()
> #14 0xb490bb5c in ?? ()
> #15 0x00000002 in ?? ()
> #16 0xaabb1b98 in ?? ()
> #17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
> Thread 14 (Thread -1307546704 (LWP 27902)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xb2106428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0x08e01618 in ?? ()
> #7  0x00000000 in ?? ()
> Thread 13 (Thread -1318036560 (LWP 27903)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xb1705428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0xaac696f8 in ?? ()
> #7  0x00000002 in ?? ()
> #8  0xb490bb08 in ?? ()
> #9  0xb490bb08 in ?? ()
> #10 0xb490bb60 in ?? ()
> #11 0x06ead882 in ?? ()
> #12 0xb17053f8 in ?? ()
> #13 0x0805ea56 in ?? ()
> #14 0xb490bb5c in ?? ()
> #15 0x00000002 in ?? ()
> #16 0xaac69700 in ?? ()
> #17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
> Thread 12 (Thread -1328526416 (LWP 27904)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xb0d04428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0x087335e8 in ?? ()
> #7  0x00000002 in ?? ()
> #8  0xb490bb08 in ?? ()
> #9  0xb490bb08 in ?? ()
> #10 0xb490bb60 in ?? ()
> #11 0x2bebd882 in ?? ()
> #12 0xb0d043f8 in ?? ()
> #13 0x0805ea56 in ?? ()
> #14 0xb490bb5c in ?? ()
> #15 0x00000002 in ?? ()
> #16 0x087335f0 in ?? ()
> #17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
> Thread 11 (Thread -1339016272 (LWP 27905)):
> #0  0x0044bf5e in recvmsg () from /lib/tls/libpthread.so.0
> #1  0x080b1a8f in ?? ()
> #2  0x00000005 in ?? ()
> #3  0xb03039d0 in ?? ()
> #4  0x00000000 in ?? ()
> Thread 10 (Thread -1349506128 (LWP 27906)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xaf902428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0xaab11f88 in ?? ()
> #7  0x00000000 in ?? ()
> Thread 9 (Thread -1359995984 (LWP 27907)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xaef01428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0xab1228a0 in ?? ()
> #7  0x00000000 in ?? ()
> Thread 8 (Thread -1370485840 (LWP 27908)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0x00000001 in ?? ()
> #5  0xae5003d8 in ?? ()
> #6  0x0000004a in ?? ()
> #7  0xaab00010 in ?? ()
> #8  0xb490bb08 in ?? ()
> #9  0xb490bb08 in ?? ()
> #10 0xb490bb60 in ?? ()
> #11 0x68ead882 in ?? ()
> #12 0xae5003f8 in ?? ()
> #13 0x0805ea56 in ?? ()
> #14 0xb490bb5c in ?? ()
> #15 0x00000002 in ?? ()
> #16 0xaabd5d30 in ?? ()
> #17 0x00449ed5 in pthread_getspecific () from /lib/tls/libpthread.so.0
> Thread 7 (Thread -1380975696 (LWP 27909)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x080a3a82 in ?? ()
> #2  0x0861c7f4 in ?? ()
> #3  0x080f9758 in stderr ()
> #4  0xadaffa18 in ?? ()
> #5  0x080a3e85 in ?? ()
> #6  0x00000000 in ?? ()
> Thread 6 (Thread -1391465552 (LWP 27910)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x0806fe71 in ?? ()
> #2  0xb490bba8 in ?? ()
> #3  0xb490bb60 in ?? ()
> #4  0xad0fe428 in ?? ()
> #5  0x08085a21 in ?? ()
> #6  0x08cd4e80 in ?? ()
> #7  0x00000000 in ?? ()
> Thread 5 (Thread -1401955408 (LWP 27911)):
> #0  0x00cca067 in ___newselect_nocancel () from /lib/tls/libc.so.6
> #1  0x0807d767 in ?? ()
> #2  0x00000071 in ?? ()
> #3  0x080f3280 in stderr ()
> #4  0x00000000 in ?? ()
> Thread 4 (Thread -1413178448 (LWP 27983)):
> #0  0x00c9daec in __nanosleep_nocancel () from /lib/tls/libc.so.6
> #1  0x00c9d90f in sleep () from /lib/tls/libc.so.6
> #2  0x0804afb2 in ?? ()
> #3  0x00000000 in ?? ()
> Thread 3 (Thread -1431307344 (LWP 27984)):
> #0  0x004493ad in pthread_cond_wait@@GLIBC_2.3.2 ()
> #1  0x080b3c0e in ?? ()
> #2  0x086f8768 in ?? ()
> #3  0x086f871c in ?? ()
> #4  0xaaaff918 in ?? ()
> #5  0x080b4df7 in ?? ()
> #6  0x086f871c in ?? ()
> #7  0xaaaff938 in ?? ()
> #8  0x00000004 in ?? ()
> #9  0x00000000 in ?? ()
> Thread 2 (Thread -1441797200 (LWP 27985)):
> #0  0x0044959b in pthread_cond_timedwait@@GLIBC_2.3.2 ()
> #1  0x0804b202 in ?? ()
> #2  0x080f46c0 in stderr ()
> #3  0x080f46f4 in stderr ()
> #4  0xaa0fea78 in ?? ()
> #5  0x0804b271 in ?? ()
> #6  0x00000000 in ?? ()
> Thread 1 (Thread -1218523008 (LWP 27891)):
> #0  0x00c9daec in __nanosleep_nocancel () from /lib/tls/libc.so.6
> #1  0x00c9d90f in sleep () from /lib/tls/libc.so.6
> #2  0x0804dda5 in ?? ()
> #3  0x00000000 in ?? ()
> 


-- 
-----------------------------------------------------------------
Hartmut Reuter                           e-mail reuter@rzg.mpg.de
					   phone +49-89-3299-1328
RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
Computing Center of the Max-Planck-Gesellschaft (MPG) and the
Institut fuer Plasmaphysik (IPP)
-----------------------------------------------------------------