[OpenAFS] fileserver meltdown diagnostics
Nathan Neulinger
nneul@umr.edu
Sun, 12 Dec 2004 10:29:02 -0600
What causes a fileserver thread to stay tied up indefinitely? We are
diagnosing an issue on one of our fileservers, where at least one client
is holding open the following connections:
Connection from host 131.151.99.183, port 7001, Cuid 858895d7/6dfc428
serial 64, natMTU 1260, security index 0, server conn
call 0: # 1, state active, mode: error
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
(about 10 of those)
Connection from host 131.151.99.183, port 7001, Cuid 9904b28f/6d935cc
serial 2815, natMTU 1260, security index 0, client conn
call 0: # 88, state active, mode: receiving, flags: reader_wait, has_output_packets
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
(and ONE of those...)
I believe another client may be intermittently causing the same problem,
so that all remaining threads get taken and the server goes into a
meltdown state.
What would cause these connections/threads to not be reclaimed? i.e. once
they get into error state, why aren't they being freed?
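To keep an eye on which peers have calls stuck in error mode, the connection listing can be tallied with a small script. This is only a sketch: it assumes the listing has been saved as text in exactly the format pasted above, and the names count_error_calls and SAMPLE are made up for illustration. Other call states or field layouts may need a different pattern.

```python
import re
from collections import Counter

# Patterns follow the listing format shown in this message; they are an
# assumption about the layout, not a full parser for the tool's output.
CONN_RE = re.compile(r"Connection from host (\S+), port (\d+), Cuid (\S+)")
CALL_RE = re.compile(r"call (\d+): # (\d+), state ([^,]+)(?:, mode: (\w+))?")


def count_error_calls(text):
    """Map (host, port) to the number of active calls whose mode is 'error'."""
    errors = Counter()
    current = None  # peer of the connection block being read
    for raw in text.splitlines():
        line = raw.strip()
        conn = CONN_RE.match(line)
        if conn:
            current = (conn.group(1), int(conn.group(2)))
            continue
        call = CALL_RE.match(line)
        if call and current is not None:
            state, mode = call.group(3).strip(), call.group(4)
            if state == "active" and mode == "error":
                errors[current] += 1
    return errors


# Two connection blocks copied from the listing above (shortened).
SAMPLE = """\
Connection from host 131.151.99.183, port 7001, Cuid 858895d7/6dfc428
serial 64, natMTU 1260, security index 0, server conn
call 0: # 1, state active, mode: error
call 1: # 0, state not initialized
Connection from host 131.151.99.183, port 7001, Cuid 9904b28f/6d935cc
serial 2815, natMTU 1260, security index 0, client conn
call 0: # 88, state active, mode: receiving, flags: reader_wait, has_output_packets
call 1: # 0, state not initialized
"""

if __name__ == "__main__":
    for (host, port), n in count_error_calls(SAMPLE).items():
        print(f"{host}:{port} has {n} call(s) in error mode")
```

Running this against periodic captures would show whether the number of error-mode calls per client grows as idle threads disappear.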
I have a FULL network trace of all traffic from this particular client to
this server while the problem is happening, but unfortunately not from
when it started. I will have that as soon as it melts down again, though.
(Not if, when. I would expect it to be sometime in the next 2 hours.
About 10 minutes ago the idle thread count was steady at 6; it is now
steady at 5. I expect it to keep counting down until the server melts.)
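For a rough idea of how long the server has left, the two idle-thread readings mentioned above (6, then 5 about ten minutes later) can be extrapolated linearly. This is a crude sketch that assumes threads keep being lost at a constant rate, which may well not hold; the function name minutes_until_exhaustion is invented for illustration.

```python
def minutes_until_exhaustion(samples):
    """Estimate minutes until the idle thread count reaches zero.

    samples is a time-ordered list of (minutes, idle_threads) pairs.
    Uses a straight line through the first and last sample. Returns None
    when the count is not decreasing, since no estimate is possible then.
    """
    (t0, n0), (t1, n1) = samples[0], samples[-1]
    if t1 <= t0 or n1 >= n0:
        return None
    # Threads lost per minute is (n0 - n1) / (t1 - t0); remaining time is
    # the current count divided by that rate.
    return n1 * (t1 - t0) / (n0 - n1)


if __name__ == "__main__":
    # Readings from this message: 6 idle threads, then 5 ten minutes later.
    print(minutes_until_exhaustion([(0, 6), (10, 5)]))
```

With only two readings the estimate is very coarse; feeding in more samples as they are collected would show whether the decline is steady or bursty.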
-- Nathan
------------------------------------------------------------
Nathan Neulinger EMail: nneul@umr.edu
University of Missouri - Rolla Phone: (573) 341-6679
UMR Information Technology Fax: (573) 341-4216