[OpenAFS] Re: fileserver meltdown diagnostics

Nathan Neulinger nneul@umr.edu
Sun, 12 Dec 2004 10:38:35 -0600


FYI, the traffic consists almost entirely of rx ack packets going
back and forth every ~4 seconds, with an occasional init callback state
packet from the server, an occasional gettime from the client, and an
occasional packet from server to client that can't be decoded. Ethereal is
treating it as an encrypted CB request, but I don't know if that is correct.

-- Nathan

On Sun, Dec 12, 2004 at 10:29:02AM -0600, Nathan Neulinger wrote:
> What causes a thread to get sucked up continually? We are diagnosing
> an issue with one of our fileservers that has a problem with at least one
> client that is holding open:
> 
> Connection from host 131.151.99.183, port 7001, Cuid 858895d7/6dfc428
>   serial 64,  natMTU 1260, security index 0, server conn
>     call 0: # 1, state active, mode: error
>     call 1: # 0, state not initialized
>     call 2: # 0, state not initialized
>     call 3: # 0, state not initialized
> 
> (about 10 of those)
> 
> Connection from host 131.151.99.183, port 7001, Cuid 9904b28f/6d935cc
>   serial 2815,  natMTU 1260, security index 0, client conn
>     call 0: # 88, state active, mode: receiving, flags: reader_wait, has_output_packets
>     call 1: # 0, state not initialized
>     call 2: # 0, state not initialized
>     call 3: # 0, state not initialized
> 
> (and ONE of those...)
> 
> 
> 
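For tracking dumps like the two above over time, here is a minimal sketch
that tallies active calls per host and mode. It assumes the text follows
exactly the layout shown in this message ("Connection from host ..." followed
by "call N: # M, state active, mode: ..." lines). The function name
count_active_calls and the regular expressions are illustrative, not part of
any AFS tool.

```python
import re
from collections import Counter

# Patterns written against the dump layout quoted in this message (assumed).
CONN_RE = re.compile(r"Connection from host ([^,\s]+), port (\d+)")
CALL_RE = re.compile(r"call \d+: # \d+, state active, mode: (\w+)")


def count_active_calls(dump: str) -> Counter:
    """Count active calls per (host, mode) in a connection dump.

    Calls in "state not initialized" are ignored. Leading "> " quote
    markers from mail replies are stripped before matching.
    """
    counts = Counter()
    host = None
    for raw in dump.splitlines():
        line = raw.lstrip("> ").strip()
        conn = CONN_RE.match(line)
        if conn:
            host = conn.group(1)  # remember which host the calls belong to
            continue
        call = CALL_RE.match(line)
        if call and host is not None:
            counts[(host, call.group(1))] += 1
    return counts
```

Running this periodically on saved dumps and comparing the count for mode
"error" per host would show whether one client accounts for the calls that
are never reclaimed.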
> I believe there is the possibility of another client that is intermittently 
> causing the same problem, resulting in all remaining threads being taken,
> and the server going into a meltdown state.
> 
> What would cause these connections/threads to not be reclaimed? i.e. once
> they get into error state, why aren't they being freed?
> 
> I have a FULL network trace of all traffic from this particular client to
> this server as it is happening, but unfortunately not from when it started.
> I will have that as soon as it melts down again, though. (Not if, when.
> I would expect it to be sometime in the next 2 hours. About 10 minutes
> ago the idle thread count was solid at 6; it's now solid at 5. I expect
> that to count down until the server melts.)
> 
> -- Nathan
> 
> ------------------------------------------------------------
> Nathan Neulinger                       EMail:  nneul@umr.edu
> University of Missouri - Rolla         Phone: (573) 341-6679
> UMR Information Technology             Fax: (573) 341-4216