[OpenAFS] OpenAFS 1.2.13: ever increasing number of fileserver connections - h_Hold leak

Rainer Toebbicke rtb@pclella.cern.ch
Wed, 05 Jan 2005 11:05:10 +0100


Something is wrong with OpenAFS 1.2.13: on several busy servers we see 
an ever-increasing number of host/client connections.

The number appears to increase in batches, in one case about once per 
hour on average. That server had over 40000 clients after a few days of 
running; after a restart it settled at a reasonable 4000 once it was 
stable again.

We're heavily batch-oriented, which of course favours many different 
contexts that servers have to maintain even for clients (not hosts) that 
no longer exist. For that reason we had already reduced the client 
grace-time from 2 hours to 30 minutes.

It does not help, though: after less than 1 hour the server has 
accumulated plenty of HOSTDELETED hosts with associated deleted clients. 
There are no clues in the FileLog. The hosts do not get cleaned up 
because of a large number of h_Holds on them, which never disappear. 
hosts.dump typically shows 
'holds: 19fffff8000000000000 slot/bit: 0/268435456' for a whole batch of 
hosts.
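
For anyone who wants to check their own servers, a rough way to count the
affected entries in a dump is sketched below. It is only a sketch: it
assumes nothing about hosts.dump beyond the 'holds: <hex> slot/bit: <n>/<n>'
fragment quoted above, and the function name count_held_hosts is made up
for illustration.

```python
import re

# Assumption: every host entry in hosts.dump carries a fragment of the form
#   holds: <hex digits> slot/bit: <number>/<number>
# as quoted in this message. The rest of the dump layout is not relied upon.
HOLDS_RE = re.compile(r"holds:\s*([0-9a-fA-F]+)\s+slot/bit:\s*(\d+)/(\d+)")


def count_held_hosts(dump_text):
    """Return (entries_seen, entries_with_nonzero_holds) for a dump's text."""
    seen = 0
    held = 0
    for match in HOLDS_RE.finditer(dump_text):
        seen += 1
        # The holds field is printed in hex; any non-zero value means at
        # least one hold bit is still set for that entry.
        if int(match.group(1), 16) != 0:
            held += 1
    return seen, held
```

Feeding it the contents of hosts.dump taken at intervals should show
whether the second number keeps growing while the first one does.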

This does not happen on our 1.2.10 servers, which should have a similar 
load (but of course you never know).

It looks like somewhere between 1.2.10 and 1.2.13 a leak on the "holds" 
table opened (again).

Ah yes: the FileLog now frequently contains messages like
"FindClient: client b7f1b8(51d00484) already had conn ca6bf0 (host 
808e42c0), stolen by client b7f1b8(51d00484)",
although I doubt that the underlying lookup problems contribute to the 
h_Holds problem - because of the timing.

Any ideas?
-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985       Fax: +41 22 767 7155