[OpenAFS] OpenAFS 1.2.13: ever increasing number of fileserver connections
- h_Hold leak
Rainer Toebbicke
rtb@pclella.cern.ch
Wed, 05 Jan 2005 11:05:10 +0100
Something is wrong with OpenAFS 1.2.13: on several busy servers we see
an ever-increasing number of host/client connections.
The number seems to increase in batches, in one case about once per
hour on average. That server had over 40000 clients after a few days of
running, which dropped to a reasonable 4000 once the server had been
restarted and was stable.
We are heavily batch-oriented, which of course favours many different
contexts that the servers have to maintain, even for clients (not hosts)
which no longer exist. For that reason we had already reduced the client
grace-time from 2 hours to 30 minutes.
It does not help, though: after less than 1 hour the server has
accumulated plenty of HOSTDELETED hosts with associated deleted clients.
No clues in the FileLog. They do not get cleaned up because of a large
number of h_Holds on the host, which never disappear. hosts.dump
typically shows 'holds: 19fffff8000000000000 slot/bit: 0/268435456' for
a whole batch of hosts.
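
For anyone who wants to count affected hosts, here is a minimal Python
sketch that parses the fragment quoted above. It assumes lines contain
exactly that 'holds: <hex> slot/bit: <n>/<n>' shape; real hosts.dump
lines may carry more fields, which are ignored. The function names are
hypothetical and not part of OpenAFS. The only interpretation applied is
that a nonzero hex value means some hold is outstanding, and that
268435456 is 2^28, i.e. a single set bit.

```python
import re

# Matches the fragment quoted in the post. Assumption: the real dump line
# contains this text somewhere; any other fields are ignored.
HOLDS_RE = re.compile(r"holds:\s*([0-9a-fA-F]+)\s+slot/bit:\s*(\d+)/(\d+)")


def parse_holds(line):
    """Return (holds_hex, slot, bit_value, bit_index), or None if no match."""
    m = HOLDS_RE.search(line)
    if m is None:
        return None
    holds_hex = m.group(1)
    slot = int(m.group(2))
    bit_value = int(m.group(3))
    # bit_index is only defined when bit_value has exactly one bit set.
    single_bit = bit_value > 0 and bit_value & (bit_value - 1) == 0
    bit_index = bit_value.bit_length() - 1 if single_bit else None
    return holds_hex, slot, bit_value, bit_index


def is_held(line):
    """True if the line parses and its holds field is nonzero."""
    parsed = parse_holds(line)
    return parsed is not None and int(parsed[0], 16) != 0


sample = "holds: 19fffff8000000000000 slot/bit: 0/268435456"
print(parse_holds(sample))
print(is_held(sample))
```

Running is_held() over every line of hosts.dump and counting the hits
would give a rough number of hosts that still carry holds.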
This does not happen on our 1.2.10 servers, which should have a similar
load (but of course you never know).
It looks like somewhere between 1.2.10 and 1.2.13 a leak in the "holds"
table opened up (again).
Ah yes: the FileLog now frequently contains messages like
"FindClient: client b7f1b8(51d00484) already had conn ca6bf0 (host
808e42c0), stolen by client b7f1b8(51d00484)",
although I doubt that the underlying lookup problems contribute to the
h_Holds problem - because of the timing.
Any ideas?
--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland
Phone: +41 22 767 8985 Fax: +41 22 767 7155