[OpenAFS] Chronic blocked connections on fileserver
Mon, 24 Sep 2007 08:23:46 -0500
We've been having chronic, and at times acute, periods during which
one of our main fileservers shows large numbers of blocked connections.
These periods do not (it seems) correlate with high system load,
high network interface utilization, dropped packets, UDP errors,
high I/O or other badness indicators that I'm accustomed to looking
at.
rxdebug shows 200-300 blocked connections during these periods,
which last up to an hour or so before the badness abates. Since
this server hosts several critical volumes, including
one in which many $PATH elements live, users notice these
disruptions very quickly.
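
To line these periods up against other metrics, it may help to sample
rxdebug on a schedule and log the counts. What follows is a minimal
Python sketch, not a tested tool. It assumes that
"rxdebug <host> 7000 -noconns" prints summary lines of the form
"N calls waiting for a thread" and "N threads are idle", and that the
waiting-calls figure is what is meant by blocked connections above.
The function names and the host name in the usage note are hypothetical.

```python
import re
import subprocess

# Assumed shape of the rxdebug summary lines; adjust if your build differs.
WAITING_RE = re.compile(r"^\s*(\d+)\s+calls waiting for a thread", re.MULTILINE)
IDLE_RE = re.compile(r"^\s*(\d+)\s+threads are idle", re.MULTILINE)


def parse_rxdebug_summary(text):
    """Extract waiting-call and idle-thread counts from rxdebug output.

    Either value is None if the corresponding line is not present.
    """
    waiting = WAITING_RE.search(text)
    idle = IDLE_RE.search(text)
    return {
        "waiting_calls": int(waiting.group(1)) if waiting else None,
        "idle_threads": int(idle.group(1)) if idle else None,
    }


def sample_fileserver(host, port=7000, timeout=30):
    """Run rxdebug once against the fileserver port and parse the summary."""
    result = subprocess.run(
        ["rxdebug", host, str(port), "-noconns"],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,
    )
    return parse_rxdebug_summary(result.stdout)
```

Calling sample_fileserver("fs1.example.edu") once a minute (from cron,
say) and appending the result with a timestamp to a file would give a
record to compare against load, network and I/O graphs.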
We've tried our best to balance accesses between our three main
servers and have moved several very active volumes off the
misbehaving server. After the move, the server handles ~1 million
volume accesses in an hour; our busiest server (which does not
experience this problem) handles nearly three times as many
accesses. rxdebug usually shows ~8 thousand active server and client
connections on this server.
No events in the FileLog correspond with the blocked connections. I
do see regular ProbeUuid failures, but those are benign (right?).
This server has a dual-core 3.00GHz Xeon CPU, 4GB RAM and a 1Gbps
network connection. Its vice partitions are stored on a
fibre-attached Xserve RAID array.
What other information would help resolve this problem? Is there
another aspect of the system that I should examine? What further
steps might we take to try to resolve the issue?