[OpenAFS] Re: Investigating 'calls waiting' from rxdebug

Fri, 16 Aug 2013 11:15:25 -0500

On Thu, 15 Aug 2013 22:33:13 -0400
drosih@rpi.edu wrote:

> But my question is:  If this returns, how can I track down what is
> *causing* the calls-waiting value to climb?  We had over 100
> workstations using AFS at the time, scattered all around campus.  I
> did a variety of things to try and pinpoint the culprit, but didn't
> have much luck.

Dan's approach is good if you legitimately just have too much AFS
activity and the fileserver can't keep up with it. Some alternative
approaches in the same area include looking at the audit log instead of
debug FileLog entries, examining wire dumps, or getting info out of
dtrace if you're on an applicable platform. Those all have slightly
different performance characteristics, but mostly people just pick
whichever approach is most convenient for them.
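
As a rough illustration of the audit log route, here is a minimal
Python sketch that tallies audit records per client so the busiest
hosts stand out. It assumes each record carries the client address
after a "HOST" keyword; check that against the audit log your
fileserver actually writes. The function name count_calls_per_host is
made up for this example and is not part of OpenAFS.

```python
import re
import sys
from collections import Counter

# Assumption: every audit record contains "HOST <client-address>".
# Adjust the pattern if your audit log is laid out differently.
HOST_RE = re.compile(r"\bHOST\s+(\S+)")


def count_calls_per_host(lines):
    """Return a Counter mapping client address -> number of audit records."""
    counts = Counter()
    for line in lines:
        match = HOST_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Read audit log lines on stdin and print the ten busiest clients.
    for host, n in count_calls_per_host(sys.stdin).most_common(10):
        print(f"{n:8d}  {host}")
```

Feeding it only the stretch of the log that covers the period when
calls-waiting was climbing should make a single noisy client fairly
obvious, if there is one.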

The fileserver does record some other stats as well, but they're not
broken down per client/peer, so they're not as useful. You can look at
xstat_fs_test if you want those stats anyway. I believe there are
facilities in the code for breaking this down per-peer (the rx
peer/process stats), but I don't think we have any tool to extract that
data.

If those approaches show very little activity, though, you probably have
a thread actually hanging on something else, so you won't see a lot of
activity. (If you see very little disk, network, and CPU usage at the
same time, that seems pretty likely.) In that case you'd need to look at
a stack trace for the fileserver process, or ideally capture a core to be
examined later.
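
One way to grab that stack trace or core, sketched in Python on the
assumption that gdb and its gcore helper are installed on the
fileserver machine. The function names and the "fileserver-core"
prefix are made up for this example; the PID is whatever your
fileserver process is running as.

```python
import subprocess
import sys


def backtrace_cmd(pid):
    """gdb command that attaches, dumps every thread's stack, then detaches."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def core_cmd(pid, prefix="fileserver-core"):
    """gcore command that writes a core file named <prefix>.<pid>."""
    return ["gcore", "-o", prefix, str(pid)]


def capture_backtrace(pid):
    """Run gdb against the process and return whatever it printed."""
    result = subprocess.run(
        backtrace_cmd(pid), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    # Usage: python capture.py <fileserver-pid>  (usually needs root)
    print(capture_backtrace(int(sys.argv[1])))
```

Attaching a debugger pauses the process for a moment, so on a busy
server the core is usually the better thing to keep: it can be
examined later without touching the running fileserver again.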

-- 
Andrew Deason
adeason@sinenomine.net