[OpenAFS] Re: Debugging a network performance problem that affects AFS

Andrew Deason adeason@sinenomine.net
Thu, 13 Jan 2011 15:42:45 -0600


On Thu, 13 Jan 2011 15:24:00 -0500
Dale Pontius <pontius@btv.ibm.com> wrote:

> I'm wondering if it's possible to collect access time statistics out
> of an OpenAFS Linux client.

"access time" is a bit vague to me; you just want to see how quickly it
is getting a response from the fileserver? There are numerous steps
involved in fetching data, and the cause of bad performance could be in
many places.

> A little time with google and I see the "-enable_peer_stats" and
> "-enable_process_stats" options when starting the client daemon, and
> this very well may furnish the information that I need.

You don't need to start the client with those options; see the 'fs
rxstatpeer' and 'fs rxstatproc' commands to turn the stats on and off.

However, your bigger problem is retrieving the statistics. I don't think
we offer much in the tree that's very useful; you can try
src/libadmin/samples/rxstat_get_peer and rxstat_get_process, but I don't
expect them to be very robust. Of course, I'm not sure if there are
other tools to retrieve the data floating around somewhere (or in
IBM...).

> A subsequent search gets me to the "rxdebug" document, though that
> document appears to be server-centric as opposed to querying the
> client.  Nor does it tell me what information I can collect or if
> access time is part of that information - only mentioning several
> parameters that it does collect.

rxdebug is useful for clients and servers. The 'rxdebug -rxstats'
statistics and other information are useful for debugging performance
problems, but won't tell you much about time taken to process RPCs.
They're more useful for indicating whether packets are getting lost, or
whether something else is interfering with packet delivery.

If you just want the RTT to the various fileservers, 'rxdebug -peers'
can tell you that. The RTT calculated by Rx isn't always accurate
(depending on the version in use and other factors), but it will tell
you what Rx thinks the RTT is.

Oh, and also, 'rxdebug' can be used as a simple test of fileserver
overloaded-ness. If you just run 'rxdebug <fileserver>', you'll see a
couple of lines that say

X calls waiting for a thread

and

Y calls have waited for a thread

These are how many calls are currently not being serviced due to a lack
of available threads, and a running count of how many calls have had to
wait, respectively. You normally want both to be 0; the higher they are,
the slower the fileserver is going to be.
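
If you want to check those two counters from a script, something like
the following Python sketch might do. It only assumes the wording of the
two lines quoted above; the exact spacing in real rxdebug output is an
assumption, and the function name and sample text are made up for
illustration.

```python
import re

# Wording taken from the two lines quoted above. The exact spacing in real
# rxdebug output is an assumption, so the patterns tolerate any whitespace.
_WAITING_NOW = re.compile(r"(\d+)\s+calls\s+waiting\s+for\s+a\s+thread")
_WAITED_TOTAL = re.compile(r"(\d+)\s+calls\s+have\s+waited\s+for\s+a\s+thread")


def thread_wait_counts(rxdebug_output):
    """Return (waiting_now, waited_total); None where a line is not found."""
    now = _WAITING_NOW.search(rxdebug_output)
    total = _WAITED_TOTAL.search(rxdebug_output)
    return (
        int(now.group(1)) if now else None,
        int(total.group(1)) if total else None,
    )


# Hypothetical sample text, not captured from a real fileserver.
sample = "3 calls waiting for a thread\n127 calls have waited for a thread\n"
waiting_now, waited_total = thread_wait_counts(sample)
if waiting_now or waited_total:
    print("fileserver has been short of threads:", waiting_now, waited_total)
```

You would feed it the text printed by 'rxdebug <fileserver>' and watch
whether either number grows over time.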

> Can someone toss me a bone here - or a link?

If you want something quick, you can look at the output of

$ xstat_cm_test <client> -collID 2 -onceonly

Which will give you a bunch of statistics for the client. Many of the
fields are briefly described here:
<http://docs.openafs.org/AdminGuide/apc.html#HDRWQ618>.

For RPC timings on reads, you probably want to look at FetchStatus,
FetchData, and InlineBulkStatus.
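
Since that output is long, a rough Python sketch for pulling out just
the lines that mention those three RPCs could look like this. It makes no
assumption about the column layout of the xstat_cm_test output, only
that the RPC names appear on the relevant lines; the helper names are
hypothetical, and the command line is the one shown above.

```python
import subprocess

# RPC names taken from the message above.
READ_RPCS = ("FetchStatus", "FetchData", "InlineBulkStatus")


def read_rpc_lines(xstat_output, names=READ_RPCS):
    """Keep only the output lines that mention one of the given RPC names."""
    return [
        line.strip()
        for line in xstat_output.splitlines()
        if any(name in line for name in names)
    ]


def collect(client):
    """Run the xstat_cm_test command shown above and filter its output."""
    result = subprocess.run(
        ["xstat_cm_test", client, "-collID", "2", "-onceonly"],
        capture_output=True,
        text=True,
        check=True,
    )
    return read_rpc_lines(result.stdout)
```

Calling collect() with the client's hostname returns the matching lines
as a list of strings, which you can then print or log at intervals.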

-- 
Andrew Deason
adeason@sinenomine.net