[OpenAFS] Re: Performance problems seem to be coming back

Dale Pontius pontius@btv.ibm.com
Mon, 12 Sep 2011 12:59:35 -0400

On 09/12/2011 12:29 PM, Andrew Deason wrote:
> On Mon, 12 Sep 2011 11:31:56 -0400
> Dale Pontius<pontius@btv.ibm.com>  wrote:
>> Maybe I'm missing what rxdebug really does, but I think it sounds just
>> about perfect.  I presume that the OpenAFS clients and servers have
>> packet queues for moving data, and rxdebug just drops packets into
>> those queues like any other part of afs.  If the other parts of afs
>> queue are in "distress" for whatever reason, all of the other packets
>> in that queue will share in the distress, including my rxdebug
>> requests.  In this case, that's what I want - to be told about
>> "distress" between the server and me, and for the first approximation
>> I'm not too concerned about the reason, just that it exists.
> There are a large number of reasons an AFS fileserver will not properly
> service a request from a client; none of the "ping"s we've discussed
> will catch them all. What Jeffrey said will cover most of it, but if you
> want to test "will the fileserver give me data", then your "ping" should
> be asking the fileserver for some data through that client.
> If you want to test "will the fileserver give anybody data", you can use
> the afsio and afscp tools to do a one-shot read-the-file operation on a
> file you already know to exist.
> But if the existing rxdebug -version is detecting what you want, then
> fine, just use that. By making a more ping-like tool, I meant, having it
> report dropped packets, RTT, etc (right now it doesn't tell you at all
> what's going on). What it actually does on the wire is already fine for
> testing Rx stack availability.
So far "rxdebug -version" has been giving me better information than
I've gotten before.  That may be construed as a good or bad statement, I
guess, but it's an improvement.  A few posts up, "rxdebug localhost -port
7001 -peer" was suggested to give me a list of the servers that I'm
currently talking to.  I notice that command also gives at least some
RTT information.  Using the man page, I've also added "-rxstats" to it
and get a little more information about fails and fatal errors.  On my
last runs I'm not getting some of that information, but I suspect I'm
also running out of local disk at this point.

As I said, this looks better for diagnostic purposes than what I've had 
before.  I'll still take suggestions for improvement.  At the moment, 
I'm thinking of wrapping this in a loop with sleep, so I can start 
keeping some statistics.
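
A rough sketch of what that loop might look like, in Python rather than
shell so the timing and bookkeeping are easy to follow.  Everything named
here is an assumption for illustration: the probe() and monitor() helpers
are made up, and treating a nonzero exit status or a timeout as a failed
probe is a guess that should be checked against how rxdebug actually
behaves when a server is in trouble.

```python
import subprocess
import time


def probe(cmd, timeout=10.0):
    """Run cmd once and return (ok, seconds).

    ok is False on a timeout, a nonzero exit status, or if the
    command could not be started at all (assumed failure signals).
    """
    start = time.monotonic()
    try:
        done = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL, timeout=timeout)
        ok = done.returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        ok = False
    return ok, time.monotonic() - start


def monitor(cmd, count, interval, probe_fn=probe, sleep_fn=time.sleep):
    """Probe count times, sleeping interval seconds between probes.

    Returns the failure count plus min/avg/max wall-clock time of the
    probes that succeeded (None when no probe succeeded).
    """
    times = []
    failures = 0
    for i in range(count):
        ok, elapsed = probe_fn(cmd)
        if ok:
            times.append(elapsed)
        else:
            failures += 1
        if i < count - 1:          # no need to sleep after the last probe
            sleep_fn(interval)
    return {
        "probes": count,
        "failures": failures,
        "min": min(times) if times else None,
        "avg": sum(times) / len(times) if times else None,
        "max": max(times) if times else None,
    }
```

With a placeholder host name, a call such as
monitor(["rxdebug", "fs.example.com", "-port", "7000", "-version"], 60, 60)
would probe once a minute for an hour.  The elapsed time measured here is
the wall-clock time of the whole rxdebug process, not a true packet RTT.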

Dale Pontius
Senior Engineer
IBM Corporation
Phone: (802) 769-6850
Tie-Line: 446-6850
email: pontius@us.ibm.com
