[OpenAFS] Re: Performance problems seem to be coming back

Dale Pontius pontius@btv.ibm.com
Thu, 15 Sep 2011 08:05:07 -0400


On 09/12/2011 12:29 PM, Andrew Deason wrote:
> On Mon, 12 Sep 2011 11:31:56 -0400
> Dale Pontius <pontius@btv.ibm.com> wrote:
>
>> Maybe I'm missing what rxdebug really does, but I think it sounds just
>> about perfect.  I presume that the OpenAFS clients and servers have
>> packet queues for moving data, and rxdebug just drops packets into
>> those queues like any other part of AFS.  If the other parts of the AFS
>> queue are in "distress" for whatever reason, all of the other packets
>> in that queue will share in the distress, including my rxdebug
>> requests.  In this case, that's what I want - to be told about
>> "distress" between the server and me, and for the first approximation
>> I'm not too concerned about the reason, just that it exists.
> There are a large number of reasons an AFS fileserver will not properly
> service a request from a client; none of the "ping"s we've discussed
> will catch them all. What Jeffrey said will cover most of it, but if you
> want to test "will the fileserver give me data", then your "ping" should
> be asking the fileserver for some data through that client.
>
> If you want to test "will the fileserver give anybody data", you can use
> the afsio and afscp tools to do a one-shot read-the-file operation on a
> file you already know to exist.
>
> But if the existing rxdebug -version is detecting what you want, then
> fine, just use that. By making a more ping-like tool, I meant, having it
> report dropped packets, RTT, etc (right now it doesn't tell you at all
> what's going on). What it actually does on the wire is already fine for
> testing Rx stack availability.
>
I've been fooling around more with rxdebug, trying to measure
performance.  It can be a bit tough at work, because when the LAN is
running well, it runs very, very well (and when it's bad, it's
horrid).  So much of the time, the things I try to measure are in the
mud.  Today, for other reasons, it was good to work from home, so
fire drills aside, it's "AFS performance day".  My packets are going all
over creation, between Comcast and my employer's WAN, so my numbers will
always stay out of the mud.

For a first cut, I used "rxdebug localhost 7001 -peer" to get a list of
servers, then "time rxdebug -v -servers ${serverName}" to measure some
sort of round-trip time.  I know that time includes extra overhead, such
as starting the rxdebug process itself, but in relative terms it ought to
be indicative, at the very least.  My execution times were nearly always
under 10 ms, generally 5 ms or less.
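The two-step procedure above (list the peers, then time a -version probe against each one) can be scripted so it runs the same way every time. This is a minimal Python sketch, not an OpenAFS tool: it assumes rxdebug is on PATH, uses the full "-peers" and "-version" option spellings, and assumes the peer listing prints one "Peer at host <addr>, port <n>" line per peer, as in the peer block quoted further down. The helpers parse_peers, time_command and probe_peers are hypothetical names. Like "time", it measures whole-process wall time, so the numbers are only a relative indicator, not a true RTT.

```python
import re
import statistics
import subprocess
import time

# Assumed format: one "Peer at host <addr>, port <n>" header line per peer,
# at the start of a line, as seen in rxdebug peer output.
PEER_RE = re.compile(r"^Peer at host ([^,\s]+), port (\d+)", re.MULTILINE)


def parse_peers(text):
    """Return a list of (host, port) tuples found in rxdebug peer output."""
    return [(host, int(port)) for host, port in PEER_RE.findall(text)]


def time_command(argv, runs=5):
    """Run argv `runs` times and return the wall-clock seconds of each run.

    Process startup is included, so this is a rough, relative measure only.
    """
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append(time.perf_counter() - start)
    return samples


def probe_peers(runs=5):
    """Hypothetical driver: list peers of the local cache manager (port 7001),
    then time repeated "rxdebug -version" probes against each peer."""
    out = subprocess.run(["rxdebug", "localhost", "7001", "-peers"],
                         capture_output=True, text=True, check=False).stdout
    for host, port in parse_peers(out):
        samples = time_command(["rxdebug", "-servers", host,
                                "-port", str(port), "-version"], runs)
        print(f"{host}:{port}  min {min(samples) * 1000:.1f} ms  "
              f"median {statistics.median(samples) * 1000:.1f} ms")
```

Call probe_peers() on a client that has rxdebug installed.  Reporting the minimum alongside the median helps because scheduling and startup noise can only ever add time, so the minimum is the closest thing to the network's share of the total.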

Then I kept reading man pages and suggestions here, and came up with
"rxdebug -servers localhost -port 7001 -rxstats", which appeared to give
round-trip times.  This sounded like a better option.  Of course, since I
first did this at work, all of the RTTs were zero.

Cut to today.  For one specific server, using the "time rxdebug"
method, I get:
Trying x.xx.xx.xxx (port 7000):
AFS version: Base configuration afs3.6 2.68

real    0m0.063s
user    0m0.000s
sys     0m0.001s
Using the "-rxstats" method on that same server, I get:
Peer at host x.xx.xx.xxx, port 7000
     ifMTU 1444    natMTU 1444    maxMTU 1444
     packets sent 8152    packet resends 2
     bytes sent high 0 low 216058
     bytes received high 0 low 0
     rtt 0 msec, rtt_dev 0 msec
     timeout 3.000 sec
I presume there's a little bit of distress there, because 2 packets
were resent.  Also, on my first run of this command my timeout was
2 sec, and here it has increased to 3 sec, but the "rtt" and "rtt_dev"
fields are still 0.
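To track resends, RTT and timeout across runs rather than eyeballing them, the peer block above can be parsed into numbers.  This is a minimal Python sketch: the field names and regular expressions are assumptions based only on the output shown here, not on a documented rxdebug output format, and parse_peer_stats and resend_fraction are hypothetical helpers.

```python
import re

# Patterns for the numeric fields of one "Peer at host ..." block, taken
# from the sample output above; other OpenAFS versions may format differently.
FIELDS = {
    "packets_sent": r"packets sent (\d+)",
    "packet_resends": r"packet resends (\d+)",
    "rtt_ms": r"rtt (\d+) msec",
    "rtt_dev_ms": r"rtt_dev (\d+) msec",
    "timeout_s": r"timeout ([\d.]+) sec",
}


def parse_peer_stats(block):
    """Extract the numeric fields present in one peer block as floats."""
    stats = {}
    for name, pattern in FIELDS.items():
        match = re.search(pattern, block)
        if match:
            stats[name] = float(match.group(1))
    return stats


def resend_fraction(stats):
    """Fraction of sent packets that were resent (0.0 if nothing was sent)."""
    sent = stats.get("packets_sent", 0.0)
    return stats.get("packet_resends", 0.0) / sent if sent else 0.0
```

For the block above, resend_fraction gives 2/8152, so roughly 0.025% of packets to that server were resent.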

Is there some other flag I should be feeding rxdebug, or a different way
I should be trying to make this measurement?  On the same note, the
"afsio" and "afscp" tools appear to be missing from my installation.
When you talk about a "one-shot read-the-file" operation, do you mean that
those commands bypass the cache?  Having the cache there seems to me to
muddy file-based performance measurements, which was why I liked the
idea of an rxdebug pseudo-ping.

Dale Pontius

-- 
Dale Pontius
Senior Engineer
IBM Corporation
Phone: (802) 769-6850
Tie-Line: 446-6850
email: pontius@us.ibm.com

This e-mail and its attachments, if any, may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply e-mail and delete all copies of this message from your system without copying it and notify sender of the misdirection by reply e-mail.