[OpenAFS-devel] 50 second fetch-data?

Sven Oehme oehmes@de.ibm.com
Fri, 7 Oct 2005 15:40:42 +0200


Hundreds of these:

Fri Oct  7 14:31:50 2005 FindClient: client 42f6bfb8(c3c1e5c) already had 
conn 42f439b8 (host 6f01a8c0), stolen by client 42f6bfb8(c3c1e5c)

Sven
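
To put a number on "hundreds", the messages can be counted per host value. This is only a rough sketch: the regular expression is modeled on the two FindClient lines quoted in this thread, the function name is made up for illustration, and the log format may differ between OpenAFS versions.

```python
import re
from collections import Counter

# Pattern modeled on the FindClient lines quoted in this thread (assumption:
# other fileserver versions may word the message differently). \s+ is used
# between words so that lines wrapped by a mail client still match.
FINDCLIENT_RE = re.compile(
    r"FindClient:\s+client\s+[0-9a-f]+\([0-9a-f]+\)\s+already\s+had\s+"
    r"conn\s+[0-9a-f]+\s+\(host\s+(?P<host>[0-9a-f]+)\),\s+stolen\s+by\s+client"
)


def count_stolen_conns(log_text: str) -> Counter:
    """Count 'stolen by client' FindClient messages per hex host value."""
    return Counter(m.group("host") for m in FINDCLIENT_RE.finditer(log_text))


sample_log = (
    "Fri Oct  7 14:31:50 2005 FindClient: client 42f6bfb8(c3c1e5c) already had \n"
    "conn 42f439b8 (host 6f01a8c0), stolen by client 42f6bfb8(c3c1e5c)\n"
    "Wed Oct  5 17:31:21 2005 FindClient: client 8320a78(6d5cb8f8) already \n"
    "had conn a7071568 (host 3fdded82), stolen by client 8320a78(6d5cb8f8)\n"
)
counts = count_stolen_conns(sample_log)
for host, n in counts.most_common():
    print(host, n)
```

Feeding it the whole fileserver log instead of sample_log gives one count per client host.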





Jeffrey Altman <jaltman@secure-endpoints.com> 
10/07/05 03:30 PM

To
Sven Oehme/Germany/IBM@IBMDE
cc
openafs-devel@openafs.org, psomogyi@gamax.hu
Subject
Re: [OpenAFS-devel] 50 second fetch-data?






Sven:

Please be specific.  What do you mean by "I see the same messages"? 

Jeffrey Altman

Sven Oehme wrote: 

Oh,

I am just now reading the thread from the beginning :-)
Maybe we have two bugs here. My bug also reduces performance, but I
have no 50 sec delay; I do see the same messages, though (hundreds of them).
I start multiple batch jobs from 1 client (different processes) against 1
server and 1 volume.

rxdebug on the client, in case this helps somebody:


testblade11:~ # rxdebug localhost 7001 -rxstats -noconns -long 
Trying 127.0.0.1 (port 7001): 
Free packets: 129, packet reclaims: 0, calls: 55, used FDs: 64 
not waiting for packets. 
0 calls waiting for a thread 
1 threads are idle 
rx stats: free packets 129, allocs 452504, alloc-failures(rcv 0/0,send 575/0,ack 0)
   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
   packets read: data 8585 ack 124336 busy 0 abort 0 ackall 0 challenge 53 response 0 debug 1420 params 0 unused 0 unused 0 unused 0 version 0
   other read counters: data 8585, ack 124002, dup 0 spurious 333 dally 1
   packets sent: data 114805 ack 8529 busy 0 abort 0 ackall 0 challenge 0 response 53 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other send counters: ack 8529, data 870762 (not resends), resends 0, pushed 0, acked&ignored 340943
        (these should be small) sendFailed 0, fatalErrors 0
   Average rtt is 0.001, with 26815 samples
   Minimum rtt is 0.000, maximum is 0.095
   1 server connections, 29 client connections, 2 peer structs, 47 call structs, 0 free call structs
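
For comparing several rxdebug runs, the "packets read" and "packets sent" lines can be turned into dictionaries. This is a minimal sketch that assumes the output format shown above; the function name is hypothetical, and the "unused" slots are dropped because they repeat.

```python
import re


def parse_packet_counters(rxdebug_output: str, direction: str) -> dict:
    """Return the counters from the 'packets read:' or 'packets sent:' line.

    direction is "read" or "sent". Assumes the layout of the rxdebug output
    quoted in this thread; returns an empty dict if the line is not found.
    """
    # Collapse whitespace so output wrapped by a mail client still parses.
    flat = " ".join(rxdebug_output.split())
    match = re.search(
        rf"packets {re.escape(direction)}: ((?:[a-z]+ \d+ ?)+)", flat
    )
    if match is None:
        return {}
    tokens = match.group(1).split()
    # Tokens alternate name, value; skip the repeated 'unused' placeholders.
    return {
        name: int(value)
        for name, value in zip(tokens[::2], tokens[1::2])
        if name != "unused"
    }


sample = """
   packets read: data 8585 ack 124336 busy 0 abort 0 ackall 0 challenge 53
response 0 debug 1420 params 0 unused 0 unused 0 unused 0 version 0
   other read counters: data 8585, ack 124002, dup 0 spurious 333 dally 1
   packets sent: data 114805 ack 8529 busy 0 abort 0 ackall 0 challenge 0
response 53 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other send counters: ack 8529, data 870762 (not resends), resends 0,
pushed 0, acked&ignored 340943
"""
read_counters = parse_packet_counters(sample, "read")
sent_counters = parse_packet_counters(sample, "sent")
print(read_counters)
print(sent_counters)
```

Taking two snapshots a few seconds apart and subtracting the dictionaries shows which counters are moving while the problem is happening.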


Sven



Sven Oehme/Germany/IBM@IBMDE 
Sent by: openafs-devel-admin@openafs.org 
10/07/05 03:04 PM 


To
Jeffrey Altman <jaltman@secure-endpoints.com> 
cc
Harald Barth <haba@pdc.kth.se>, openafs-devel@openafs.org, rees@umich.edu, 
psomogyi@gamax.hu 
Subject
Re: [OpenAFS-devel] 50 second fetch-data?









Hi Jeffrey, 

Peter and I are working on that bug. I have a test environment where I can
reproduce the bug within 2 sec.
If anybody would like to assist us, I can provide a tcpdump taken while it happens.

Sven 


Jeffrey Altman <jaltman@secure-endpoints.com> 
Sent by: openafs-devel-admin@openafs.org 
10/07/05 02:22 PM 


To
Harald Barth <haba@pdc.kth.se> 
cc
rees@umich.edu, openafs-devel@openafs.org 
Subject
Re: [OpenAFS-devel] 50 second fetch-data?










Harald Barth wrote:

> You probably mean stuff like this:
> 
> Wed Oct  5 17:31:21 2005 FindClient: client 8320a78(6d5cb8f8) already had conn a7071568 (host 3fdded82), stolen by client 8320a78(6d5cb8f8)


> I have only ONE such log line and not for the time frame in question.
> 3fdded82 is my laptop 130.237.221.63 when at work. But I have no such
> message for any of its other IPs which would be *eded82 (130.237.237.*)
> - my laptop when at home.

This log message is not a symptom of the bug that was fixed related to
UUID collision.   This problem you are seeing may or may not be related
and it may or may not be an actual bug.
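
As Harald's mapping of 3fdded82 to 130.237.221.63 shows, the "host" field in these log lines is the IPv4 address printed as a hex integer with the octets in reverse order. A small sketch for decoding it follows; the function name is made up for illustration, and the byte order is inferred from the example in this thread rather than from the fileserver source.

```python
import socket
import struct


def decode_host(hex_host: str) -> str:
    """Convert a FindClient 'host' hex value to a dotted-quad IPv4 address.

    Assumption (inferred from the 3fdded82 example in this thread): the
    octets appear reversed, so the lowest byte is the first octet.
    """
    value = int(hex_host, 16)
    # Little-endian packing puts the lowest byte first, which restores the
    # usual left-to-right octet order.
    return socket.inet_ntoa(struct.pack("<I", value))


print(decode_host("3fdded82"))  # Harald's laptop, per his own message
print(decode_host("6f01a8c0"))  # host value from Sven's log line
```

Under the same assumption, the host in Sven's log line (6f01a8c0) decodes to 192.168.1.111.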

> I moved my H.haba.mail volume to another server which allows me to gdb
> and stop the fileserver without being lynched but of course the
> problems disappeared when I did that. Probably I need to use up some
> kind of resource in the fileserver/rx first. I don't know how without
> letting loose real users. I know I have many connections from many
> clients. But a lot of free threads and no CPU or I/O load to speak of.
> Feel free to run rxdebug against houting.pdc.kth.se if you think you
> see something that I don't. Any tips how to collect statistics?
> 
> Harald.

I doubt moving your volume is going to help track down the problem.
You are not going to have lots of other users connecting to the new 
server.

I don't think we need to be able to stop the service.  However, it would
be useful to see what the server is doing in Ethereal.

Jeffrey Altman



