[OpenAFS-devel] 50 second fetch-data?

Sven Oehme oehmes@de.ibm.com
Fri, 7 Oct 2005 15:04:19 +0200


Hi Jeffrey, 

Peter and I are working on that bug. I have a test environment where I can
reproduce it within 2 seconds.
If anybody would like to assist us, I can provide a tcpdump captured while it happens.

Sven

Jeffrey Altman <jaltman@secure-endpoints.com>
Sent by: openafs-devel-admin@openafs.org
10/07/05 02:22 PM
To: Harald Barth <haba@pdc.kth.se>
cc: rees@umich.edu, openafs-devel@openafs.org
Subject: Re: [OpenAFS-devel] 50 second fetch-data?

Harald Barth wrote:

> You probably mean stuff like this:
> 
> Wed Oct  5 17:31:21 2005 FindClient: client 8320a78(6d5cb8f8) already had conn a7071568 (host 3fdded82), stolen by client 8320a78(6d5cb8f8)


> I have only ONE such log line and not for the time frame in question.
> 3fdded82 is my laptop 130.237.221.63 when at work. But I have no such
> message for any of its other IPs which would be *eded82 (130.237.237.*)
> - my laptop when at home.

This log message is not a symptom of the bug that was fixed related to
UUID collisions. The problem you are seeing may or may not be related,
and it may or may not be an actual bug.
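Harald's mapping of the logged host value 3fdded82 to 130.237.221.63 works because the hex digits are the address bytes in reverse order. A minimal Python sketch of that decoding follows. It assumes (inferred from that mapping, not confirmed against the fileserver source) that the address is held in network byte order and printed as a raw 32-bit word on a little-endian machine. The function name afs_host_to_ip is hypothetical.

```python
import socket
import struct


def afs_host_to_ip(hex_host: str) -> str:
    """Decode a 'host' value as printed in a FindClient log line.

    Assumption (inferred, not confirmed): the IPv4 address is stored in
    network byte order and printed with %x on a little-endian machine,
    so the printed hex digits appear byte-reversed.
    """
    value = int(hex_host, 16)
    # Packing little-endian restores the original in-memory byte order,
    # which for a network-byte-order address is the dotted-quad order.
    return socket.inet_ntoa(struct.pack("<I", value))


if __name__ == "__main__":
    print(afs_host_to_ip("3fdded82"))
```

With this decoding, the *eded82 pattern Harald mentions fixes the first three octets to 130.237.237 and leaves the last one free, matching his home addresses 130.237.237.*.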

> I moved my H.haba.mail volume to another server, which allows me to use gdb
> and stop the fileserver without being lynched, but of course the
> problems disappeared when I did that. Probably I need to use up some
> kind of resource in the fileserver/rx first. I don't know how to do that without
> letting loose real users. I know I have many connections from many
> clients, but there are a lot of free threads and no CPU or I/O load to speak of.
> Feel free to run rxdebug against houting.pdc.kth.se if you think you
> see something that I don't. Any tips on how to collect statistics?
> 
> Harald.

I doubt moving your volume is going to help track down the problem.
You are not going to have lots of other users connecting to the new 
server.

I don't think we need to be able to stop the service.  However, it would
be useful to see what the server is doing in Ethereal.

Jeffrey Altman