[OpenAFS-devel] 50 second fetch-data?

Jeffrey Altman jaltman@secure-endpoints.com
Fri, 07 Oct 2005 09:30:29 -0400


Sven:

Please be specific.  What do you mean by "I see the same messages"?  

Jeffrey Altman

Sven Oehme wrote:

>
> Oh,
>
> I am just reading the mail from the beginning :-)
> Maybe we have two bugs here. My bug also reduces performance, but
> I have no 50 sec delay; I do see the same messages (hundreds of them).
> I start multiple batch jobs from 1 client (different processes) to 1
> server to 1 volume.
>
> rxdebug on the client, in case this helps somebody:
>
>
> testblade11:~ # rxdebug localhost 7001 -rxstats -noconns -long
> Trying 127.0.0.1 (port 7001):
> Free packets: 129, packet reclaims: 0, calls: 55, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 1 threads are idle
> rx stats: free packets 129, allocs 452504, alloc-failures(rcv 0/0,send 575/0,ack 0)
>    greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
>    packets read: data 8585 ack 124336 busy 0 abort 0 ackall 0 challenge 53 response 0 debug 1420 params 0 unused 0 unused 0 unused 0 version 0
>    other read counters: data 8585, ack 124002, dup 0 spurious 333 dally 1
>    packets sent: data 114805 ack 8529 busy 0 abort 0 ackall 0 challenge 0 response 53 debug 0 params 0 unused 0 unused 0 unused 0 version 0
>    other send counters: ack 8529, data 870762 (not resends), resends 0, pushed 0, acked&ignored 340943
>         (these should be small) sendFailed 0, fatalErrors 0
>    Average rtt is 0.001, with 26815 samples
>    Minimum rtt is 0.000, maximum is 0.095
>    1 server connections, 29 client connections, 2 peer structs, 47 call structs, 0 free call structs
>
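For anyone comparing several rxdebug dumps, the counters in the output above can be pulled out with a few lines of Python. This is only a sketch: the `counter` helper and the choice of which counters to extract are illustrative assumptions, not part of rxdebug or OpenAFS, and the input is simply three lines copied from the output above with the mail wrapping undone.

```python
import re

# Three lines from the rxdebug output quoted above, with mail wrapping undone.
RX_STATS = """\
rx stats: free packets 129, allocs 452504, alloc-failures(rcv 0/0,send 575/0,ack 0)
   other read counters: data 8585, ack 124002, dup 0 spurious 333 dally 1
   other send counters: ack 8529, data 870762 (not resends), resends 0, pushed 0, acked&ignored 340943
"""


def counter(text, name):
    """Return the first integer that directly follows `name`, or None.

    Hypothetical helper for illustration; it knows nothing about rx itself.
    """
    match = re.search(re.escape(name) + r"\s+(\d+)", text)
    return int(match.group(1)) if match else None


stats = {name: counter(RX_STATS, name)
         for name in ("spurious", "resends", "acked&ignored")}

# Send-side allocation failures are printed as "send N/M"; take N only.
stats["send alloc-failures"] = int(re.search(r"send (\d+)/", RX_STATS).group(1))

for name, value in stats.items():
    print(f"{name}: {value}")
```

Running the same extraction on dumps taken a few seconds apart would show which counters are still climbing while the slowdown is in progress.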
>
> Sven
>
>
>
> Sven Oehme/Germany/IBM@IBMDE
> Sent by: openafs-devel-admin@openafs.org
> 10/07/05 03:04 PM
> To: Jeffrey Altman <jaltman@secure-endpoints.com>
> cc: Harald Barth <haba@pdc.kth.se>, openafs-devel@openafs.org,
>     rees@umich.edu, psomogyi@gamax.hu
> Subject: Re: [OpenAFS-devel] 50 second fetch-data?
>
> Hi Jeffrey,
>
> Peter and I are working on that bug. I have a test environment where I can
> reproduce the bug within 2 sec.
> If anybody would like to assist us, I can provide a tcpdump taken while it happens.
>
> Sven
>
>
> Jeffrey Altman <jaltman@secure-endpoints.com>
> Sent by: openafs-devel-admin@openafs.org
> 10/07/05 02:22 PM
> To: Harald Barth <haba@pdc.kth.se>
> cc: rees@umich.edu, openafs-devel@openafs.org
> Subject: Re: [OpenAFS-devel] 50 second fetch-data?
>
> Harald Barth wrote:
>
> > You probably mean stuff like this:
> >
> > Wed Oct  5 17:31:21 2005 FindClient: client 8320a78(6d5cb8f8) already had conn a7071568 (host 3fdded82), stolen by client 8320a78(6d5cb8f8)
> >
> > I have only ONE such log line and not for the time frame in question.
> > 3fdded82 is my laptop 130.237.221.63 when at work. But I have no such
> > message for any of its other IPs which would be *eded82 (130.237.237.*)
> > - my laptop when at home.
>
> This log message is not a symptom of the UUID collision bug that was
> fixed.  The problem you are seeing may or may not be related,
> and it may or may not be an actual bug.
>
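The mapping between the hex host word in the log line and the dotted address given for it (3fdded82 and 130.237.221.63) can be checked mechanically. The sketch below assumes, based only on that one pairing in this thread, that the log prints the four address bytes in reverse order; `host_from_log` is a hypothetical helper name, not an OpenAFS function.

```python
def host_from_log(hex_word):
    """Decode a hex host word from the fileserver log into dotted-quad form.

    Assumption (inferred from the thread, not from the OpenAFS source):
    the four address bytes appear in reverse order in the log.
    """
    octets = bytes.fromhex(hex_word.zfill(8))  # restore any dropped leading zero
    return ".".join(str(b) for b in reversed(octets))


print(host_from_log("3fdded82"))
```

Under the same assumption, any word ending in `eded82` decodes to an address in 130.237.237.*, which matches the pattern described for the other addresses.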
> > I moved my H.haba.mail volume to another server which allows me to gdb
> > and stop the fileserver without being lynched, but of course the
> > problems disappeared when I did that. Probably I need to use up some
> > kind of resource in the fileserver/rx first. I don't know how without
> > letting loose real users. I know I have many connections from many
> > clients, but a lot of free threads and no CPU or I/O load to speak of.
> > Feel free to run rxdebug against houting.pdc.kth.se if you think you
> > see something that I don't. Any tips on how to collect statistics?
> >
> > Harald.
>
> I doubt moving your volume is going to help track down the problem.
> You are not going to have lots of other users connecting to the new
> server.
>
> I don't think we need to be able to stop the service.  However, it would
> be useful to see what the server is doing in Ethereal.
>
> Jeffrey Altman
>
>
>
