[OpenAFS-devel] 1.3.X: Problem with many connections exhausting resources?
   
    Harald Barth
     
    haba@pdc.kth.se
       
    Tue, 24 May 2005 17:42:26 +0200 (MEST)
    
    
  
With my 1.3.81 clients I sometimes have the problem that they flag some
servers as down for no particular reason that I can see. At the same time
I see that these clients have a great many unused connections
to the same servers.
Example:
d04n35# /usr/openafs/sbin/rxdebug d04n35.pdc.kth.se 7001 -all | grep Connection | awk '{print $4}' | sort | uniq -c | sed s/,//g| awk '{print "echo -n "$1" \" \"; host "$2" | cut -d\" \" -f 5"}'|bash
263  MEREDITH.DEMENTIA.ORG.
  1  kosmos.nada.kth.se.
312  gre.nada.kth.se.
307  no.nada.kth.se.
314  li.nada.kth.se.
312  gre-227.nada.kth.se.
309  no-227.nada.kth.se.
312  li-227.nada.kth.se.
  1  anna.pdc.kth.se.
257  kelp.pdc.kth.se.
263  houting.pdc.kth.se.
322  carp.pdc.kth.se.
302  gills.pdc.kth.se.
256  kvikklunsj.stacken.kth.se.
256  kexchoklad.stacken.kth.se.
255  prisextra.e.kth.se.
320  cysteine.pdc.kth.se.
326  aspartate.pdc.kth.se.
258  alanine.pdc.kth.se.
257  kelp-le.pdc.kth.se.
270  houting-le.pdc.kth.se.
(kosmos and anna are VLDB)
I don't see connections piling up like this with 1.2.X clients. For example, run the above
against hansen.math.kth.se and you will not see any large number of connections. If I
test kallsup.pdc.kth.se, which is a 1.2.X client, I see the same low numbers against
most (probably 1.2.X) servers, but three servers running 1.3.{77,81} have many
connections:
d04n35# /usr/openafs/sbin/rxdebug kallsup.pdc.kth.se 7001 -all | grep Connection | awk '{print $4}' | sort | uniq -c | sed s/,//g| awk '{print "echo -n "$1" \" \"; host "$2" | cut -d\" \" -f 5"}'|bash|sort -n|tail -10
2  no.nada.kth.se.
2  temp-dns-hack2.wam.umd.edu.
3  afs1.hallf.kth.se.
3  jinx.ncsa.uiuc.edu.
30  li-227.nada.kth.se.
66  gre.nada.kth.se.
72  no-227.nada.kth.se.
293  gills.pdc.kth.se.
305  carp.pdc.kth.se.
311  houting.pdc.kth.se.
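For anyone who wants to repeat the count, here is a minimal sketch of the counting step as a shell function. It assumes that rxdebug prints one line per connection of the form "Connection from host <addr>, port ...", which is what the awk '{print $4}' in the pipelines above relies on. The name count_peers is made up for this sketch, and the reverse DNS lookup done with host above is left out, so the output shows raw addresses.

```shell
# count_peers (hypothetical helper): read rxdebug output on stdin and
# print "<count> <peer address>" per peer, sorted by ascending count.
# Assumes the peer address is the 4th field of each "Connection" line,
# followed by a comma, as in the pipelines above.
count_peers() {
    awk '/Connection/ {
             sub(/,$/, "", $4)      # strip the trailing comma from the address
             n[$4]++                # one more connection to this peer
         }
         END { for (h in n) print n[h], h }' | sort -n
}

# Usage (same client and port as in the examples above):
#   /usr/openafs/sbin/rxdebug d04n35.pdc.kth.se 7001 -all | count_peers | tail -10
```

Doing the counting in one awk program avoids the sort | uniq -c | sed chain, and the addresses can still be fed to host afterwards if names are wanted.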
So, is the rx in 1.3.X broken and leaking resources?
Does this have something to do with timeouts and the NAT stuff?
Is there a way to terminate connections forcefully?
If it is the NAT stuff that is making me unhappy, how
do I turn it off?
Sorry if this is a known/fixed problem, and I'm just repeating
something known. All pointers are appreciated!
Harald.