[OpenAFS] cache problems in cluster environment

Derrick J Brashear shadow@dementia.org
Tue, 21 Aug 2007 09:35:47 -0400 (EDT)



On Tue, 21 Aug 2007, Thomas Sesselmann wrote:

>
> Hello,
>
> we are using OpenAFS in a cluster-environment.
> Now we have about 100 Linux-Clients (Ubuntu 6.06)
> with OpenAFS 1.4.1-2 (system-default).

Callback breaks are being dropped, so the client isn't refetching. What on
your network is either filtering or expiring port mappings, and can you
fix it?
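
One way to test this from the file server side is to probe the client's
callback port directly. The following is only a sketch, not something from
the original exchange: the function name is made up, and it assumes that
rxdebug prints a line containing "AFS version" when the cache manager on
7001/udp answers.

```shell
#!/bin/sh
# Sketch only: probe a client's cache manager (callback) port.
# Assumption: rxdebug prints a line containing "AFS version" when the
# cache manager answers; adjust the pattern if your build differs.
callback_port_reachable() {
    client=$1
    # 7001/udp is the cache manager port the callback breaks are sent to.
    if rxdebug "$client" 7001 -version 2>/dev/null \
            | grep 'AFS version' >/dev/null; then
        echo "$client: callback port answered"
        return 0
    fi
    echo "$client: no answer on 7001/udp (filtered, or mapping expired?)"
    return 1
}
```

Run on the file server, for example as `callback_port_reachable feptpcao03`.
If the stale client does not answer while a healthy one does, something
between the two is probably dropping the traffic.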


>
> Sometimes the data on a node isn't consistent
> and we can't find an issue ... :(
> We hope someone on this list has an idea how to debug or solve this problem...?
>
>
>
> We can create some files with touch on one node:
>
> 14:49:14 sesselm@alihlt-gw1:~/test$ ls
> 14:49:57 sesselm@alihlt-gw1:~/test$ touch test1
> 14:50:07 sesselm@alihlt-gw1:~/test$ touch test2
> 14:50:08 sesselm@alihlt-gw1:~/test$ touch test3
> 14:50:09 sesselm@alihlt-gw1:~/test$ touch test4
> 14:50:11 sesselm@alihlt-gw1:~/test$ ll
> total 0
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test1
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test2
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test3
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test4
>
>
> On the other node these files are not listed, but they exist:
>
> 14:50:16 sesselm@feptpcao03:~/test$ ll
> total 0
> 14:50:18 sesselm@feptpcao03:~/test$ touch test1
> touch: cannot touch `test1': File exists
> 14:50:25 sesselm@feptpcao03:~/test$ ll
> total 0
>
>
> From this node a file can be created normally and is also listed on other nodes:
>
> 14:50:27 sesselm@feptpcao03:~/test$ touch test5
> 14:50:31 sesselm@feptpcao03:~/test$ ll
> total 0
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test5
>
> 14:50:36 sesselm@alihlt-gw1:~/test$ ll
> total 0
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test1
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test2
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test3
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test4
> -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test5
> 14:50:38 sesselm@alihlt-gw1:~/test$
>
>
> The flushvolume-command helps, but only for the moment.
>
>
> Here are some outputs of rxdebug:
> ---------------------------------------------------------------------------
>
> 14:51:51 sesselm@alihlt-gw1:~/test$ rxdebug ms1
> Trying 10.162.5.7 (port 7000):
> Free packets: 705, packet reclaims: 5, calls: 1397969, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 25 threads are idle
> Connection from host 10.162.4.69, port 7002, Cuid b1b59233/77083130
>  serial 5276,  natMTU 1444, flags pktCksum, security index 2, client conn
>  rxkad: level crypt, flags pktCksum
>  Received 62256 bytes in 2596 packets
>  Sent 324132 bytes in 2596 packets
>    call 0: # 2596, state dally, mode: receiving, flags: receive_done
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> Done.
>
>
> 14:52:03 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -peers |grep Peer |wc -l
> 93
>
>
> 14:53:00 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -allconnections |grep -n 10.162.0.105
> 2220:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc92d4
> 6300:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc989c
> 6345:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98a4
> 6639:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98d8
> 6711:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddc8
> 6753:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddcc
> 6792:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd0
> 6819:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd4
> 6828:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd8
> 6921:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963dde4
> 7011:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddf4
> 7098:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de00
> 7200:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de0c
> 7263:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de10
> 7335:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de14
>
>
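
Rather than grepping for one address at a time, the same output can be
tallied per client. This is a sketch that was not part of the original
exchange; the function name is hypothetical, and it relies only on the
"Connection from host X, port Y" lines that rxdebug prints.

```shell
#!/bin/sh
# Sketch: count connections per peer address from the output of
# `rxdebug <server> -allconnections`, read on standard input.
connections_per_host() {
    awk '/^Connection from host/ {
             host = $4
             sub(/,$/, "", host)      # drop the trailing comma
             count[host]++
         }
         END { for (h in count) print count[h], h }' | sort -rn
}
```

Used as `rxdebug ms1 -allconnections | connections_per_host`, it lists the
clients holding the most connections first.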
> 14:54:43 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -allconnections |grep -A 8 10.162.0.105
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc92d4
>  serial 6,  natMTU 1444, security index 0, server conn
>    call 0: # 6, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc989c
>  serial 3,  natMTU 1444, security index 0, server conn
>    call 0: # 2, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98a4
>  serial 11,  natMTU 1444, security index 0, server conn
>    call 0: # 7, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98d8
>  serial 8,  natMTU 1444, security index 0, server conn
>    call 0: # 5, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddc8
>  serial 8,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours
>  Received 112 bytes in 4 packets
>  Sent 4584 bytes in 7 packets
>    call 0: # 5, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddcc
>  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours
>  Received 88 bytes in 3 packets
>  Sent 4464 bytes in 6 packets
>    call 0: # 4, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd0
>  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours
>  Received 88 bytes in 3 packets
>  Sent 4464 bytes in 6 packets
>    call 0: # 4, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd4
>  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours
>  Received 88 bytes in 3 packets
>  Sent 4464 bytes in 6 packets
>    call 0: # 4, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd8
>  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours
>  Received 88 bytes in 3 packets
>  Sent 4464 bytes in 6 packets
>    call 0: # 4, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963dde4
>  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours
>  Received 88 bytes in 3 packets
>  Sent 4464 bytes in 6 packets
>    call 0: # 4, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddf4
>  serial 79,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 719.5 hours
>  Received 6656 bytes in 44 packets
>  Sent 51498 bytes in 71 packets
>    call 0: # 42, state dally, mode: eof, flags: receive_done
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de00
>  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours
>  Received 88 bytes in 3 packets
>  Sent 4464 bytes in 6 packets
>    call 0: # 4, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de0c
>  serial 42,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 719.5 hours
>  Received 28232 bytes in 41 packets
>  Sent 11290 bytes in 28 packets
>    call 0: # 24, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de10
>  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours
>  Received 88 bytes in 3 packets
>  Sent 4464 bytes in 6 packets
>    call 0: # 4, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
> --
> Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de14
>  serial 43,  natMTU 1444, flags pktCksum, security index 2, server conn
>  rxkad: level crypt, flags authenticated pktCksum, expires in 719.8 hours
>  Received 5608 bytes in 30 packets
>  Sent 23290 bytes in 40 packets
>    call 0: # 28, state not initialized
>    call 1: # 0, state not initialized
>    call 2: # 0, state not initialized
>    call 3: # 0, state not initialized
>
>
> rxdebug ms1 -onlyhost feptpcao03 -rxstats
> Trying 10.162.5.7 (port 7000):
> Free packets: 703, packet reclaims: 5, calls: 1776093, used FDs: 64
> not waiting for packets.
> 0 calls waiting for a thread
> 24 threads are idle
> rx stats: free packets 703, allocs 18005216, alloc-failures(rcv 0/0,send 0/0,ack 0)
>   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, selects 0, sendSelects 0
>   packets read: data 12281310 ack 4471915 busy 0 abort 4 ackall 0 challenge 85 response 1378 debug 16519 params 0 unused 0 unused 0 unused 0 version 0
>   other read counters: data 12281310, ack 4471892, dup 27 spurious 20 dally 0
>   packets sent: data 5409131 ack 5504068 busy 0 abort 232 ackall 0 challenge 1378 response 85 debug 0 params 0 unused 0 unused 0 unused 0 version 0
>   other send counters: ack 5504068, data 10819136 (not resends), resends 801, pushed 0, acked&ignored 31354504
>        (these should be small) sendFailed 0, fatalErrors 5
>   Average rtt is 0.002, with 1834223 samples
>   Minimum rtt is 0.000, maximum is 43.049
>   1284 server connections, 19 client connections, 93 peer structs, 155 call structs, 131 free call structs
> Showing only connections from host 10.162.0.105
> Done.
>
>
>
> thanks and best regards
>
> Thomas Sesselmann
>
> --
> Dipl.-Inf. Thomas Sesselmann         __O
> Kirchhoff-Institut für Physik      _\-<,
> Universität Heidelberg           _(_)/(_)_
> INF227 / D-69120 Heidelberg
> Tel.:   +49/6221/54-9132
> E-Mail: Thomas.Sesselmann@kip.uni-heidelberg.de
> gpg-key: 0x9392E54B  or finger -l tsesselm@ix.urz.uni-heidelberg.de
>
>
>