[OpenAFS] cache problems in cluster environment
Thomas Sesselmann
thomas.sesselmann@kip.uni-heidelberg.de
Tue, 21 Aug 2007 15:30:48 +0200
This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig9B456D9AE69455E50B194EDF
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: quoted-printable
Hello,
we are using OpenAFS in a cluster-environment.
Now we have about 100 Linux-Clients (Ubuntu 6.06)
with OpenAFS 1.4.1-2 (system-default).
Sometimes the data on a node isn't consistent
and we can't find an issue ... :(
We hope someone on this list has an idea to debug or solve this problem..=
=2E?
We can create some files with touch on one node:
14:49:14 sesselm@alihlt-gw1:~/test$ ls
14:49:57 sesselm@alihlt-gw1:~/test$ touch test1
14:50:07 sesselm@alihlt-gw1:~/test$ touch test2
14:50:08 sesselm@alihlt-gw1:~/test$ touch test3
14:50:09 sesselm@alihlt-gw1:~/test$ touch test4
14:50:11 sesselm@alihlt-gw1:~/test$ ll
total 0
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test1
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test2
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test3
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test4
On the other node this files are not listed, but exists:
14:50:16 sesselm@feptpcao03:~/test$ ll
total 0
14:50:18 sesselm@feptpcao03:~/test$ touch test1
touch: cannot touch `test1': File exists
14:50:25 sesselm@feptpcao03:~/test$ ll
total 0
=46rom this node a file can normally create and also listed from other no=
des:
14:50:27 sesselm@feptpcao03:~/test$ touch test5
14:50:31 sesselm@feptpcao03:~/test$ ll
total 0
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test5
14:50:36 sesselm@alihlt-gw1:~/test$ ll
total 0
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test1
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test2
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test3
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test4
-rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test5
14:50:38 sesselm@alihlt-gw1:~/test$
The flushvolume-command helps, but only for the moment.
Here are some outputs of rxdebug:
-------------------------------------------------------------------------=
14:51:51 sesselm@alihlt-gw1:~/test$ rxdebug ms1
Trying 10.162.5.7 (port 7000):
Free packets: 705, packet reclaims: 5, calls: 1397969, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
25 threads are idle
Connection from host 10.162.4.69, port 7002, Cuid b1b59233/77083130
serial 5276, natMTU 1444, flags pktCksum, security index 2, client con=
n
rxkad: level crypt, flags pktCksum
Received 62256 bytes in 2596 packets
Sent 324132 bytes in 2596 packets
call 0: # 2596, state dally, mode: receiving, flags: receive_done
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
Done.
14:52:03 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -peers |grep Peer |wc -l
93
14:53:00 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -allconnections |grep -n =
10.162.0.105
2220:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc92d4=
6300:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc989c=
6345:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98a4=
6639:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98d8=
6711:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddc8=
6753:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddcc=
6792:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd0=
6819:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd4=
6828:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd8=
6921:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963dde4=
7011:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddf4=
7098:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de00=
7200:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de0c=
7263:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de10=
7335:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de14=
14:54:43 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -allconnections |grep -A =
8 10.162.0.105
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc92d4
serial 6, natMTU 1444, security index 0, server conn
call 0: # 6, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc989c
serial 3, natMTU 1444, security index 0, server conn
call 0: # 2, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98a4
serial 11, natMTU 1444, security index 0, server conn
call 0: # 7, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98d8
serial 8, natMTU 1444, security index 0, server conn
call 0: # 5, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddc8
serial 8, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=
Received 112 bytes in 4 packets
Sent 4584 bytes in 7 packets
call 0: # 5, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddcc
serial 7, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=
Received 88 bytes in 3 packets
Sent 4464 bytes in 6 packets
call 0: # 4, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd0
serial 7, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=
Received 88 bytes in 3 packets
Sent 4464 bytes in 6 packets
call 0: # 4, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd4
serial 7, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=
Received 88 bytes in 3 packets
Sent 4464 bytes in 6 packets
call 0: # 4, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd8
serial 7, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=
Received 88 bytes in 3 packets
Sent 4464 bytes in 6 packets
call 0: # 4, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963dde4
serial 7, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=
Received 88 bytes in 3 packets
Sent 4464 bytes in 6 packets
call 0: # 4, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddf4
serial 79, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 719.5 hour=
s
Received 6656 bytes in 44 packets
Sent 51498 bytes in 71 packets
call 0: # 42, state dally, mode: eof, flags: receive_done
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de00
serial 7, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=
Received 88 bytes in 3 packets
Sent 4464 bytes in 6 packets
call 0: # 4, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de0c
serial 42, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 719.5 hour=
s
Received 28232 bytes in 41 packets
Sent 11290 bytes in 28 packets
call 0: # 24, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de10
serial 7, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=
Received 88 bytes in 3 packets
Sent 4464 bytes in 6 packets
call 0: # 4, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de14
serial 43, natMTU 1444, flags pktCksum, security index 2, server conn
rxkad: level crypt, flags authenticated pktCksum, expires in 719.8 hour=
s
Received 5608 bytes in 30 packets
Sent 23290 bytes in 40 packets
call 0: # 28, state not initialized
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
rxdebug ms1 -onlyhost feptpcao03 -rxstats
Trying 10.162.5.7 (port 7000):
Free packets: 703, packet reclaims: 5, calls: 1776093, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
24 threads are idle
rx stats: free packets 703, allocs 18005216, alloc-failures(rcv 0/0,send =
0/0,ack 0)
greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, s=
elects 0, sendSelects 0
packets read: data 12281310 ack 4471915 busy 0 abort 4 ackall 0 challe=
nge 85 response 1378 debug 16519 params 0 unused 0 unused 0 unused 0 vers=
ion 0
other read counters: data 12281310, ack 4471892, dup 27 spurious 20 da=
lly 0
packets sent: data 5409131 ack 5504068 busy 0 abort 232 ackall 0 chall=
enge 1378 response 85 debug 0 params 0 unused 0 unused 0 unused 0 version=
0
other send counters: ack 5504068, data 10819136 (not resends), resends=
801, pushed 0, acked&ignored 31354504
(these should be small) sendFailed 0, fatalErrors 5
Average rtt is 0.002, with 1834223 samples
Minimum rtt is 0.000, maximum is 43.049
1284 server connections, 19 client connections, 93 peer structs, 155 c=
all structs, 131 free call structs
Showing only connections from host 10.162.0.105
Done.
thanks and best regards
Thomas Sesselmann
--=20
Dipl.-Inf. Thomas Sesselmann __O
Kirchhoff-Institut f=FCr Physik _\-<,
Universit=E4t Heidelberg _(_)/(_)_
INF227 / D-69120 Heidelberg
Tel.: +49/6221/54-9132
E-Mail: Thomas.Sesselmann@kip.uni-heidelberg.de
gpg-key: 0x9392E54B or finger -l tsesselm@ix.urz.uni-heidelberg.de
--------------enig9B456D9AE69455E50B194EDF
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFGyukMEQVslJOS5UsRCrSaAJ4xx0imwOCpKNxpEf0OY8lpI7LmCQCfengl
3auXRwmJgRDDM7ehyZgM1oU=
=Z3QP
-----END PGP SIGNATURE-----
--------------enig9B456D9AE69455E50B194EDF--