[OpenAFS] cache problems in cluster environment

Thomas Sesselmann thomas.sesselmann@kip.uni-heidelberg.de
Tue, 21 Aug 2007 15:30:48 +0200


This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig9B456D9AE69455E50B194EDF
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: quoted-printable


Hello,

we are using OpenAFS in a cluster-environment.
Now we have about 100 Linux-Clients (Ubuntu 6.06)
with OpenAFS 1.4.1-2 (system-default).


Sometimes the data on a node isn't consistent
and we can't find an issue ... :(
We hope someone on this list has an idea to debug or solve this problem..=
=2E?



We can create some files with touch on one node:

 14:49:14 sesselm@alihlt-gw1:~/test$ ls
 14:49:57 sesselm@alihlt-gw1:~/test$ touch test1
 14:50:07 sesselm@alihlt-gw1:~/test$ touch test2
 14:50:08 sesselm@alihlt-gw1:~/test$ touch test3
 14:50:09 sesselm@alihlt-gw1:~/test$ touch test4
 14:50:11 sesselm@alihlt-gw1:~/test$ ll
 total 0
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test1
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test2
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test3
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test4


On the other node this files are not listed, but exists:

 14:50:16 sesselm@feptpcao03:~/test$ ll
 total 0
 14:50:18 sesselm@feptpcao03:~/test$ touch test1
 touch: cannot touch `test1': File exists
 14:50:25 sesselm@feptpcao03:~/test$ ll
 total 0


=46rom this node a file can normally create and also listed from other no=
des:

 14:50:27 sesselm@feptpcao03:~/test$ touch test5
 14:50:31 sesselm@feptpcao03:~/test$ ll
 total 0
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test5

 14:50:36 sesselm@alihlt-gw1:~/test$ ll
 total 0
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test1
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test2
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test3
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test4
 -rw-r--r-- 1 sesselm admin 0 2007-08-21 14:50 test5
 14:50:38 sesselm@alihlt-gw1:~/test$


The flushvolume-command helps, but only for the moment.


Here are some outputs of rxdebug:
-------------------------------------------------------------------------=


14:51:51 sesselm@alihlt-gw1:~/test$ rxdebug ms1
Trying 10.162.5.7 (port 7000):
Free packets: 705, packet reclaims: 5, calls: 1397969, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
25 threads are idle
Connection from host 10.162.4.69, port 7002, Cuid b1b59233/77083130
  serial 5276,  natMTU 1444, flags pktCksum, security index 2, client con=
n
  rxkad: level crypt, flags pktCksum
  Received 62256 bytes in 2596 packets
  Sent 324132 bytes in 2596 packets
    call 0: # 2596, state dally, mode: receiving, flags: receive_done
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Done.


14:52:03 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -peers |grep Peer |wc -l
93


14:53:00 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -allconnections |grep -n =
10.162.0.105
2220:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc92d4=

6300:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc989c=

6345:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98a4=

6639:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98d8=

6711:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddc8=

6753:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddcc=

6792:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd0=

6819:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd4=

6828:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd8=

6921:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963dde4=

7011:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddf4=

7098:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de00=

7200:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de0c=

7263:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de10=

7335:Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de14=



14:54:43 sesselm@alihlt-gw1:~/test$ rxdebug ms1 -allconnections |grep -A =
8 10.162.0.105
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc92d4
  serial 6,  natMTU 1444, security index 0, server conn
    call 0: # 6, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc989c
  serial 3,  natMTU 1444, security index 0, server conn
    call 0: # 2, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98a4
  serial 11,  natMTU 1444, security index 0, server conn
    call 0: # 7, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/1adc98d8
  serial 8,  natMTU 1444, security index 0, server conn
    call 0: # 5, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddc8
  serial 8,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=

  Received 112 bytes in 4 packets
  Sent 4584 bytes in 7 packets
    call 0: # 5, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddcc
  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=

  Received 88 bytes in 3 packets
  Sent 4464 bytes in 6 packets
    call 0: # 4, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd0
  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=

  Received 88 bytes in 3 packets
  Sent 4464 bytes in 6 packets
    call 0: # 4, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd4
  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=

  Received 88 bytes in 3 packets
  Sent 4464 bytes in 6 packets
    call 0: # 4, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddd8
  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=

  Received 88 bytes in 3 packets
  Sent 4464 bytes in 6 packets
    call 0: # 4, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963dde4
  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=

  Received 88 bytes in 3 packets
  Sent 4464 bytes in 6 packets
    call 0: # 4, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963ddf4
  serial 79,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 719.5 hour=
s
  Received 6656 bytes in 44 packets
  Sent 51498 bytes in 71 packets
    call 0: # 42, state dally, mode: eof, flags: receive_done
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de00
  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=

  Received 88 bytes in 3 packets
  Sent 4464 bytes in 6 packets
    call 0: # 4, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de0c
  serial 42,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 719.5 hour=
s
  Received 28232 bytes in 41 packets
  Sent 11290 bytes in 28 packets
    call 0: # 24, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de10
  serial 7,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 22.6 hours=

  Received 88 bytes in 3 packets
  Sent 4464 bytes in 6 packets
    call 0: # 4, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
--
Connection from host 10.162.0.105, port 7001, Cuid 9d1cfd4c/4963de14
  serial 43,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level crypt, flags authenticated pktCksum, expires in 719.8 hour=
s
  Received 5608 bytes in 30 packets
  Sent 23290 bytes in 40 packets
    call 0: # 28, state not initialized
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized


rxdebug ms1 -onlyhost feptpcao03 -rxstats
Trying 10.162.5.7 (port 7000):
Free packets: 703, packet reclaims: 5, calls: 1776093, used FDs: 64
not waiting for packets.
0 calls waiting for a thread
24 threads are idle
rx stats: free packets 703, allocs 18005216, alloc-failures(rcv 0/0,send =
0/0,ack 0)
   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, s=
elects 0, sendSelects 0
   packets read: data 12281310 ack 4471915 busy 0 abort 4 ackall 0 challe=
nge 85 response 1378 debug 16519 params 0 unused 0 unused 0 unused 0 vers=
ion 0
   other read counters: data 12281310, ack 4471892, dup 27 spurious 20 da=
lly 0
   packets sent: data 5409131 ack 5504068 busy 0 abort 232 ackall 0 chall=
enge 1378 response 85 debug 0 params 0 unused 0 unused 0 unused 0 version=
 0
   other send counters: ack 5504068, data 10819136 (not resends), resends=
 801, pushed 0, acked&ignored 31354504
        (these should be small) sendFailed 0, fatalErrors 5
   Average rtt is 0.002, with 1834223 samples
   Minimum rtt is 0.000, maximum is 43.049
   1284 server connections, 19 client connections, 93 peer structs, 155 c=
all structs, 131 free call structs
Showing only connections from host 10.162.0.105
Done.



thanks and best regards

Thomas Sesselmann

--=20
Dipl.-Inf. Thomas Sesselmann         __O
Kirchhoff-Institut f=FCr Physik      _\-<,
Universit=E4t Heidelberg           _(_)/(_)_
INF227 / D-69120 Heidelberg
Tel.:   +49/6221/54-9132
E-Mail: Thomas.Sesselmann@kip.uni-heidelberg.de
gpg-key: 0x9392E54B  or finger -l tsesselm@ix.urz.uni-heidelberg.de



--------------enig9B456D9AE69455E50B194EDF
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGyukMEQVslJOS5UsRCrSaAJ4xx0imwOCpKNxpEf0OY8lpI7LmCQCfengl
3auXRwmJgRDDM7ehyZgM1oU=
=Z3QP
-----END PGP SIGNATURE-----

--------------enig9B456D9AE69455E50B194EDF--