[OpenAFS-devel] OpenAFS server 1.3.80 on x86_64

Ulrich Schwickerath ulrich.schwickerath@iwr.fzk.de
Mon, 11 Apr 2005 09:01:42 +0200


I ran rxdebug, as suggested by Hartmut, against my two servers (one x86_64, 
one i386), with the following results:

First connecting to the working server (Opteron with i386 installed):

[root@iwrafs0 etc]# ./rxdebug -server iwrafs1 -long -rxstats
Trying 192.168.167.251 (port 7000):
Free packets: 582, packet reclaims: 0, calls: 3, used FDs: 16
not waiting for packets.
0 calls waiting for a thread
11 threads are idle
rx stats: free packets 582, allocs 28, alloc-failures(rcv 0/0,send 0/0,ack 0)
   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, 
selects 0, sendSelects 0
   packets read: data 9 ack 4 busy 0 abort 2 ackall 0 challenge 3 response 0 
debug 17 params 0 unused 0 unused 0 unused 0 version 0
   other read counters: data 9, ack 4, dup 0 spurious 0 dally 0
   packets sent: data 11 ack 7 busy 0 abort 0 ackall 0 challenge 0 response 3 
debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other send counters: ack 7, data 22 (not resends), resends 0, pushed 0, 
acked&ignored 0
        (these should be small) sendFailed 0, fatalErrors 0
   1 server connections, 3 client connections, 3 peer structs, 3 call structs, 
3 free call structs
   0 clock updates
Done.


Now the output when connecting to the (locally running) x86_64 server. On 
this machine I had attempted last week to mount /afs by starting afsd with
./afsd -memcache -verbose -nosettime -stat 2000 -dcache 800 -daemons 3 -volumes 70
dmesg now shows:
Found system call table at 0xffffffff805dcd40 (scan: close+chdir+write)
Found 32-bit system call table at 0xffffffff80438840 (exported)
Starting AFS cache scan...Memory cache: Allocating 800 dcache entries...found 
0 non-empty cache files (0%).
afs: Lost contact with file server 192.168.167.250 in cell fzk.de (all 
multi-homed ip addresses down for the server)
afs: Lost contact with file server 192.168.167.250 in cell fzk.de (all 
multi-homed ip addresses down for the server)
One instance of afsd is still there:
root     17607  0.0  0.0     0    0 ?        SW   Apr08   0:00 [afsd]

I get:

[root@iwrafs0 etc]# ./rxdebug -server iwrafs0 -long -rxstats
Trying 192.168.167.250 (port 7000):
Free packets: 581, packet reclaims: 0, calls: 0, used FDs: 16
not waiting for packets.
0 calls waiting for a thread
10 threads are idle
rx stats: free packets 581, allocs 72580, alloc-failures(rcv 0/0,send 0/0,ack 
0)
   greedy 0, bogusReads 0 (last from host 0), noPackets 0, noBuffers 0, 
selects 0, sendSelects 0
   packets read: data 4 ack 75451 busy 0 abort 2 ackall 0 challenge 2892 
response 0 debug 28 params 0 unused 0 unused 0 unused 0 version 0
   other read counters: data 4, ack 75451, dup 0 spurious 0 dally 0
   packets sent: data 2896 ack 72569 busy 0 abort 0 ackall 0 challenge 0 
response 2892 debug 0 params 0 unused 0 unused 0 unused 0 version 0
   other send counters: ack 72569, data 5792 (not resends), resends 2890, 
pushed 0, acked&ignored 2890
        (these should be small) sendFailed 0, fatalErrors 0
   Average rtt is 0.000, with 2890 samples
   Minimum rtt is 0.000, maximum is 0.000
   1 server connections, 3 client connections, 3 peer structs, 3 call structs, 
1 free call structs
   0 clock updates
Connection from host 192.168.167.250, port 7001, Cuid c2567ce1/96d7f18
  serial 50956,  natMTU 1444, security index 0, server conn
    call 0: # 1, state active, mode: receiving, flags: receive_done
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host 192.168.167.250, port 7002, Cuid 875c66fc/82e8a1b0
  serial 27388,  natMTU 1444, flags pktCksum, security index 2, client conn
  rxkad: level crypt, flags pktCksum
  Received 48 bytes in 2 packets
  Sent 24 bytes in 3 packets
    call 0: # 3, state active, mode: receiving, flags: reader_wait, 
has_output_packets
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Done.
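
One rough way to compare the two dumps is to look at the ratio of resends to
data packets in the "other send counters" line. The sketch below is only an
illustration: the function names are made up here, the two sample strings are
copied from the outputs above, and a high ratio only hints that
retransmissions dominate, not why.

```python
import re


def send_counters(rxdebug_text):
    """Return (data, resends) from the 'other send counters' line."""
    # The mail client wrapped long lines, so collapse all whitespace first.
    flat = " ".join(rxdebug_text.split())
    m = re.search(r"data (\d+) \(not resends\), resends (\d+)", flat)
    if m is None:
        raise ValueError("no 'other send counters' line found")
    return int(m.group(1)), int(m.group(2))


def resend_ratio(rxdebug_text):
    """Resends per original data packet (0.0 if no data was sent)."""
    data, resends = send_counters(rxdebug_text)
    return resends / data if data else 0.0


# Samples copied from the two rxdebug outputs (i386 first, then x86_64).
working = ("other send counters: ack 7, data 22 (not resends), resends 0, "
           "pushed 0,\nacked&ignored 0")
failing = ("other send counters: ack 72569, data 5792 (not resends), "
           "resends 2890,\npushed 0, acked&ignored 2890")

print(resend_ratio(working))            # 0.0
print(round(resend_ratio(failing), 3))  # 0.499
```

On the i386 server nothing is resent, while on the x86_64 server roughly one
resend occurs for every two data packets.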

Honestly, I do not know how to read this output. Does it help?

Thanks,
Ulrich

On Friday 08 April 2005 16:54, Hartmut Reuter wrote:
> On Opterons with SLES9 we had a problem where the epoch for the
> rx connection was 0x80000000, which comes from a timeofday call that
> returned zero. This confused the fileserver, which finds the connection
> using the epoch. With many of these Opterons (180) it always ended up
> with the wrong connection, with the effect that the clients got
> "connection timed out".
>
> But this seems to be a different problem. You may use rxdebug to verify
> that your client's epochs are correct.
>
> Hartmut Reuter
>
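
Hartmut's point about the epoch can be illustrated with a small sketch. It
assumes, as his description suggests, that the client forms the epoch by
setting the high bit on the seconds value from the time-of-day call. The
function names are hypothetical and this is not the actual rx code.

```python
HIGH_BIT = 0x80000000


def rx_epoch(seconds):
    """Assumed rule: 32-bit start time in seconds with the top bit forced on."""
    return (seconds | HIGH_BIT) & 0xFFFFFFFF


def epoch_looks_bogus(epoch):
    """True if the epoch is what a zero time-of-day result would produce."""
    return epoch == HIGH_BIT


# A time-of-day call that returns zero yields exactly the value Hartmut saw.
print(hex(rx_epoch(0)))  # 0x80000000

# Any non-zero clock reading (arbitrary Unix time here) gives a distinct epoch.
print(epoch_looks_bogus(rx_epoch(1113202902)))  # False
```

If many clients all report 0x80000000, a server that tells connections apart
by epoch has nothing left to distinguish them by, which matches the timeouts
described above.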
> Ulrich Schwickerath wrote:
> > Hi, again,
> >
> > Sorry for the long silence from my side; I was mostly out of the
> > office this week and only now managed to resume working on this topic.
> > To exclude basic errors, or errors introduced by third-party RPMs, I
> > started from scratch on both an i386 and an amd64 system. Both nodes are
> > AMD Opterons, both freshly reinstalled and both running the SMP kernel.
> > The operating system is SL303 and the kernel version is
> > 2.4.21-20.ELsmp. I started with the original tarballs from open IB
> > (version openafs-1.3.80) and did a basic build, namely
> > ./configure --enable-transarc-paths
> > make
> > make dest
> > repeating the same steps on both nodes. It works on the i386 system, but
> > I reproduced the reported problem on the 64-bit Opteron node: when
> > starting the client without -dynroot it gets stuck (although the
> > root.afs and root.client volumes are there); with -dynroot, AFS is
> > mounted, but if I try to access it, e.g. with fs, fs itself gets stuck
> > (previously I even saw a segfault at this step, but there was no oops in
> > the syslog that I could send you). So I think there is definitely a
> > problem on 64-bit Opterons. I can live with running the box with an i386
> > system on it, and that is most probably what I am going to do now, but
> > if there is any further information I can send you to help investigate
> > the problem, please let me know. It would be really nice to find a
> > solution for this :-)
> >
> > Many thanks for all the tips that I got from you!
> > Ulrich

-- 
__________________________________________
Dr. Ulrich Schwickerath
Forschungszentrum Karlsruhe
GRID-Computing and e-Science
Institut for Scientific Computing (IWR)
P.O. Box 36 40
76021 Karlsruhe, Germany

Tel: +49(7247)82-8607
Fax: +49(7247)82-4972 

e-mail: ulrich.schwickerath@iwr.fzk.de
PGP DH/DSS Key: ID 0xCEB9826F
Fingerprint: 5537 8473 CD26 507E 8EE2  BAAF 98E2 FD16 CEB9 826F
__________________________________________