[OpenAFS] klog really slow (Fedora Core Linux, kernel-2.6.14-1.1656_FC4)

Sergio Gelato Sergio.Gelato@astro.su.se
Fri, 20 Jan 2006 22:41:02 +0100

* Paul Johnson [2006-01-19 23:21:01 -0600]:
> When I type
> $ klog pauljohn
> the system waits for between 40 and 50 seconds. There are no errors,
> and eventually the klog is approved.  The connection is good and I can
> move files in and out of /afs/ku.edu, our cell.
> How to debug?  Is there some program I could get to monitor what's
> going on while klog is working?

Like Derek Atkins, I'd suspect that the delay is due to timeouts in
looking up the address(es) of the authentication server(s). But he
didn't directly answer your question: yes, there are programs that
can help you monitor this.

One is strace: it will show you the system calls performed by klog,
including DNS lookups and communication with the authentication server.
The -t option to strace will add timestamps. Try
	strace -t -o klog.strace klog pauljohn
then look at the contents of klog.strace.
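
Once you have a trace, what you are looking for is a jump in the
timestamps: the call logged just before the jump is the one that blocked.
Here is a rough Python sketch that picks those out. The name find_gaps and
the 2-second threshold are my own inventions, not part of strace or
OpenAFS, and it assumes the plain HH:MM:SS prefix that -t produces
(optionally preceded by a PID column, as you get if you add -f).

```python
import re
from datetime import datetime

# "HH:MM:SS rest-of-line", optionally preceded by a PID column (strace -f).
LINE_RE = re.compile(r"^(?:\d+\s+)?(\d{2}:\d{2}:\d{2})\s+(.*)$")


def find_gaps(lines, threshold_s=2):
    """Return [(gap_seconds, call_text)] for calls followed by a long pause.

    strace -t stamps each call with its start time, so a gap between two
    consecutive lines is charged to the earlier of the two calls.
    """
    gaps = []
    prev_time = None
    prev_call = None
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # continuation lines, signals, etc.
        t = datetime.strptime(m.group(1), "%H:%M:%S")
        if prev_time is not None:
            delta = (t - prev_time).total_seconds()
            if delta < 0:
                delta += 86400  # the trace crossed midnight
            if delta >= threshold_s:
                gaps.append((int(delta), prev_call))
        prev_time, prev_call = t, m.group(2)
    return gaps


# Usage (file name as in the strace command above):
#   with open("klog.strace") as f:
#       for gap, call in find_gaps(f):
#           print(f"{gap:4d}s after: {call}")
```

If the slow calls turn out to be a connect/send followed by a poll or
select on a DNS socket, that would support the lookup-timeout theory.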

>  pols110 kernel: afs: Lost contact with volume location server:

That sounds like packets are being lost somewhere along the way.
If it happens often enough, it may be worthwhile to capture traffic to
the volume location server (vlserver, UDP port 7003) with tcpdump (or
another packet sniffer).
	tcpdump -s 1500 -w vlserver.dump udp port 7003
(as root) will copy the packets to vlserver.dump, and after an incident
you can run "tcpdump -r vlserver.dump" with appropriate filtering and
display options to see what happened.
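
If the dump gets large, it can help to locate the quiet periods first and
only then look at the packets around them with tcpdump -r. Below is a
minimal Python sketch that reads the capture file directly. It assumes
the classic pcap format with microsecond timestamps (what tcpdump -w
writes by default, as far as I know); packet_gaps and the 5-second
threshold are hypothetical choices of mine. It only finds silences and
does not decode the Rx protocol at all.

```python
import struct


def packet_gaps(stream, threshold_s=5.0):
    """Return [(gap_seconds, packet_index)] for long pauses in a pcap stream.

    packet_index is the 0-based index of the packet that ended the pause.
    """
    header = stream.read(24)  # pcap global header
    if len(header) < 24:
        raise ValueError("file too short to be a pcap capture")
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # written on a little-endian machine
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # written on a big-endian machine
    else:
        raise ValueError("not a classic microsecond pcap file")

    gaps = []
    prev = None
    index = 0
    while True:
        record = stream.read(16)  # per-packet header
        if len(record) < 16:
            break
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(
            endian + "IIII", record
        )
        stream.read(incl_len)  # skip the captured bytes
        now = ts_sec + ts_usec / 1e6
        if prev is not None and now - prev >= threshold_s:
            gaps.append((now - prev, index))
        prev = now
        index += 1
    return gaps


# Usage, with the dump file written by tcpdump -w:
#   with open(dump_path, "rb") as f:
#       for gap, idx in packet_gaps(f):
#           print(f"{gap:8.1f}s of silence before packet {idx}")
```

A long silence right after an outgoing request would be consistent with
a lost reply, but you still need to read the packets themselves to be sure.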

> Warning: failed to find address of system call table
> System call hooks will not be installed; proceeding anyway

I don't think this can account for loss of connectivity with the server.

> Starting AFS cache scan...<6>tg3: eth0: Link is up at 10 Mbps, half duplex.

That's a slow network link by today's standards, and half duplex can
lead to a higher rate of packet loss. Is your computer connected to an
old hub? How about replacing that with even a cheap 100 Mb/s switch?