[OpenAFS] Problems with NAT & extremely slow fileserver

giovanni bracco bracco@frascati.enea.it
Tue, 8 Feb 2005 16:45:57 +0100


In my institution we run an AFS cell where some of the fileservers are OpenAFS
and the others (most of them) are Transarc AFS.
Every now and then (once a month or less) one of our fileservers becomes
very slow, and using
rxdebug $servername 7000 -rxstats
it can be seen that the server has 9 connections to the SAME client, which
blocks all activity:

Tue Feb  8 14:33:22 NFT 2005 waiting_for_process wp=00009_res=01287_ig=25802
1 192.107.51.29 Port=1434_id=8bb6c9ac/8162d80_R=2288_S=28124
2 192.107.51.29 Port=1434_id=8bb6c9ac/8162d84_R=2288_S=28124
3 192.107.51.29 Port=1434_id=8bb6c9ac/8162d88_R=2288_S=28124
4 192.107.51.29 Port=1434_id=8bb6c9ac/8162d90_R=2288_S=28124
5 192.107.51.29 Port=1434_id=8bb6c9ac/8162d94_R=2288_S=28124
6 192.107.51.29 Port=1434_id=8bb6c9ac/8162d98_R=2288_S=28124
7 192.107.51.29 Port=1434_id=8bb6c9ac/8162da0_R=2288_S=28124
8 192.107.51.29 Port=1434_id=8bb6c9ac/8162da4_R=2288_S=28124
9 192.107.51.29 Port=1434_id=8bb6c9ac/8162da8_R=2288_S=28124
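
As a rough illustration only (this is not something we run in production), a
listing in the format shown above could be summarised per client with a few
lines of Python. The function name and the assumption that every connection
row has exactly three whitespace-separated fields are mine, taken from the
sample and not from any rxdebug documentation:

```python
from collections import Counter


def connections_per_client(lines):
    """Count listed connections per (address, port) pair.

    Assumes rows shaped like the listing above:
    "<index> <ip> Port=<port>_id=..._R=..._S=..."
    Any other line (for example the timestamp header) is skipped.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        port_field = fields[2].split("_")[0]
        if not port_field.startswith("Port="):
            continue
        port = int(port_field[len("Port="):])
        counts[(fields[1], port)] += 1
    return counts


sample = """\
Tue Feb  8 14:33:22 NFT 2005 waiting_for_process wp=00009_res=01287_ig=25802
1 192.107.51.29 Port=1434_id=8bb6c9ac/8162d80_R=2288_S=28124
2 192.107.51.29 Port=1434_id=8bb6c9ac/8162d84_R=2288_S=28124
""".splitlines()

for (addr, port), n in connections_per_client(sample).items():
    print(addr, port, n)
```

A single client holding many connections at once, as in our case, would then
stand out immediately in the printed counts.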

The client is usually an OpenAFS Windows client behind NAT.
(It also happens with recent 1.3.x versions.)

We observe it for sure on Transarc AFS fileservers. Today's case is a Solaris
machine with Transarc AFS 3.6 2.32.

The only way to end the problem is to disconnect the client completely.
If the fileserver is just restarted using bos, the problem arises again within
a short time.

When the problem arises, the following messages are found (3-4 times each
minute) in the FileLog:
..
Tue Feb  8 07:57:36 2005 CB: RCallBackConnectBack failed for c06b331d.1434
Tue Feb  8 07:58:32 2005 CB: Call back connect back failed (in break delayed) 
for c06b331d.1434
Tue Feb  8 07:58:32 2005 BreakDelayedCallbacks FAILED for host c06b331d which 
IS UP.  Possible network or routing failure.
...

where c06b331d.1434 is the same host and port as the one reported by rxdebug
(c06b331d is 192.107.51.29 written in hexadecimal).
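
The correspondence between the hexadecimal host in the FileLog and the dotted
address from rxdebug can be checked with a short Python sketch. The helper
names are hypothetical, and the sketch assumes the eight hex digits are the
IPv4 address in network byte order, which is consistent with the values seen
here:

```python
import ipaddress
import re


def hex_to_dotted(hex_addr):
    """Convert an 8-digit hex IPv4 address (network byte order) to dotted form."""
    return str(ipaddress.IPv4Address(int(hex_addr, 16)))


def host_and_port(log_line):
    """Extract "<hexaddr>.<port>" from a FileLog line, or return None.

    The pattern is an assumption based only on the messages quoted above.
    """
    match = re.search(r"\b([0-9a-fA-F]{8})\.(\d+)\b", log_line)
    if match is None:
        return None
    return hex_to_dotted(match.group(1)), int(match.group(2))


line = "Tue Feb  8 07:57:36 2005 CB: RCallBackConnectBack failed for c06b331d.1434"
print(host_and_port(line))
```

For the line above this gives the address 192.107.51.29 and port 1434, that is,
the same client that rxdebug shows holding the 9 connections.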

Searching the web for the keyword BreakDelayedCallbacks, I found a 2001
posting:
https://lists.openafs.org/pipermail/openafs-devel/2001-March/005683.html
which seems related to the "BreakDelayedCallbacks" error message and
suggests a patch for OpenAFS.

I have tried to describe the problem as well as I can, but I do not understand
why it arises so seldom and only with NAT clients.

The question:

Has this kind of problem been solved in the current version of OpenAFS, and
is the solution to migrate all our fileservers to OpenAFS?


Any suggestion or explanation is welcome!


Giovanni




-- 
Giovanni Bracco
ENEA INFO 
(Servizio Informatica e Reti)
Via E. Fermi 45
I-00044 Frascati (Roma) Italy
phone 00-39-06-9400-5597
FAX   00-39-06-9400-5735
E-mail  bracco@frascati.enea.it
WWW http://fusfis.frascati.enea.it/~bracco