[OpenAFS] server problems (was: client problems)

Martin Schulz schulz@iwrmm.math.uni-karlsruhe.de
17 Apr 2001 17:17:41 +0200


Last week I reported problems installing OpenAFS on Linux. Though
it now seems to work OK, I experience serious delays when writing to
the AFS tree.

(If you're in a hurry, skip this) --------------------------------------
I searched around in the source and found that the function in
question is in afs_dcache.c. After instrumenting that function with
additional afs_Trace macros, I could show that the time is consumed
solely in the

code = rx_Write(acall, tbuffer, got);

statement, which in turn is a macro that expands to rx_WriteProc,
whose definition can be found in rx_rdwr.c. The first two blocks of
code there apparently only work on data structures in local main
memory; since both client and server are basically idle, they are not
suspect.

The real work seems to be done in the part

    bytes = rxi_WriteProc(call, buf, nbytes);

for Linux. MUTEX_ENTER and MUTEX_EXIT are defined in the SMP case and
are otherwise no-ops. Since the problem occurs in both SMP and non-SMP
setups, these too are not suspect.

It boils down to

    bytes = rxi_WriteProc(call, buf, nbytes);

This "internal version" has several "wait"s and "sleep"s in it, which
probably cause the delays I am encountering. 

I am stuck here, since I have trouble adding debugging output to that
function while still being able to compile it.

However, a helpful soul (thanks to Forrest D Whitcher) let me test my
client against his server, and with his server the writes were not
delayed; the only things changed were the CellServDB and ThisCell
files.

Therefore, I am now looking on the server for the cause of those
delays. The delays are all multiples of 10 seconds and depend on the
size of the file being written:

For files smaller than 10 kB, I could not observe any delays.
For files larger than 10 kB, I could sometimes observe a 10 s delay, sometimes not.
For files of ~1000 kB, I always observe the delay.
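
The size dependence above could be reproduced with a small timing
harness. The following is only a sketch and not part of OpenAFS: the
names timed_write and near_multiple, the 0.5 s tolerance, and the test
sizes are assumptions made for illustration. It times open, write and
close as one unit because, as far as I understand, the AFS client
normally sends the data to the file server when the file is closed.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double seconds_between(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec) +
           (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Write nbytes of filler to path and return the wall-clock seconds taken
 * for open + write + close, or -1.0 on any error.
 */
double timed_write(const char *path, size_t nbytes)
{
    char buf[4096];
    struct timespec start, end;
    size_t left = nbytes;
    int fd;

    memset(buf, 'x', sizeof buf);
    clock_gettime(CLOCK_MONOTONIC, &start);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1.0;
    while (left > 0) {
        size_t chunk = left < sizeof buf ? left : sizeof buf;
        ssize_t n = write(fd, buf, chunk);
        if (n <= 0) {               /* treat short-circuit and errors alike */
            close(fd);
            return -1.0;
        }
        left -= (size_t)n;
    }
    if (close(fd) != 0)             /* close is part of the measured time */
        return -1.0;
    clock_gettime(CLOCK_MONOTONIC, &end);
    return seconds_between(&start, &end);
}

/* Nonzero if secs lies within tol of a positive multiple of interval. */
int near_multiple(double secs, double interval, double tol)
{
    long k;
    double diff;

    assert(interval > 0.0);
    k = (long)(secs / interval + 0.5);      /* nearest multiple for secs >= 0 */
    diff = secs - (double)k * interval;
    if (diff < 0)
        diff = -diff;
    return k >= 1 && diff <= tol;
}

#ifdef TIMED_WRITE_MAIN
int main(int argc, char **argv)
{
    /* Hypothetical sizes chosen to bracket the thresholds reported above. */
    static const size_t sizes[] = { 1024, 10 * 1024, 100 * 1024, 1000 * 1024 };
    size_t i;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file in a writable AFS directory>\n", argv[0]);
        return 1;
    }
    for (i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
        double t = timed_write(argv[1], sizes[i]);
        printf("%8lu bytes: %7.2f s%s\n", (unsigned long)sizes[i], t,
               near_multiple(t, 10.0, 0.5) ? "  <- roughly a multiple of 10 s" : "");
    }
    return 0;
}
#endif
```

Compiled with -DTIMED_WRITE_MAIN and pointed at a file in the AFS tree,
this prints one line per size; running it several times would show
whether the 10 s steps appear consistently or only sometimes.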

To exclude possible nameserver problems, I have deleted the
nameserver entries in /etc/resolv.conf and made sure that the hosts
in question are in /etc/hosts.
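
One way to double-check that name resolution is really out of the
picture is to time a lookup directly. This is a hedged sketch only:
timed_lookup is a made-up helper, it uses the standard getaddrinfo
call, and it is restricted to IPv4 to keep the example short. It
measures forward lookups through the system resolver, nothing
AFS-specific.

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Resolve host through the system resolver (/etc/hosts and/or DNS,
 * depending on local configuration) and return the seconds taken,
 * or -1.0 if the name cannot be resolved.
 */
double timed_lookup(const char *host)
{
    struct addrinfo hints, *res = NULL;
    struct timespec start, end;
    int rc;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_INET;       /* IPv4 only, to keep the sketch simple */
    hints.ai_socktype = SOCK_DGRAM;  /* Rx runs over UDP */

    clock_gettime(CLOCK_MONOTONIC, &start);
    rc = getaddrinfo(host, NULL, &hints, &res);
    clock_gettime(CLOCK_MONOTONIC, &end);
    if (rc != 0)
        return -1.0;
    freeaddrinfo(res);
    return (double)(end.tv_sec - start.tv_sec) +
           (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

#ifdef TIMED_LOOKUP_MAIN
int main(int argc, char **argv)
{
    int i;

    /* Pass the client and server host names on the command line. */
    for (i = 1; i < argc; i++)
        printf("%-40s %7.3f s\n", argv[i], timed_lookup(argv[i]));
    return 0;
}
#endif
```

Built with -DTIMED_LOOKUP_MAIN and run on both client and server with
the host names from CellServDB, a lookup that takes seconds rather
than milliseconds would point back at the resolver.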

Where might this problem originate? Where in the code are 10-second
intervals used? Any ideas?

Martin Schulz                             schulz@iwrmm.math.uni-karlsruhe.de
Uni Karlsruhe, Institut f. wissenschaftliches Rechnen u. math. Modellbildung
Engesser Str. 6, 76128 Karlsruhe