[OpenAFS-devel] Problems with 1.4.8pre2 on Fedora 8

Harald Barth haba@kth.se
Fri, 10 Oct 2008 11:57:47 +0200 (CEST)


> on my machine I am seeing very similar problems when I try to access
> many files, e.g.

Hi Alf, hi Hans, hi everyone!

I think I have reproduced the problem. Twice.

My configuration is:

Client:
AFS version:  OpenAFS 1.4.8pre2-pdc50 built  2008-10-09 
Linux a11c31n01.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
/usr/openafs/sbin/afsd -nosettime -stat 16000 -dcache 8000 -daemons 16 -volumes 256 -rxpck 2000 -files 50000 -afsdb
# /usr/openafs/bin/cmdebug localhost -cache
Chunk files:   50000
Stat caches:   16000
Data caches:   8000
Volume caches: 256
Chunk size:    1048576
Cache size:    1828000 kB
Set time:      no
Cache type:    disk

Server:
AFS version:  OpenAFS 1.4.7-pdc48 built  2008-07-30 
Linux trevally.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:
/usr/openafs/libexec/openafs/fileserver -nojumbo -p 128 -busyat 1200 -rxpck 800 -s 2400 -l 2400 -cb 200000 -b 480 -vc 2400
patched with RX_MAX_FRAG=1


I am running a program called sob, which writes many and/or large files
filled with random junk. Its source is here:
/afs/pdc.kth.se/home/p/pek/public_html/sob/sob.c
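For anyone who cannot reach sob.c, here is a hypothetical minimal C stand-in for the same kind of workload. It is NOT sob itself. The layout (file i in dir.<i / per_dir>/testfile.<i>) is inferred from the file names in the transcript, and per_dir stands in for what I assume "-o 1000" means. write_files() and read_files() are made-up names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed layout: <root>/dir.<i / per_dir>/testfile.<i> */
static void file_path(char *buf, size_t len, const char *root, int i, int per_dir)
{
    snprintf(buf, len, "%s/dir.%d/testfile.%d", root, i / per_dir, i);
}

/* Write nfiles files of `size` random bytes. Returns how many were written,
   stopping at the first error (e.g. ETIMEDOUT, "Connection timed out"). */
int write_files(const char *root, int nfiles, int per_dir, size_t size)
{
    char path[4096];
    assert(per_dir > 0);
    unsigned char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    for (int i = 0; i < nfiles; i++) {
        if (i % per_dir == 0) {
            /* Start a new directory every per_dir files. */
            snprintf(path, sizeof path, "%s/dir.%d", root, i / per_dir);
            if (mkdir(path, 0755) != 0 && errno != EEXIST) {
                fprintf(stderr, "Failed to create dir %s : %s\n", path, strerror(errno));
                free(buf);
                return i;
            }
        }
        for (size_t k = 0; k < size; k++)
            buf[k] = (unsigned char)(rand() & 0xff); /* random junk */
        file_path(path, sizeof path, root, i, per_dir);
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            fprintf(stderr, "Failed to create file testfile.%d : %s\n", i, strerror(errno));
            free(buf);
            return i;
        }
        ssize_t n = write(fd, buf, size);
        int err = (n < 0) ? errno : 0;
        /* AFS stores dirty data back at close() at the latest, so check it too. */
        if (close(fd) != 0 && err == 0)
            err = errno;
        if (err != 0 || n != (ssize_t)size) {
            fprintf(stderr, "Failed to write file testfile.%d : %s\n", i,
                    err ? strerror(err) : "short write");
            free(buf);
            return i;
        }
    }
    free(buf);
    return nfiles;
}

/* Read the same files back; returns how many were read completely. */
int read_files(const char *root, int nfiles, int per_dir, size_t size)
{
    char path[4096];
    assert(per_dir > 0);
    unsigned char *buf = malloc(size);
    if (buf == NULL)
        return -1;
    for (int i = 0; i < nfiles; i++) {
        file_path(path, sizeof path, root, i, per_dir);
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            fprintf(stderr, "Failed to open file testfile.%d : %s\n", i, strerror(errno));
            free(buf);
            return i;
        }
        ssize_t n = read(fd, buf, size);
        close(fd);
        if (n != (ssize_t)size) {
            fprintf(stderr, "Short read on testfile.%d\n", i);
            free(buf);
            return i;
        }
    }
    free(buf);
    return nfiles;
}
```

Calling write_files(root, 30000, 1000, 1024) and then read_files() with the same arguments roughly mirrors the -w and -r runs above. When nothing fails, both return 30000.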

$ ./sob -n 30000 -o 1000 -s 1k -b 1k -w
Writing 30000 files of size 0.001MB, blocksize 1kB
Wrote 29.297 MB in 32.070 s for 0.914 MB/s, 30000 files

$ grep -r foooosoososs .

$ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
Reading 30000 files of size 0.001MB, blocksize 1kB
Failed to open file testfile.925 : Connection timed out

Nothing in FileLog on the server, nothing in /var/log/messages on the client.

OK. Let's try again:

$ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
Reading 30000 files of size 0.001MB, blocksize 1kB
Failed to open file testfile.2531 : Connection timed out

OK. Start tcpdump...

$ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
Reading 30000 files of size 0.001MB, blocksize 1kB
Failed to open file testfile.3863 : Connection timed out

Stop tcpdump.

$ /usr/openafs/bin/fs getfid dir.3/testfile.3863
File dir.3/testfile.3863 (537095594.13532.6800) contained in volume 537095594

But I can't find that FID in the dump. The dump is here:
/afs/pdc.kth.se/home/h/haba/Public/rx.1223629511


Next I ran against a 1.4.8pre2 fileserver, and it fails much faster.
I restarted the AFS client on the client machine (same configuration as above).

Server:
AFS version:  OpenAFS 1.4.8pre2-pdc49 built  2008-10-08 
Linux scad.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
/usr/openafs/libexec/openafs/fileserver -nojumbo -p 128 -busyat 1200 -rxpck 800 -s 2400 -l 2400 -cb 200000 -b 480 -vc 2400
RX_MAX_FRAG=1

$ ./sob -n 30000 -o 1000 -s 1k -b 1k -w
Writing 30000 files of size 0.001MB, blocksize 1kB
Failed to create file testfile.5572 : Connection timed out
[Exit 1 ]
$ /usr/openafs/bin/fs getfid dir.5/testfile.5572 
File dir.5/testfile.5572 (537095587.16946.8506) contained in volume 537095587

The complete Rx dump is here:
/afs/pdc.kth.se/home/h/haba/Public/rx.1223631778

The last thing I see in the tcpdump is the StoreData for the FID
_before_ the missing one (vnode 16944). I suspect that neither
StoreStatus nor StoreData for dir.5/testfile.5572
(537095587.16946.8506) is ever sent.
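To make the "FID before" reasoning explicit: fs getfid prints a FID as volume.vnode.uniquifier, and in OpenAFS odd vnode numbers are directories ("large" vnodes) while even ones are files ("small" vnodes). So the file vnode just before 16946 is 16944. A small C sketch follows. parse_fid() and prev_file_vnode() are hypothetical helpers, and treating the previous file as vnode - 2 assumes vnodes were allocated sequentially, which need not hold once freed slots get reused.

```c
#include <assert.h>
#include <stdio.h>

/* AFS FID as printed by "fs getfid": volume.vnode.uniquifier */
struct afs_fid {
    unsigned long volume, vnode, unique;
};

/* Parse e.g. "537095587.16946.8506"; returns 1 on success, 0 otherwise.
   The trailing %c rejects junk after the third number. */
int parse_fid(const char *s, struct afs_fid *fid)
{
    char extra;
    return sscanf(s, "%lu.%lu.%lu%c", &fid->volume, &fid->vnode,
                  &fid->unique, &extra) == 3;
}

/* Odd vnode numbers are directories, even ones are files. */
int vnode_is_dir(unsigned long vnode)
{
    return (vnode & 1) == 1;
}

/* Preceding file vnode, assuming sequential allocation (see lead-in).
   The first file vnode is 2, so there is no predecessor below 4. */
unsigned long prev_file_vnode(unsigned long vnode)
{
    assert(vnode >= 4 && !vnode_is_dir(vnode));
    return vnode - 2;
}
```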

(The VLDB server 130.237.237.230 not answering at the end of the dump
is "correct": there is an error in the VLDB for a foreign cell.)

Harald.