[OpenAFS-devel] Problems with 1.4.8pre2 on Fedora 8
Harald Barth
haba@kth.se
Fri, 10 Oct 2008 11:57:47 +0200 (CEST)
> on my machine I am seeing very similar problems when I try to access
> many files, e.g.
Hi Alf, hi Hans, hi everyone!
I think I was able to reproduce the problem. Twice.
My configuration is
Client:
AFS version: OpenAFS 1.4.8pre2-pdc50 built 2008-10-09
Linux a11c31n01.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
/usr/openafs/sbin/afsd -nosettime -stat 16000 -dcache 8000 -daemons 16 -volumes 256 -rxpck 2000 -files 50000 -afsdb
# /usr/openafs/bin/cmdebug localhost -cache
Chunk files: 50000
Stat caches: 16000
Data caches: 8000
Volume caches: 256
Chunk size: 1048576
Cache size: 1828000 kB
Set time: no
Cache type: disk
Server:
AFS version: OpenAFS 1.4.7-pdc48 built 2008-07-30
Linux trevally.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:
/usr/openafs/libexec/openafs/fileserver -nojumbo -p 128 -busyat 1200 -rxpck 800 -s 2400 -l 2400 -cb 200000 -b 480 -vc 2400
patched with RX_MAX_FRAG=1
I am running a program called sob to write many and/or big files.
Sob fills the files with random junk.
/afs/pdc.kth.se/home/p/pek/public_html/sob/sob.c
$ ./sob -n 30000 -o 1000 -s 1k -b 1k -w
Writing 30000 files of size 0.001MB, blocksize 1kB
Wrote 29.297 MB in 32.070 s for 0.914 MB/s, 30000 files
$ grep -r foooosoososs .
$ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
Reading 30000 files of size 0.001MB, blocksize 1kB
Failed to open file testfile.925 : Connection timed out
Nothing in FileLog on the server, nothing in /var/log/messages on the client.
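For anyone who cannot read sob.c, the write and read passes above can be
approximated by the rough sketch below. It is not the sob code. The layout
(dir.N/testfile.M, with the -o value taken as files per directory) is an
assumption based on the paths that fs getfid shows further down, and the
names path_for, write_files and read_files are made up for illustration.

```python
import os


def path_for(root, index, per_dir=1000):
    # Assumed layout: file i lives in dir.<i // per_dir>/testfile.<i>.
    return os.path.join(root, f"dir.{index // per_dir}", f"testfile.{index}")


def write_files(root, count, size, per_dir=1000, blocksize=1024):
    # Create `count` files of `size` bytes each, filled with random junk,
    # written in chunks of at most `blocksize` bytes.
    for i in range(count):
        path = path_for(root, i, per_dir)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            remaining = size
            while remaining > 0:
                n = min(blocksize, remaining)
                f.write(os.urandom(n))
                remaining -= n


def read_files(root, count, per_dir=1000, blocksize=1024):
    # Read every file back and return the total number of bytes read.
    # An OSError (for example a timeout on open) propagates to the caller.
    total = 0
    for i in range(count):
        with open(path_for(root, i, per_dir), "rb") as f:
            while True:
                chunk = f.read(blocksize)
                if not chunk:
                    break
                total += len(chunk)
    return total
```

Under these assumptions, write_files(".", 30000, 1024) followed by
read_files(".", 30000) roughly corresponds to the -w and -r runs shown here.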
OK. Let's try again:
$ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
Reading 30000 files of size 0.001MB, blocksize 1kB
Failed to open file testfile.2531 : Connection timed out
Ok. Start tcpdump....
$ ./sob -n 30000 -o 1000 -s 1k -b 1k -r
Reading 30000 files of size 0.001MB, blocksize 1kB
Failed to open file testfile.3863 : Connection timed out
Stop tcpdump.
$ /usr/openafs/bin/fs getfid dir.3/testfile.3863
File dir.3/testfile.3863 (537095594.13532.6800) contained in volume 537095594
But I can't find that fid in the dump. The dump is here:
/afs/pdc.kth.se/home/h/haba/Public/rx.1223629511
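One way to search a raw capture file for a fid, as a sketch: on the wire an
AFSFid is XDR-encoded as three consecutive big-endian 32-bit integers
(volume, vnode, unique), so the fid printed by fs getfid can be turned into a
12-byte pattern and searched for. The helper names fid_pattern and find_fid
are hypothetical. The search assumes the capture contains the full packet
payload; with a short snap length the fid may be cut off and never match.

```python
import struct


def fid_pattern(fid):
    # "volume.vnode.unique" as printed by fs getfid -> 12 bytes, big-endian.
    volume, vnode, unique = (int(part) for part in fid.split("."))
    return struct.pack(">III", volume, vnode, unique)


def find_fid(capture_bytes, fid):
    # Return every byte offset in the capture where the encoded fid occurs.
    pattern = fid_pattern(fid)
    offsets = []
    start = capture_bytes.find(pattern)
    while start != -1:
        offsets.append(start)
        start = capture_bytes.find(pattern, start + 1)
    return offsets
```

For example, find_fid(open(dumpfile, "rb").read(), "537095594.13532.6800")
returns an empty list if the fid never appears in the captured payloads.
Note that a request which names the file by parent directory fid plus file
name would not match either, since it does not carry the file's own fid.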
Then I ran the same test against a 1.4.8pre2 fileserver, and it fails much faster.
I restarted the AFS client on the client machine first (same configuration as above).
Server:
AFS version: OpenAFS 1.4.8pre2-pdc49 built 2008-10-08
Linux scad.pdc.kth.se 2.6.18-92.1.1.el5.centos.plus #1 SMP Fri Jun 20 18:05:02 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
/usr/openafs/libexec/openafs/fileserver -nojumbo -p 128 -busyat 1200 -rxpck 800 -s 2400 -l 2400 -cb 200000 -b 480 -vc 2400
patched with RX_MAX_FRAG=1
$ ./sob -n 30000 -o 1000 -s 1k -b 1k -w
Writing 30000 files of size 0.001MB, blocksize 1kB
Failed to create file testfile.5572 : Connection timed out
[Exit 1 ]
$ /usr/openafs/bin/fs getfid dir.5/testfile.5572
File dir.5/testfile.5572 (537095587.16946.8506) contained in volume 537095587
Complete rx dump is here:
/afs/pdc.kth.se/home/h/haba/Public/rx.1223631778
The last thing I see in the tcpdump is the storedata for the fid
_before_ the missing one (vnode 16944). I suspect that neither the
store-status nor the store-data for dir.5/testfile.5572
(537095587.16946.8506) is ever sent.
(The vldb server 130.237.237.230 that does not answer at the end of the
dump is "correct": that is an error in the vldb for a foreign cell.)
Harald.