[OpenAFS] OpenAFS Salvage Times -- test results
Dexter 'Kim' Kimball
dhk@ccre.com
Mon, 10 May 2004 11:27:35 -0600
Hello all,
I've twice tested salvage times for OpenAFS/Redhat 9/ext3 as
part of "proof of concept" testing for our transition from
IBM AFS to OpenAFS.
I've twice encountered surprise at the results, which I post
below. (The results, not the surprise.)
Given that ext3 is a journaled file system I'm surprised at
the surprise, so I also describe the testing procedure to make
sure I've not left out something critical.
I'll also mention that when I was teaching both AFS and
DFS system administration for Transarc, we routinely
configured DFS on journaled file systems and routinely found
that salvaging was very quick relative to AFS's vnode
file system salvage: AFS salvage was on the order of tens of
minutes, while DFS salvage was on the order of tens of seconds.
That being said, here's what I did and what I found.
For 5000 to 10000 volumes, > 300 GB, RHEL 3, OpenAFS 1.2.11,
and vicep partitions on ext3, I consistently get sub-five-minute
salvage times, whether the salvage is forced with kill -9,
bos salvage -force, etc.
How I forced a salvage (different test runs):
kill -9 <PID of fileserver>
bos salvage -all
bos salvage -force
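For reference, a rough sketch of fuller bos salvage command forms is
below; the hostname and partition are placeholders, not values from
the test cell:

  # Salvage all vice partitions on one fileserver:
  bos salvage -server <fileserver hostname> -all

  # Salvage a single partition only:
  bos salvage -server <fileserver hostname> -partition /vicepa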
Additional info:
IBM AFS salvage times (vnode interface) are proportional to
the number of bytes to be salvaged.
DFS salvage times are proportional to the number of pending
write operations at the time of the fileserver crash.
(Generally true of journaled file systems.)
OpenAFS with vicep partitions on journaled file systems should
therefore be relatively insensitive to the number of bytes and
relatively sensitive to the number of active write operations at
the time of the file system crash.
For one test I therefore wrote a loop that forked lots of
small writes to lots of different volumes, and then killed
the fileserver.
RESULTS
TEST1
Number of volumes: 5010
Volume sizes: 413 volumes at 137399 K, 1 at 1863880 K, 1
at 396644 K, 1 at 125530 K, the rest at 2 K (empty)
Volume distribution: 5004 RW, 2 BK, 0 RO
Total vicep space in use: (output of df, in Mbytes)
~ 55 GB
Cause of salvage: kill -9 <fileserverPID>
Number of write operations running: 0
Time to salvage 5010 volumes as above: 4m51s
TEST2
Same volume set as test 1.
Cause of salvage: bos salvage -all
Number of write operations running: 0
Time to salvage 5010 volumes as above: 1m27s
SalvageLog: 5001 "old status" messages for 5001 Vnodes for volume FILELOGTEST
  04/07/2004 12:07:15 FILELOGTEST (536971008) updated 04/07/2004 09:49
  04/07/2004 12:07:16 Vnode 10036: version < inode version; fixed (old status)
Time to salvage: 1m27s
TEST3
Number of volumes: 10007
Volume sizes (number of volumes, size each):
1 110387 K
1 114483 K
1 119167 K
1 122675 K
1 123595 K
1 130867 K
4715 137399 K (~ 50% BK vols, actual size of BK ~ few K)
1 1864318 K (BK volume, size erroneous)
1 1864359 K
5279 2 K (~ 50% BK vols)
1 64303 K
1 65332 K
2 9 K
1 94327 K
Volume distribution: 5007 RW, 5003 BK, 0 RO
Total vicep space in use: (output of df, in Kbytes)
332139504 K (~ 330 GB)
Cause of salvage: kill -9 <fileserverPID>
Number of write operations running: 0
Time to salvage 10007 volumes as above: 2m35s
TEST4
Number of volumes: 10006
Volume sizes (number of volumes, size each):
1 110387 K
1 114483 K
1 119167 K
1 122675 K
1 123595 K
1 130867 K
4715 137399 K (~ 50% BK vols, size erroneous)
1 1864318 K (BK volume, size erroneous)
1 1864359 K
5279 2 K (~ 50% BK vols, size erroneous)
1 64303 K
1 65332 K
2 9 K
1 94327 K
Volume distribution: 5003 RW, 5003 BK, 0 RO
Total vicep space in use: (output of df, in Mbytes)
324353 M (~ 320 GB)
Cause of salvage: bos salvage -force
Number of write operations running: 0
Time to salvage 10007 volumes as above: 2m18s
TEST5
Number of volumes: 10006
Volume sizes (number of volumes, size each):
1 110387 K
1 114483 K
1 119167 K
1 122675 K
1 123595 K
1 130867 K
3460 137399 K
1255 137400 K
1 1864318 K
1 1864363 K
5279 2 K
1 64303 K
1 65332 K
2 9 K
1 94327 K
Volume distribution: 5003 RW, 5003 BK, 0 RO
Total vicep space in use: (output of df, in Mbytes)
324363 M (~ 320 GB)
Cause of salvage: kill -9 <PID of fileserver>
Number of write operations running: 617
Time to salvage 10007 volumes as above: 3m44s
Write operations: I wrote a script that forks small background writes of the form
echo " Holy xprocs, Batman." >$x/procfile &
where $x is incremented with each echo and is a volume mount point.
At the time I killed the fileserver, 617 "echo" statements were running,
each of which was waiting to write into a different volume.
3m44s seems pretty good.
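For anyone wanting to reproduce the write load, a minimal sketch of
that kind of loop is below; the mount-point path and the fixed count
of 617 are illustrative assumptions, since any set of mount points
that map to distinct volumes will do:

  #!/bin/sh
  # Fork one small background write per volume mount point; the
  # fileserver is then killed by hand (kill -9) while the writes are
  # still in flight.  /afs/.mycell/test/vol$x is a placeholder path.
  x=1
  while [ $x -le 617 ]; do
      echo " Holy xprocs, Batman." > /afs/.mycell/test/vol$x/procfile &
      x=$((x + 1))
  done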
SUMMARY
I'm not surprised by the salvage times given the use of a journaling file
system.
Should I be?
Anyone with salvage-time test results, please comment.
Thanks.
=================================
Kim (Dexter) Kimball