[OpenAFS] OpenAFS Salvage Times -- test results

Dexter 'Kim' Kimball dhk@ccre.com
Mon, 10 May 2004 11:27:35 -0600


Hello all,

I've twice tested salvage times for OpenAFS/Redhat 9/ext3 as
part of "proof of concept" testing for our transition from
IBM AFS to OpenAFS.

I've twice encountered surprise at the results, which I post
below.  (The results, not the surprise.)

Given that ext3 is a journaled file system I'm surprised at
the surprise, so I'll also describe the testing procedure to
make sure I've not left out something critical.

I'll also mention here that when I was teaching both AFS and
DFS system administration for Transarc we routinely
configured DFS on journaled file systems and routinely found
that salvaging was very quick relative to AFS's vnode-based
salvage:  AFS salvages ran on the order of tens of minutes,
DFS salvages on the order of tens of seconds.

That being said, here's what I did and what I found.

For 5,000 to 10,000 volumes and more than 300 GB of data
(RHEL 3, OpenAFS 1.2.11, vicep partitions on ext3) I
consistently get sub-five-minute salvage times, whether the
salvage is forced with kill -9, bos salvage -force, etc.


How I forced a salvage (different test runs):
   kill -9 <PID of fileserver>
   bos salvage -all
   bos salvage -force
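
For anyone who wants to reproduce this, here's a minimal sketch of the
two approaches.  The server name below is hypothetical, and bos option
spellings vary a bit by release, so check "bos help salvage" against
your own binaries:

   # Simulate a fileserver crash; the bosserver then runs the
   # salvager before bringing the fileserver back up.
   kill -9 `pidof fileserver`

   # Or ask the bosserver to salvage the whole server (or use
   # -partition /vicepX instead of -all to limit it to one partition):
   bos salvage -server fs1.example.com -all -localauth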


Additional info:

IBM AFS salvage times (vnode interface) are proportional to
the number of bytes to be salvaged.

DFS salvage times are proportional to the number of pending
write operations at the time of the fileserver crash.
(Generally true of journaled file systems.)

OpenAFS with vicep's on journaled file systems should be
relatively insensitive to the number of bytes and relatively
sensitive to the number of active write operations at the
time of the crash.

For one test I therefore wrote a loop that forked lots of
small writes to lots of different volumes, and then killed
the fileserver.


RESULTS


TEST1

  Number of volumes:  5010

  Volume sizes:  413 volumes at 137399 K, 1 at 1863880 K,
  1 at 396644 K, 1 at 125530 K, the rest at 2 K (empty)

  Volume distribution:  5004 RW, 2 BK, 0 RO

  Total vicep space in use: (output of df, in Mbytes)
  ~ 55 GB


     Cause of salvage:  kill -9 <fileserverPID>
     Number of write operations running: 0
     Time to salvage 5010 volumes as above:  4m51s


TEST2
   Same volume set as test 1.

     Cause of salvage:  bos salvage -all
     Number of write operations running: 0
     Time to salvage 5010 volumes as above:  1m27s

     SalvageLog:  5001 "old status" messages for 5001 Vnodes
     for volume FILELOGTEST

         04/07/2004 12:07:15 FILELOGTEST (536971008) updated
         04/07/2004 09:49


         04/07/2004 12:07:16 Vnode 10036: version < inode
         version; fixed (old status)

     Time to salvage:  1m27s
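
     For anyone repeating this:  an easy way to look at the salvage
     messages and timestamps on a remote fileserver is bos getlog
     (the server name below is hypothetical); the first and last
     timestamps in SalvageLog are one reasonable measure of the
     salvage time.

         bos getlog -server fs1.example.com -file SalvageLog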


TEST3
  Number of volumes:  10007

  Volume sizes (count, size):

      1 110387 K
      1 114483 K
      1 119167 K
      1 122675 K
      1 123595 K
      1 130867 K
   4715 137399 K  (~ 50% BK vols, actual size of BK ~ few K)
      1 1864318 K (BK volume, size erroneous)
      1 1864359 K
   5279 2 K       (~ 50% BK vols)
      1 64303 K
      1 65332 K
      2 9 K
      1 94327 K



  Volume distribution:  5007 RW, 5003 BK, 0 RO

  Total vicep space in use: (output of df, in Kbytes)
  332139504 K   (~ 330 GB)


  Cause of salvage:  kill -9 <fileserverPID>
  Number of write operations running: 0
  Time to salvage 10007 volumes as above:  2m35s
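
  Aside:  a count-by-size table like the ones above can be tallied
  straight from vos listvol output.  A rough sketch (server name
  hypothetical; sanity-check the awk field positions against your
  vos version's output):

     vos listvol fs1.example.com /vicepa |
        awk '$5 == "K" {print $4}' |   # size column of each volume line
        sort -n | uniq -c              # -> "count size" pairs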


TEST4
   Number of volumes:  10006

   Volume sizes (count, size):

       1 110387 K
       1 114483 K
       1 119167 K
       1 122675 K
       1 123595 K
       1 130867 K
    4715 137399 K   (~ 50% BK vols, size erroneous)
       1 1864318 K  (BK volume, size erroneous)
       1 1864359 K
    5279 2 K        (~ 50% BK vols, size erroneous)
       1 64303 K
       1 65332 K
       2 9 K
       1 94327 K


  Volume distribution:  5003 RW, 5003 BK, 0 RO

  Total vicep space in use: (output of df, in Mbytes)
  324353 M   (~ 320 GB)

  Cause of salvage:  bos salvage -force
  Number of write operations running: 0
  Time to salvage 10006 volumes as above:  2m18s


TEST5
  Number of volumes: 10006

  Volume sizes (count, size):

       1 110387 K
       1 114483 K
       1 119167 K
       1 122675 K
       1 123595 K
       1 130867 K
    3460 137399 K
    1255 137400 K
       1 1864318 K
       1 1864363 K
    5279 2 K
       1 64303 K
       1 65332 K
       2 9 K
       1 94327 K

  Volume distribution:  5003 RW, 5003 BK, 0 RO

  Total vicep space in use: (output of df, in Mbytes)
    324363 M   (~ 320 GB)


  Cause of salvage:  kill -9 <PID of fileserver>
  Number of write operations running: 617
  Time to salvage 10006 volumes as above:  3m44s

  Write operations:  I wrote a script that repeatedly ran

     echo " Holy xprocs, Batman." >$x/procfile  &

  where $x is incremented with each echo and is the mount point of a
  different volume.

  At the time I killed the fileserver, 617 of these "echo" statements
  were still running, each waiting to write into a different volume.
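
  A minimal sketch of that kind of write-load generator (the mount
  point naming below is made up for illustration -- substitute
  whatever naming your test volumes actually use):

     #!/bin/sh
     # Fire one small background write at each of N test volumes.
     # /afs/.testcell/loadtest/volNNNN is a hypothetical mount point
     # scheme, not the one used in the tests above.
     i=1
     while [ $i -le 5000 ]; do
         x=/afs/.testcell/loadtest/vol`printf "%04d" $i`
         echo " Holy xprocs, Batman." >$x/procfile &
         i=`expr $i + 1`
     done
     # kill -9 the fileserver from another window while these are in
     # flight; count the stragglers with:  ps ax | grep -c procfile
     wait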

  3m44s seems pretty good.



  SUMMARY

  I'm not surprised by the salvage times given the use of a journaling file
system.

  Should I be?

  Anyone with salvage-time test results, please comment.

  Thanks.





=================================
Kim (Dexter) Kimball