[OpenAFS] Balancing usecnts of RO volumes

Garance A Drosihn drosih@rpi.edu
Sat, 17 Dec 2011 21:41:59 -0500


Hello.  A minor question here.

I was curious if there was some way to encourage clients to load-
balance their references across multiple RO volumes.  I have a
volume which is fairly large (for us), and is often referenced
about 10,000 times an hour.  I wanted to move that volume to new
fileserver partitions.

Before the move, the usage was pretty balanced between RO copies,
for example:

   VolumeID  Type Fsrvr/part UsedMB Usecnt  LastUpdate  CreateDate
  ---------- ---- ---------- ------ ------ ----------- -----------
   537373184   RW afsfs13/a    6040    281 11L26@22:08 04F07@15:34
   537373185   RO afsfs13/a    6040 246409 11L23@13:45 11L23@13:45
   537373185   RO afsfs15/a    6040 242495 11L23@13:45 11L23@13:45

(The above is output from a script I wrote; the only column I care
about is 'Usecnt'.)  Here are the numbers on a later date, just
minutes before I made any changes:

   VolumeID  Type Fsrvr/part UsedMB Usecnt  LastUpdate  CreateDate
  ---------- ---- ---------- ------ ------ ----------- -----------
   537373184   RW afsfs13/a    6041   6067 11L30@12:08 04F07@15:34
   537373185   RO afsfs13/a    6041  30252 11L30@11:01 11L30@11:02
   537373185   RO afsfs15/a    6041  27215 11L30@11:01 11L30@11:02

I then created a new RO volume on afsfs12/b and released it:

   VolumeID  Type Fsrvr/part UsedMB Usecnt  LastUpdate  CreateDate
  ---------- ---- ---------- ------ ------ ----------- -----------
   537373184   RW afsfs13/a    6041   9719 11L30@13:34 04F07@15:34
   537373185   RO afsfs13/a    6041   5258 11L30@13:18 11L30@13:23
   537373185   RO afsfs15/a    6041   3675 11L30@13:18 11L30@13:23
   537373185   RO afsfs12/b    6041      0 11L30@13:18 11L30@13:23

I'm not sure it's important, but someone else had vos-released the
volume maybe 5-10 minutes before I did, and I didn't realize that
until after I had done my own vos-release.

Maybe 10 minutes later, I vos-removed one of the older ROs:

   VolumeID  Type Fsrvr/part UsedMB Usecnt  LastUpdate  CreateDate
  ---------- ---- ---------- ------ ------ ----------- -----------
   537373184   RW afsfs13/a    6041   9719 11L30@13:34 04F07@15:34
   537373185   RO afsfs13/a    6041   7490 11L30@13:18 11L30@13:23
   537373185   RO afsfs12/b    6041      8 11L30@13:18 11L30@13:23

I don't mind that the usecounts were out of balance at that point,
but that was back on November 30th.  That volume is modified and
vos-released at least two or three times per weekday.  And here we
are on December 17th, and the RO volume on afsfs13/vicea is still
referenced much more often than the new one on afsfs12/viceb:

   VolumeID  Type Fsrvr/part UsedMB Usecnt  LastUpdate  CreateDate
  ---------- ---- ---------- ------ ------ ----------- -----------
   537373184   RW afsfs13/a    5619    494 11M17@21:11 04F07@15:34
   537373185   RO afsfs13/a    5619  57495 11M17@18:36 11M17@18:37
   537373185   RO afsfs12/b    5619    108 11M17@18:36 11M17@18:37
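
To put a number on that imbalance, here is a minimal Python sketch
that computes each RO site's share of the total usecnt from rows in
the format shown above.  The function name ro_shares and the
seven-column parsing are assumptions made for illustration; this is
not the script that produced the tables.

```python
def ro_shares(table_text):
    """Return {site: fraction of total RO usecnt} for RO rows in table_text."""
    counts = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Assumed columns:
        # VolumeID Type Fsrvr/part UsedMB Usecnt LastUpdate CreateDate
        # Header, separator, and RW rows are skipped by the Type check.
        if len(fields) != 7 or fields[1] != "RO":
            continue
        counts[fields[2]] = int(fields[4])
    total = sum(counts.values())
    if total == 0:
        # No accesses recorded at all: report every site as 0.0.
        return {site: 0.0 for site in counts}
    return {site: n / total for site, n in counts.items()}


# Rows copied from the December 17th listing above.
TABLE = """\
 537373184   RW afsfs13/a    5619    494 11M17@21:11 04F07@15:34
 537373185   RO afsfs13/a    5619  57495 11M17@18:36 11M17@18:37
 537373185   RO afsfs12/b    5619    108 11M17@18:36 11M17@18:37
"""

for site, share in ro_shares(TABLE).items():
    print(f"{site}: {share:.1%}")
```

For the December 17th rows this gives roughly 99.8% for afsfs13/a
and 0.2% for afsfs12/b, compared with close to an even split in the
first table.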

It's probably true that this volume is referenced from a small
number of client machines (web servers), and those machines are
rarely rebooted.  But is there some way to encourage the clients
to balance out the references?  The volume hasn't been very busy
today, but I've seen other days where the RO on afsfs13/a has been
referenced 500,000 times, and the one on afsfs12/b fewer than 2,000
times.

I do plan to move these specific volumes around some more over the
winter break, so the state of this volume will be changing anyway.
But it seemed like an interesting question.

-- 
Garance Alistair Drosehn                =     drosih@rpi.edu
Senior Systems Programmer               or   gad@FreeBSD.org
Rensselaer Polytechnic Institute;             Troy, NY;  USA