[OpenAFS] Resilience

anne salemme anne.salemme@dartmouth.edu
Tue, 02 Jun 2009 08:27:03 -0400

if you're aiming for 100% "guaranteed" availability for those RW
volumes, a few other considerations:
    - make the volumes as small as practical, to keep the 'vos'
      operations short
    - make your afs database servers equally robust
    - make sure your network people provide the same level of
      robustness in the routers and other network equipment that the
      afs servers rely on

afs is awesome, but it depends on the underlying network and power 
(something you might forget until you can't forget...)


Wheeler, JF (Jonathan) wrote:
> One of our (3) AFS servers has a mounted read-write volume which must be
> available 24x7 to our batch system.  The server is as resilient as we
> can make it, but still it may fail outside normal working hours for some
> reason.  For technical reasons related to the software installed on the
> volume it is not possible to use read-only volumes mounted from our
> other servers (the software must be installed and served from the same
> directory name), so I have devised the following plan in the event of a
> failure: 
> a) create read-only volumes on the other 2 servers, but do not mount
> them; use "vos release" whenever the software is updated
> b) in the event of a failure of server1 (which has the rw volume), drop
> the existing mount and mount one of the read-only volumes (we can live
> with the read-only copy whilst server1 is being repaired/replaced) in
> its place.
> Can anyone see problems with that scenario?  We could use "vos
> convertROtoRW"; how would that affect the process?
> Jonathan Wheeler 
> e-Science Centre 
> Rutherford Appleton Laboratory
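
For anyone wanting to script steps (a) and (b) from the quoted plan, here
is a rough sketch of the command sequences involved. It only builds the
command lines and does not run them. Every server, partition, volume and
path name in it is a made-up placeholder, not something from this thread,
and it assumes the read-only copy is mounted explicitly by its
".readonly" name. Treat it as an illustration to adapt, not a tested
procedure.

```python
# Sketch only: builds OpenAFS command lines for the quoted plan without
# executing them. All names below are hypothetical placeholders.

RW_SERVER = "server1"                      # holds the read-write volume
RO_SERVERS = ["server2", "server3"]        # will hold read-only copies
PARTITION = "a"                            # assumed partition (/vicepa)
VOLUME = "batch.sw"                        # hypothetical volume name
MOUNT_DIR = "/afs/.example.org/sw/batch"   # hypothetical mount point


def setup_commands(servers=RO_SERVERS, partition=PARTITION, volume=VOLUME):
    """Step (a), one-time: define a read-only site on each other server."""
    return [
        ["vos", "addsite", "-server", s, "-partition", partition, "-id", volume]
        for s in servers
    ]


def release_command(volume=VOLUME):
    """Step (a), after every software update: push RW contents to the RO sites."""
    return ["vos", "release", "-id", volume]


def failover_commands(mount_dir=MOUNT_DIR, volume=VOLUME):
    """Step (b): drop the existing mount and mount the read-only copy instead."""
    return [
        ["fs", "rmmount", "-dir", mount_dir],
        ["fs", "mkmount", "-dir", mount_dir, "-vol", volume + ".readonly"],
    ]


def convert_command(server, partition=PARTITION, volume=VOLUME):
    """Alternative raised in the post: turn one RO copy into the RW volume."""
    return ["vos", "convertROtoRW", "-server", server,
            "-partition", partition, "-id", volume]


if __name__ == "__main__":
    steps = setup_commands() + [release_command()] + failover_commands()
    for cmd in steps:
        print(" ".join(cmd))
```

One thing the sketch makes visible: the failover step edits a mount
point, so the parent volume that holds MOUNT_DIR has to be writable and
reachable while server1 is down. If that parent volume also lives on
server1, step (b) cannot be carried out as written.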