[OpenAFS] Decentralized failover/backup system for RW volumes

Prunk Dump prunkdump@gmail.com
Wed, 4 Apr 2018 10:10:32 +0200


Hi OpenAFS Team!

I'm currently administering a high-school network with 5 Samba PDCs
and around 150 Linux and 300 Windows clients. To build my users' shares
I use Samba DFS and NFSv4 (with referrals) simultaneously. So I have
a global namespace for my Windows and Linux clients, but I need to
manage all my volumes manually to distribute the load across the
servers and to provide redundancy with rsync.

I will shortly be upgrading all my servers, so I have started
investigating new solutions. AFS seems to fit nearly all my needs!
Just one point is still problematic.

I try to keep all my services decentralized. At the moment, if one of
my 5 DCs fails, I can restore all the services/files with the 4
remaining DCs because:
-> All the services' databases are synced between DCs
-> All my users' home/profile files are present on at least two DCs
(but accessed on only one and synced at night)

Is there a way to implement this with AFS? Reading the documentation,
I can think of two possible scenarios:

1) I add a site for each of my users' home/profile volumes and I mount
the RW volume with a RW mount point in the AFS namespace. During the
night I release all these volumes. If one server fails, I can restore
the lost RW volumes from one of the corresponding RO versions.

The problem is that this does not seem to follow the AFS design, where
replication is primarily for high-availability RO volumes. The AFS
documentation suggests using RW mount points only for ".domain.com".
And I don't know, but maybe the nightly "release" operation will be too
slow, as it takes around 5 seconds per volume and I have around 1100
users (so 2200 home/profile volumes).
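
To put a rough number on that worry, here is a small Python sketch of
the arithmetic. The 5 seconds per volume and the 2200 volumes are the
figures above. The `parallel_jobs` parameter is only an assumption to
show what running several releases at once would change; I have not
checked how well the servers handle concurrent releases.

```python
import math


def estimate_release_seconds(volumes: int, seconds_per_volume: float,
                             parallel_jobs: int = 1) -> float:
    """Estimate wall-clock time for releasing every volume once.

    Assumes each release takes the same time and that parallel jobs
    do not slow each other down (an optimistic assumption).
    """
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be at least 1")
    batches = math.ceil(volumes / parallel_jobs)
    return batches * seconds_per_volume


def format_duration(seconds: float) -> str:
    """Format a number of seconds as e.g. 3h03m20s."""
    total = int(round(seconds))
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h{minutes:02d}m{secs:02d}s"


if __name__ == "__main__":
    # One release after another: 2200 * 5 s = 11000 s
    print(format_duration(estimate_release_seconds(2200, 5)))  # 3h03m20s
    # Hypothetical: 4 releases running at the same time
    print(format_duration(estimate_release_seconds(2200, 5, parallel_jobs=4)))  # 0h45m50s
```

So a strictly sequential run would take a bit over three hours per
night, which fits in a night but leaves little margin if the number of
volumes grows.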

2) For each user's home/profile volume I can create a
"volume.custombackup" volume placed on another server. During the
night I dump the original volumes, compress the output, and write it
to a file inside the corresponding "custombackup" volume.
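
A minimal sketch of what that nightly step could look like, assuming
the "custombackup" volume is mounted somewhere in the AFS namespace.
The file layout (`<volume>.dump.gz`) and the function names are my own
invention for illustration. The sketch relies on `vos dump` writing to
standard output when no `-file` is given, and on `-time 0` meaning a
full dump.

```python
import gzip
import shutil
import subprocess
from pathlib import Path


def build_dump_command(volume: str, since: str = "0") -> list[str]:
    """Build a `vos dump` command line.

    With no -file argument, vos writes the dump to stdout.
    `-time 0` requests a full dump.
    """
    return ["vos", "dump", "-id", volume, "-time", since]


def backup_path(backup_mount: Path, volume: str) -> Path:
    """Hypothetical layout: one compressed dump file per source volume."""
    return backup_mount / f"{volume}.dump.gz"


def dump_volume(volume: str, backup_mount: Path) -> Path:
    """Dump one volume and gzip the stream into the custombackup mount."""
    target = backup_path(backup_mount, volume)
    proc = subprocess.Popen(build_dump_command(volume),
                            stdout=subprocess.PIPE)
    with gzip.open(target, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    if proc.wait() != 0:
        # Do not keep a truncated dump around.
        target.unlink(missing_ok=True)
        raise RuntimeError(f"vos dump failed for {volume}")
    return target
```

Streaming through a pipe avoids writing an uncompressed temporary file
on the server that runs the cron job.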

But this solution is far more complex:
- The association between the volumes and their "custombackup" version
is not in the "VLDB" database. I need to maintain this database
myself.
- Making a full dump of all the volumes may be too slow. So I need to
implement incremental dumps.
- Restoring dumps may be complex, even more so if I use incremental
dumps.
- I need to generate a keytab to give my cron jobs access to the AFS
filesystem.
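
To make the first two points concrete, here is a rough sketch of the
bookkeeping I would have to maintain myself. The `BackupRecord`
structure, the weekly full-dump interval and the volume names are all
assumptions for illustration. The returned string is meant for the
`-time` argument of `vos dump`, which takes 0 for a full dump or a
date in mm/dd/yyyy hh:MM form.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class BackupRecord:
    """One row of the hand-maintained volume -> custombackup mapping."""
    volume: str                           # source RW volume
    backup_volume: str                    # its custombackup volume
    last_full: Optional[datetime] = None  # time of the last full dump


def dump_from_time(record: BackupRecord, now: datetime,
                   full_every: timedelta = timedelta(days=7)) -> str:
    """Return the value to pass to `vos dump -time`.

    "0" means a full dump. Otherwise the timestamp of the last full
    dump is returned, so each incremental holds everything changed
    since that full dump.
    """
    if record.last_full is None or now - record.last_full >= full_every:
        return "0"
    return record.last_full.strftime("%m/%d/%Y %H:%M")


if __name__ == "__main__":
    rec = BackupRecord("home.alice", "home.alice.custombackup")
    print(dump_from_time(rec, datetime(2018, 4, 4, 2, 0)))  # 0
    rec.last_full = datetime(2018, 4, 1, 2, 0)
    print(dump_from_time(rec, datetime(2018, 4, 4, 2, 0)))  # 04/01/2018 02:00
```

Taking every incremental relative to the last full dump keeps a
restore down to two dump files (the full one and the latest
incremental), at the cost of incrementals that grow during the week.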

If someone could help me work out a design like this, I would be
grateful. Maybe I have missed something, as I just started reading the
documentation this week.

Thanks!

Baptiste.