[OpenAFS] AFS design question: implementing AFS over a
highly-distributed, low-bandwidth network
Chaz Chandler
clc31@inbox.com
Thu, 15 Jan 2009 18:03:56 -0800
>> The issue I am running up against is how to organize the AFS volume
>> structure so that things like user home dirs, collaboration/group dirs,
>> and writable system items (like a Windows profile, for instance) are
>> optimally maintained in AFS.
>>
>> The set-up is:
>> 1) Each site has an OpenAFS server (each with vl, pt, bu, vol,
>> fileservers & salvager). Currently, v1.4.8 on Linux 2.6.
>
> You don't say so, but I'm assuming a single cell for the entire
> infrastructure. Is that correct? Also, how many sites do you have,
> and how often do you expect to grow/shrink the number of sites?
Quite right: one cell. Four sites, unlikely to grow, more likely to shrink.
> Furthermore, you don't specify your Kerberos infrastructure -- it
> would be helpful to understand where that is placed, if you have
> replicas in place, etc.
One realm, exclusively V5, realmname = cellname. One site has the
master KDC, the others have slave KDCs. KDCs are the native Heimdal on
obsd.
>> 3) All sites are connected in a full mesh VPN (max of about 30KB/s for
>> each link)
>
> If your max is 30KB/s, what is your expected average and minimum, as
> well as your expected latency?
Expected average: I've been seeing 20-25KB/s sustained. The round-trip
ping time through the VPN (as a rough hack at latency) is 75-100ms.
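
As a rough back-of-the-envelope check on what these rates mean for syncing data between sites, the Python sketch below estimates ideal transfer times. Only the 20-30KB/s rates come from this thread; the data sizes are illustrative assumptions, and the estimate ignores protocol overhead and the 75-100ms latency, so real transfers would be slower.

```python
def transfer_seconds(size_mb: float, rate_kb_per_s: float) -> float:
    """Ideal transfer time in seconds, ignoring overhead and latency."""
    return size_mb * 1024 / rate_kb_per_s


# Sizes are illustrative assumptions; rates are the observed and max figures.
for size_mb in (10, 100, 1024):
    for rate in (20, 25, 30):
        hours = transfer_seconds(size_mb, rate) / 3600
        print(f"{size_mb:>5} MB at {rate} KB/s: {hours:6.2f} h")
```

At 20KB/s, a 10MB change takes 512 seconds (about 8.5 minutes) in the best case, and 1GB takes well over nine hours even at the 30KB/s maximum.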
> Even if you have 30KB between sites, my first suggestion would be to
> consider running multiple cells. Putting the ubik-based servers in
> each site (i.e., ptserver, vlserver, buserver) and attempting to run a
> single cell across all sites would be very challenging, even ignoring
> actual data access issues. As Anne points out, quorum issues across
> slow links can be difficult to deal with.
Yes, I've observed problems like she described.
> Something that might work out a little better is the work being done
> on disconnected operation. That might suffice for some of your use
> cases (assuming the timing of that finishing and the features it will
> offer is suitable for your needs).
Is disconnected afs functional? I saw it as a GSoC '08 project but the
afsbpw08 presentation said it's only R/O.
>> I'm seeking recommendations on:
>> 1) How others have set up a regular release schedule to keep a large
>> amount of data synced over a slow network (custom scripts, I assume, but
>> is there a repository of these things and what are the general mechanics
>> and best practices here?)
>
> I do not know of any set of documented best practices or scripts,
> although you should be aware of
>
> - Morgan Stanley's VMS (Volume Management System)
This appears to be internal-only, although I see there was an afsbpw
presentation. Looks interesting, though.
> - Russ Allbery's volume management utilities
Okay, thanks, I'll take a look at these.
> Based on your description, you might consider having each site be a
> separate cell, and then use incremental dump and restore across cells
> for certain cases. That would remove ubik traffic from the
> site-to-site links and free up the links for remote RW access, with
> dumps and restores being done during off hours. More details on the
> dump/restore idea below.
The complexity level is beginning to rise. :)
> If your users were tech-savvy (e.g., developers), I'd also think
> seriously about using a Version Control System instead of a networked
> filesystem for this part of the problem.
Alas....
> Users in different sites wanting to read and write 10's of MBs of data
> over 30K links simply may not be realistic given current architectures
> and implementations.
>
> More study of this case should be done: it's the real hard one.
>
>> - A user dir: large amounts of data updated from a single location, but
>> user may move to any other site at any time, potentially with up to a
>> day of transit time in which a volume could be moved to the destination
>> site.
>
> I would consider building a system that would let me have an offline
> copy of the user volumes in each location, and synchronize them on
> some regular basis, depending on usage patterns. You could then also
> provide a utility like 'move to site X' that the users could run which
> would find the current location of that home directory, take it
> offline, do an incremental dump & restore, then bring the new volume
> online.
It sounds like this would be a very complex option. More complex than
just moving an R/W volume even, maybe? Would this offline copy be outside
of afs, and then synced with it? Sounds rough!
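
To make the quoted 'move to site X' idea more concrete, here is a minimal Python sketch that only builds the vos command lines for the offline / incremental dump / restore sequence and does not run them. The function name, cell names, server names, partitions and dump file path are all hypothetical, and the ordering simply follows the quoted description; it has not been tested against 1.4.8 and assumes one cell per site, as suggested above.

```python
import shlex
from typing import Dict, List, Optional


def build_move_commands(volume: str, src: Dict[str, str], dst: Dict[str, str],
                        since: Optional[str] = None,
                        dump_file: str = "/tmp/volume.dump") -> List[List[str]]:
    """Build (not run) vos commands: offline at source, dump, restore at dest.

    src/dst are hypothetical dicts with 'cell', 'server' and 'partition' keys.
    since is a "mm/dd/yyyy hh:MM" timestamp; when given, the dump is
    incremental and the restore overlays the existing destination copy.
    """
    def site(s: Dict[str, str]) -> List[str]:
        return ["-server", s["server"], "-partition", s["partition"]]

    offline = ["vos", "offline", *site(src), "-id", volume,
               "-cell", src["cell"]]
    dump = ["vos", "dump", "-id", volume, "-file", dump_file, *site(src),
            "-cell", src["cell"]]
    restore = ["vos", "restore", *site(dst), "-name", volume,
               "-file", dump_file, "-cell", dst["cell"]]
    if since is not None:
        dump += ["-time", since]                 # changes since last sync only
        restore += ["-overwrite", "incremental"]
    return [offline, dump, restore]


# Hypothetical sites; every name below is made up for illustration.
site_a = {"cell": "a.example.org", "server": "afs1.a.example.org",
          "partition": "/vicepa"}
site_b = {"cell": "b.example.org", "server": "afs1.b.example.org",
          "partition": "/vicepa"}
for cmd in build_move_commands("user.jdoe", site_a, site_b,
                               since="01/14/2009 00:00"):
    print(shlex.join(cmd))
```

Under this reading, the offline copy would still be an AFS volume at the destination, and only the incremental dump file would have to cross the slow link.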
> An alternative to that would be disconnected operations: since I'm
> guessing that your users will need their own data frequently, but
> seldom will they need each others, it might work out that your users
> can put their home volumes into the cache on their local system (this
> would work best if the users had laptops that they carry from site to
> site, but would not work so well if there are fixed systems at each
> site that they use), and then you could engineer something so that
> when they re-connect to the network, automatically sync the volume
> from their local system to the local site, updating the various
> databases behind the scenes.
Yikes.
> That assumes development work, however. And I don't know if that
> would meet your requirements.
I don't mind writing scripts, I'm just a little surprised that this hasn't
been done before! Luckily, I am not bound to any particular setup. So,
I'm open to fairly drastic changes if necessary. The afs implementation
has been an experiment and has become fairly involved, but I'm becoming
more familiar with it all.