[OpenAFS] AFS design question: implementing AFS over a highly-distributed, low-bandwidth network

Thu, 15 Jan 2009 18:03:56 -0800

>> The issue I am running up against is how to organize the AFS volume
>> structure so that things like user home dirs, collaboration/group dirs,
>> and writable system items (like a Windows profile, for instance) are
>> optimally maintained in AFS.
>>
>> The set-up is:
>> 1) Each site has an OpenAFS server (each with vl, pt, bu, vol,
>> fileservers & salvager).  Currently, v1.4.8 on Linux 2.6.
>
> You don't say so, but I'm assuming a single cell for the entire
> infrastructure.  Is that correct?  Also, how many sites do you have,
> and how often do you expect to grow/shrink the number of sites?

Quite right: one cell.  Four sites, unlikely to grow, more likely to shrink.

> Furthermore, you don't specify your Kerberos infrastructure -- it
> would be helpful to understand where that is placed, if you have
> replicas in place, etc.

One realm, exclusively V5, realmname = cellname.  One site has the master
KDC, the others have slave KDCs.  KDCs are the native Heimdal on obsd.

>> 3) All sites are connected in a full mesh VPN (max of about 30KB/s for
>> each link)
>
> If your max is 30KB/s, what is your expected average and minimum, as
> well as your expected latency?

Expected average: I've been seeing 20-25KB/s sustained.  The round-trip
ping time through the VPN (as a rough hack at latency) is 75-100ms.
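
For a rough sense of what those rates mean for bulk transfers, here is a
back-of-envelope sketch.  It assumes 1 KB = 1024 bytes and ignores Rx, VPN
and latency overhead entirely, so real transfers will be slower.  The 10 MB
and 1 GB sizes are illustrative only, not measurements from my sites.

```python
def transfer_seconds(size_bytes, rate_kb_per_s):
    """Idealized time to move size_bytes at a sustained rate given in KB/s.

    Assumes 1 KB = 1024 bytes and no protocol overhead.
    """
    if rate_kb_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / (rate_kb_per_s * 1024)


MB = 1024 * 1024

# Illustrative sizes only; the rates are the observed and maximum link speeds.
for label, size in [("10 MB", 10 * MB), ("1 GB", 1024 * MB)]:
    for rate in (20, 25, 30):
        minutes = transfer_seconds(size, rate) / 60
        print(f"{label} at {rate} KB/s: {minutes:.1f} min")
```

Even in this idealized model, 10 MB at 20KB/s works out to 512 seconds, so
anything in the gigabyte range is clearly an overnight job.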

> Even if you have 30KB between sites, my first suggestion would be to
> consider running multiple cells.  Putting the ubik-based servers in
> each site (i.e., ptserver, vlserver, buserver) and attempting to run a
> single cell across all sites would be very challenging, even ignoring
> actual data access issues.  As Anne points out, quorum issues across
> slow links can be difficult to deal with.

Yes, I've observed problems like she described.

> Something that might work out a little better is the work being done
> on disconnected operation.  That might suffice for some of your use
> cases (assuming the timing of that finishing and the features it will
> offer is suitable for your needs).

Is disconnected afs functional?  I saw it as a GSoC '08 project but the
afsbpw08 presentation said it's only R/O.

>> I'm seeking recommendations on:
>> 1) How others have set up a regular release schedule to keep a large
>> amount of data synced over a slow network (custom scripts, I assume, but
>> is there a repository of these things and what are the general mechanics
>> and best practices here?)
>
> I do not know of any set of documented best practices or scripts,
> although you should be aware of
>
> - Morgan Stanley's VMS (Volume Management System)

This appears to be internal-only, although I see there was an afsbpw
presentation.  Looks interesting, though.

> - Russ Allbery's volume management utilities

Okay, thanks, I'll take a look at these.

> Based on your description, you might consider having each site be a
> separate cell, and then use incremental dump and restore across cells
> for certain cases.  That would remove ubik traffic from the
> site-to-site links and free up the links for remote RW access, with
> dumps and restores being done during off hours.  More details on the
> dump/restore idea below.

The complexity level is beginning to rise. :)
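
To check my understanding of the incremental dump and restore step, here is
a minimal sketch that only builds the two command lines and prints them; it
does not run anything.  The flags are the ones I understand "vos dump" and
"vos restore" to take (-time for the incremental cutoff, -overwrite i for an
incremental restore), so please correct me if they differ on 1.4.8.  The
cell names, server, partition, volume and file path are all hypothetical,
and it assumes whoever runs the commands has admin tokens in both cells.

```python
import shlex


def incremental_sync_commands(volume, since, src_cell, dst_cell,
                              dst_server, dst_partition, dump_file):
    """Return argv lists for an incremental cross-cell volume copy.

    Nothing is executed here; the caller decides how and when to run them.
    """
    # Dump only the changes made to the volume since the given date.
    dump = ["vos", "dump", "-id", volume, "-time", since,
            "-file", dump_file, "-cell", src_cell]
    # Apply that dump on top of the existing copy in the other cell.
    restore = ["vos", "restore", "-server", dst_server,
               "-partition", dst_partition, "-name", volume,
               "-file", dump_file, "-overwrite", "i", "-cell", dst_cell]
    return [dump, restore]


# Hypothetical names throughout; substitute real cells, servers and paths.
commands = incremental_sync_commands(
    volume="user.jdoe",
    since="01/14/2009",
    src_cell="site-a.example.com",
    dst_cell="site-b.example.com",
    dst_server="fs1.site-b.example.com",
    dst_partition="/vicepa",
    dump_file="/var/tmp/user.jdoe.incr",
)
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

A cron job at each site could run the printed pair during off hours, with
the dump file copied over the VPN between the two steps.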

> If your users were tech-savvy (e.g., developers), I'd also think
> seriously about using a Version Control System instead of a networked
> filesystem for this part of the problem.

Alas....

> Users in different sites wanting to read and write tens of MB of data
> over 30K links simply may not be realistic given current architectures
> and implementations.
>
> More study of this case should be done: it's the real hard one.
>
>> - A user dir: large amounts of data updated from a single location, but
>> user may move to any other site at any time, potentially with up to a
>> day of transit time in which a volume could be moved to the destination
>> site.
>
> I would consider building a system that would let me have an offline
> copy of the user volumes in each location, and synchronize them on
> some regular basis, depending on usage patterns.  You could then also
> provide a utility like 'move to site X' that the users could run which
> would find the current location of that home directory, take it
> offline, do an incremental dump & restore, then bring the new volume
> online.

It sounds like this would be a very complex option.  More complex than
just moving an R/W volume even, maybe?  Would this offline copy be outside
of afs, and then synced with it?  Sounds rough!

> An alternative to that would be disconnected operations: since I'm
> guessing that your users will need their own data frequently, but
> seldom will they need each other's, it might work out that your users
> can put their home volumes into the cache on their local system (this
> would work best if the users had laptops that they carry from site to
> site, but would not work so well if there are fixed systems at each
> site that they use), and then you could engineer something so that
> when they reconnect to the network, the volume is automatically synced
> from their local system to the local site, updating the various
> databases behind the scenes.

Yikes.

> That assumes development work, however.  And I don't know if that
> would meet your requirements.

I don't mind writing scripts, I'm just a little surprised that this hasn't
been done before!  Luckily, I am not bound to any particular setup.  So,
I'm open to fairly drastic changes if necessary.  The afs implementation
has been an experiment and has become fairly involved, but I'm becoming
more familiar with it all.