[OpenAFS] TSM client for OpenAFS
Fri, 03 Apr 2009 11:49:25 +0200
James E. Blair wrote:
> Anders Magnusson <email@example.com> writes:
>> we are currently deploying OpenAFS here at the university and also
>> are stuck with TSM for backups. Currently there doesn't seem to
>> exist any good AFS backup client for TSM, and just storing volume
>> dumps is not too appealing, both because of the backup storage space
>> needed and because restoring single files is not simple.
>> So I have written a client that uses the TSM API for backup. It
>> reads the data directly out of a volume and stores all files in TSM
>> as objects, while preserving ACLs, mountpoints etc. Doing it this
>> way lets AFS backups use the policies for objects, and restores
>> can also be performed via dsmc if necessary.
> We're in a similar situation here, and considered writing the program
> that you did, though instead we went a slightly different route. Our
> service is in a pilot phase, so we have some flexibility for
> We wrote a script that sets up a TSM environment and calls dsmc to
> perform the backups. The script takes care of looking for mountpoints
> to both record them for future restores, and add them to an exclude
> list to prevent unwanted recursion. It also writes directory ACLs to
> a metadata file.
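
A minimal sketch of the exclude-list step described above, not the actual script. It assumes the mount points under a volume have already been found (for example with `fs lsmount`) and that the list is written as `exclude.dir` statements for dsmc. The function name `build_exclude_lines` and the example paths are hypothetical.

```python
from pathlib import PurePosixPath


def build_exclude_lines(volume_root, mountpoints):
    """Return one exclude statement per mount point below volume_root.

    Assumption: the exclude list uses TSM's `exclude.dir` statement.
    Paths containing spaces would need quoting, which is not handled here.
    """
    root = PurePosixPath(volume_root)
    lines = []
    for mp in sorted(set(mountpoints)):
        path = PurePosixPath(mp)
        # Keep only mount points strictly inside this volume's tree, so the
        # backup of this volume does not recurse into other volumes.
        if root in path.parents:
            lines.append(f"exclude.dir {path}")
    return lines
```

The same list of mount points can be written to the metadata file, so that a later restore can recreate them.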
> We back up each volume to a filespace, and the metadata (ACLs, mounts)
> of each volume to another. When we dump the metadata, we check the
> hash of the file against the last time we generated it, and skip
> backing it up if it hasn't changed.
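
The hash check described above could look roughly like this. It is a sketch under assumptions: SHA-256 as the hash, and a small state file next to the metadata file holding the previous digest. The message does not say which hash or storage the script uses, and `metadata_changed` is a hypothetical name.

```python
import hashlib
from pathlib import Path


def metadata_changed(metadata_file, hash_file):
    """Return True if metadata_file differs from the last recorded hash.

    The new digest is stored in hash_file when a change is detected.
    """
    digest = hashlib.sha256(Path(metadata_file).read_bytes()).hexdigest()
    hash_path = Path(hash_file)
    previous = hash_path.read_text().strip() if hash_path.exists() else None
    if digest == previous:
        # Unchanged since the last run: the caller can skip the backup.
        return False
    hash_path.write_text(digest + "\n")
    return True
```

A real script would probably record the new digest only after the backup of the metadata file has succeeded, so that a failed run is retried next time.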
> We actually back up the .backup volumes (i.e., snapshots), so that the
> data are consistent during the backup. That way an errant recursive
> mountpoint can't sneak in and ruin our day.
> We have about 400 volumes with 1.5 terabytes, and it takes us about
> 2.5 hours to work through that. It's not relevant to AFS, but just in
> terms of how TSM scales, our email system has 10 filespaces with 1TB
> and 20 million objects each, with each backup taking about 10 hours on
I assume that how many files a filespace can hold before TSM gets slower may
depend on many things, like directory structures etc. We just noticed this on
Also, our AFS space seems to be different from yours: we will have ~70k
volumes with not too many 100M of files.
>> There are a few nits, though, that I haven't found a good way to
>> handle. Any suggestions are welcome :-)
>> First is the storage of AFS data inside of TSM. TSM has three
>> identifiers for an object:
>> - filespace (typically mount point)
>> - High-level name (path inside mount point)
>> - Low-level name (filename)
>> Ideally the filespace should be the volume name, but TSM gets _really_
>> slow if there are too many (a few hundred) filespaces. Currently I just
>> give it the cellname, and store the volume name in the HL name (like
>> /volume/path-in-volume). Other ideas?
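
The naming scheme described above (cell name as filespace, volume name at the front of the high-level name) can be sketched as a small mapping function. This only illustrates the layout and is not code from the client. The function name `tsm_object_name`, the leading "/" on the low-level name, and the example cell and volume names are assumptions.

```python
import posixpath


def tsm_object_name(cell, volume, path_in_volume):
    """Map a file in an AFS volume to (filespace, high-level, low-level)."""
    # Normalise to an absolute path inside the volume, e.g. "/docs/notes.txt".
    rel = posixpath.normpath("/" + path_in_volume.lstrip("/"))
    directory, filename = posixpath.split(rel)
    # High-level name: /volume followed by the directory inside the volume.
    hl = "/" + volume if directory == "/" else "/" + volume + directory
    # Low-level name: the file name, with a leading delimiter (assumption).
    return (cell, hl, "/" + filename)


print(tsm_object_name("example.org", "user.jdoe", "docs/notes.txt"))
# ('example.org', '/user.jdoe/docs', '/notes.txt')
```

With this layout every volume in a cell shares one filespace, which keeps the number of filespaces per node small at the cost of longer high-level names.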
> That's interesting, is that limit per-node, or per-TSM-server?
Per node. Different nodes do not seem to interfere with each other.
>> So, if anyone besides us needs to use TSM for AFS and is interested
>> in using this client, feel free to give
>> comments/ideas/whatever... :-)
> This is very interesting. Our script is doing well for the moment,
> but a solid API client may be preferable in the long run.
> I thought I remembered something in the API docs indicating that you
> may not be able to use dsmc to restore something that was stored using
> the API, but you mention that you could use dsmc for restores if
> necessary. Have you tried that, and were there any issues?
The API docs say how to use dsmc to extract files stored
through the API.
Actually, I have almost only used dsmc so far, the restore client is not
yet and I haven't really decided how it shall work. Currently I just
without any AFS magic, it just writes to the filesystem.
> James E. Blair
> UC Berkeley - IST