[OpenAFS] TSM client for OpenAFS
Thu, 02 Apr 2009 21:23:48 +0200
Harald Barth wrote:
>> we are currently deploying OpenAFS here at the university and also are stuck with TSM
>> for backups.
> There could be worse things. PDC uses TSM and we use homegrown logic
> to do full and incremental dumps. These then are piped into a program
> that uses the API. The only problem is that we do not have a tool for
> the users to request restores. But this has not been too pressing a
> burden yet.
We currently use a similar approach (the hpc2n stuff) but want to get back
to the TSM object-handling way.
>> Currently there doesn't seem to exist any good AFS backup client for TSM,
> The company with the dinosaur? name (teradactyl.com) makes one, but
> the last time I applied their pricing to my department size it was not
> so attractive any more. In addition we need TSM anyway, and for TiBS I
> would have needed a separate infrastructure in front of the tape
Teradactyl would probably have been the choice for us if we had not
already been using TSM for other things (Oracle, Exchange, ...).
>> and just storing volume dumps is not too appealing, both because of the backup storage
>> space needed and because restoring single files is not simple.
> Why is the space bigger? Don't you do incremental dumps?
Of course, but it will still take up much more space. If we do a full
dump once a week and incs each night, there will be many duplicates of
files changed early after the full dump :-)
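
To put a rough number on the duplication, here is a minimal sketch. It
assumes each nightly incremental is taken relative to the last full dump
(so a changed file reappears in every later incremental of the same
cycle); the function name and the 7-day cycle are illustrative only.

```python
def copies_in_incrementals(change_day, days_in_cycle=7):
    """Count the incremental dumps that contain a file last changed on
    `change_day`, with the full dump on day 0 and one incremental on each
    of the nights 1 .. days_in_cycle - 1.

    Assumption: every incremental is relative to the last full dump.
    """
    if not 1 <= change_day < days_in_cycle:
        raise ValueError("change_day must fall between two full dumps")
    # The file is picked up on the night of the change and every night after.
    return days_in_cycle - change_day


if __name__ == "__main__":
    for day in range(1, 7):
        print(day, copies_in_incrementals(day))
```

Under that assumption a file changed the day after the full dump is
stored six times before the next full dump, while an object-based backup
would store the changed version once.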
>> So I have written a client that uses the TSM API for backup.
> I suspected some ongoing work for AFS backups (considering your
> earlier questions).
I haven't got any answer, so I think my assumptions were correct :-)
>> It reads the data directly out
>> of a volume and stores all files in TSM as objects, while preserving ACLs, mountpoints etc.
>> Doing it this way will let AFS backups use the policies for objects, and also restores can
>> be performed via dsmc if necessary.
>> There are a few nits, though, that I haven't found a good way to handle. Any suggestions
>> are welcome :-)
>> First is the storage of AFS data inside of TSM. TSM has three identifiers for an object:
>> - filespace (typically mount point)
>> - High-level name (path inside mount point)
>> - Low-level name (filename)
>> Ideally the filespace should be the volume name, but TSM gets _really_ slow if there are
>> too many (a few hundred) filespaces. Currently I just give it the cellname, and store
>> the volume name in the HL name (like /volume/path-in-volume). Other ideas?
> Good that you have tested the filespace == volume thing. But the
> question "what happens when you have some hundred filespaces" could be
> a relevant one to the TSM 6 devel team. (TSM 6 is a major redesign I
> was told).
We will probably test that ourselves in the not too distant future, even
if we won't change version soon. TSM 6 uses DB2, I think, for keeping
track of the
> Another numbers question: We have currently at least 98434720 files in
> our cell. I don't know how TSM would react with that in one filespace
I know: It dislikes it :-) We have tested that too; we ran into this
problem ourselves. Around 2.5 million files in one filespace seems
reasonable; more than that gets a little bit slower. Therefore I added
the ability to spread the volumes over several filespaces, each small
enough to be handled easily. For example, the volumes "user.[a-e]*" can
be put in a filespace called /ltu.se/user.a-e.
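
A minimal sketch of how such a mapping could look, not the actual client
code. The rule table, the default filespace and the function names are
hypothetical; the leading "/" on the low-level name is my assumption
about the usual TSM naming convention.

```python
import fnmatch
import posixpath

# Hypothetical rule table: volume-name pattern -> filespace name.
FILESPACE_RULES = [
    ("user.[a-e]*", "/ltu.se/user.a-e"),
]
# Fallback when no rule matches: one filespace named after the cell.
DEFAULT_FILESPACE = "/ltu.se"


def filespace_for(volume):
    """Return the filespace for a volume; the first matching rule wins."""
    for pattern, filespace in FILESPACE_RULES:
        if fnmatch.fnmatchcase(volume, pattern):
            return filespace
    return DEFAULT_FILESPACE


def tsm_object_name(volume, path_in_volume):
    """Build (filespace, high-level name, low-level name) for one file.

    The volume name goes first in the high-level name, followed by the
    directory part of the path inside the volume.
    """
    path = posixpath.normpath("/" + path_in_volume.lstrip("/"))
    directory, filename = posixpath.split(path)
    high_level = "/" + volume + ("" if directory == "/" else directory)
    low_level = "/" + filename
    return filespace_for(volume), high_level, low_level


if __name__ == "__main__":
    print(tsm_object_name("user.alice", "docs/notes.txt"))
    print(tsm_object_name("user.frank", "todo"))
```

Keeping the rules in a table means a filespace that grows too large can
be split by adding a pattern, without touching the naming logic.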
>> Second is the storage of attributes and ACLs. There is a 255-byte space available for
>> storing object attributes connected to each object. This is not enough to store the ACLs
>> as clear-text, so I have to do pts lookups to translate them to their internal numbers
>> and store as such in the attribute block. Any better ideas of how to do this?
> Another question is when to back up the contents of a directory and
> what to store along with the directory and what along with the file.
> Plan for future file ACLs?
Directory contents are given by the TSM design. But for the attributes
I just store the basic stuff and keep a version number that can be
incremented later.
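
As an illustration of the idea (a versioned attribute block that holds
numeric pts ids instead of names), here is a minimal sketch. The field
layout, the `mode` field and the function names are assumptions made for
the example and not the format the client actually uses; only the
255-byte limit comes from the discussion above.

```python
import struct

ATTR_VERSION = 1          # bump when the layout changes
MAX_ATTR_BYTES = 255      # size of the per-object attribute space
HEADER = "!BHB"           # version, mode bits, number of ACL entries
ENTRY = "!iI"             # pts id (signed, groups are negative), rights mask


def pack_attributes(mode, acl):
    """Pack mode bits and a list of (pts_id, rights) pairs into bytes."""
    blob = struct.pack(HEADER, ATTR_VERSION, mode, len(acl))
    for pts_id, rights in acl:
        blob += struct.pack(ENTRY, pts_id, rights)
    if len(blob) > MAX_ATTR_BYTES:
        raise ValueError("attribute block exceeds %d bytes" % MAX_ATTR_BYTES)
    return blob


def unpack_attributes(blob):
    """Inverse of pack_attributes; rejects versions it does not know."""
    version, mode, count = struct.unpack_from(HEADER, blob, 0)
    if version != ATTR_VERSION:
        raise ValueError("unsupported attribute version %d" % version)
    offset = struct.calcsize(HEADER)
    acl = []
    for _ in range(count):
        acl.append(struct.unpack_from(ENTRY, blob, offset))
        offset += struct.calcsize(ENTRY)
    return mode, acl


if __name__ == "__main__":
    blob = pack_attributes(0o644, [(1001, 0x7F), (-204, 0x09)])
    print(len(blob), unpack_attributes(blob))
```

With this layout each ACL entry costs 8 bytes after a 4-byte header, so
31 entries fit in the block, and a reader can tell old blocks from new
ones by the first byte.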