[OpenAFS] TSM client for OpenAFS

Thu, 02 Apr 2009 21:23:48 +0200

Harald Barth wrote:
>> we are currently deploying OpenAFS here at the university and are also stuck with TSM
>> for backups.
>>     
>
> There could be worse things. PDC uses TSM and we use homegrown logic
> to do full and incremental dumps. These are then piped into a program
> that uses the API. The only problem is that we do not have a tool for
> the users to request restores. But this has not been too pressing a
> burden yet.
>   
We currently use a similar approach (the hpc2n stuff) but want to get back
to the TSM object-handling way.

>> Currently there doesn't seem to exist any good AFS backup client for TSM,
>>     
>
> The company with the dinosaur? name (teradactyl.com) makes one, but
> the last time I applied their pricing to my department size it was not so
> attractive any more. In addition we need TSM anyway and for TiBS I
> would have needed a separate infrastructure in front of the tape
> library.
>   
Teradactyl would probably have been the choice for us if we were not
already using TSM for other things (Oracle, Exchange, ...).

>> and just storing volume dumps is not too appealing, both due to backup storage space and
>> ease of restoring single files.
>>     
>
> Why is the space bigger? Don't you do incremental dumps?
>
>   
Of course, but it will still take up much more space. If we do a full dump
once a week and incrementals each night, there will be many duplicate
copies of files that changed soon after the full dump :-)
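
To put a rough number on it, here is a small sketch. It assumes each
nightly incremental is taken relative to the last full dump and that a file
changes only once during the week; the function name and the six-night
cycle are only illustrative.

```python
def copies_in_dumps(change_night, nights_per_cycle=6):
    """Count how many nightly incrementals contain a file changed once.

    change_night: 1 for the first night after the full dump.
    Assumes every incremental holds everything changed since the full dump.
    """
    if not 1 <= change_night <= nights_per_cycle:
        raise ValueError("change_night outside the cycle")
    return nights_per_cycle - change_night + 1


for night in (1, 3, 6):
    print(night, copies_in_dumps(night))
```

Under those assumptions a file changed on the first night ends up in all
six incrementals, while storing it as an object would keep that version
only once.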

>> So I have written a client that uses the TSM API for backup.  
>>     
>
> I suspected some ongoing work for AFS backups (considering your
> earlier questions).
>   
I haven't got any answer, so I think my assumptions were correct :-)

>> It reads the data directly out
>> of a volume and stores all files in TSM as objects, while preserving ACLs, mountpoints etc.
>> Doing it this way will let AFS backups use the policies for objects, and restores can
>> also be performed via dsmc if necessary.
>>     
>
> Nice.
>
>   
>> There are a few nits, though, that I haven't found a good way to handle.  Any suggestions
>> are welcome :-)
>>
>> First is the storage of AFS data inside of TSM.  TSM has three identifiers for an object:
>> - filespace (typically mount point)
>> - High-level name (path inside mount point)
>> - Low-level name (filename)
>> Ideally the filespace should be the volume name, but TSM gets _really_ slow if there are
>> too many (a few hundred) filespaces.  Currently I just give it the cellname, and store
>> the volume name in the HL name (like /volume/path-in-volume).  Other ideas?
>>     
>
> Good that you have tested the filespace == volume thing. But the
> question "what happens when you have some hundred filespaces" could be
> a relevant one to the TSM 6 devel team. (TSM 6 is a major redesign I
> was told).
>   
We will probably test that ourselves in the not too distant future, even
if we won't change version soon.  I think TSM 6 uses DB2 for keeping
track of the objects.

> Another numbers question: We currently have at least 98434720 files in
> our cell. I don't know how TSM would react with that in one filespace
> either.
>   
I know: It dislikes it :-) We have tested that too.  Around 2.5 million
files in one filespace seems reasonable; more than that gets noticeably
slower.  Therefore I added the ability to split the data across several
filespaces, each of a size that can be easily handled.  For example, the
volumes "user.[a-e]*" can be put in a filespace called /ltu.se/user.a-e.
We ran into this problem ourselves.
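
As a minimal sketch of that mapping (not the client's actual code): the
rule table, the second pattern, the function names and the leading "/" on
the low-level name are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical rules: (volume-name pattern, filespace). First match wins.
FILESPACE_RULES = [
    ("user.[a-e]*", "/ltu.se/user.a-e"),
    ("user.[f-j]*", "/ltu.se/user.f-j"),
]
DEFAULT_FILESPACE = "/ltu.se"  # fall back to the cell name


def filespace_for(volume):
    """Pick the filespace for a volume from the pattern table."""
    for pattern, filespace in FILESPACE_RULES:
        if fnmatchcase(volume, pattern):
            return filespace
    return DEFAULT_FILESPACE


def tsm_object_name(volume, path_in_volume):
    """Return (filespace, high-level name, low-level name) for one file."""
    directory, _, filename = path_in_volume.strip("/").rpartition("/")
    # The volume name is kept as the first component of the high-level name.
    hl = "/" + volume + ("/" + directory if directory else "")
    return filespace_for(volume), hl, "/" + filename


print(tsm_object_name("user.alice", "docs/notes.txt"))
```

Keeping the volume name in the high-level name means the filespace
grouping can be changed later without touching the path inside the volume.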

>> Second is the storage of attributes and ACLs.  There is a 255-byte space available for
>> storing object attributes connected to each object.  This is not enough to store the ACLs
>> as clear-text, so I have to do pts lookups to translate them to their internal numbers
>> and store them as such in the attribute block.  Any better ideas on how to do this?
>>     
>
> Another question is when to back up the contents of a directory and
> what to store along with the directory and what along with the file.
> Plan for future file ACLs?
>   
Directory contents are given by the TSM design.  But for the attributes
I just store the basic stuff and keep a version number that can be
incremented later.
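
A rough sketch of what such a versioned attribute block could look like.
The byte layout, the names and the rights bit values here are assumptions
for illustration, not the format the client actually writes.

```python
import struct

ATTR_LIMIT = 255      # bytes available per object for attributes
FORMAT_VERSION = 1    # increment when the layout changes
HEADER = "<BB"        # version, number of ACL entries
ENTRY = "<iI"         # numeric pts id (groups are negative), rights mask
MAX_ENTRIES = (ATTR_LIMIT - struct.calcsize(HEADER)) // struct.calcsize(ENTRY)

# Illustrative bit values for the rights letters "rlidwka".
RIGHTS = {"r": 1, "w": 2, "i": 4, "l": 8, "d": 16, "k": 32, "a": 64}


def pack_acl(entries):
    """Pack [(pts_id, "rights letters"), ...] into an attribute block."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError("ACL does not fit in the attribute block")
    blob = struct.pack(HEADER, FORMAT_VERSION, len(entries))
    for pts_id, rights in entries:
        mask = 0
        for letter in rights:
            mask |= RIGHTS[letter]
        blob += struct.pack(ENTRY, pts_id, mask)
    return blob


def unpack_acl(blob):
    """Return [(pts_id, rights mask), ...], checking the version first."""
    version, count = struct.unpack_from(HEADER, blob, 0)
    if version != FORMAT_VERSION:
        raise ValueError("unknown attribute format version")
    base, step = struct.calcsize(HEADER), struct.calcsize(ENTRY)
    return [struct.unpack_from(ENTRY, blob, base + step * i) for i in range(count)]


blob = pack_acl([(1001, "rlidwka"), (-204, "rl")])
print(len(blob), unpack_acl(blob))
```

With numeric ids each entry takes 8 bytes in this layout, so a few dozen
entries fit where clear-text names would not, and the leading version byte
leaves room to change the layout later.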

-- Ragge