[AFS3-std] Compression support for AFS vos dump - Specification

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 01 Nov 2007 13:43:15 -0400



On Monday, October 29, 2007 07:42:09 PM -0400 Jeffrey Altman 
<jaltman@secure-endpoints.com> wrote:

> Almost three years ago, Peter Somogyi proposed the following
> protocol enhancement for creating dump files containing compressed
> data.  An implementation of this proposal is available for review
> in OpenAFS RT Ticket 17947.
>
> Feedback is requested.


+1 on the design, as described here, but I do have a couple of comments, 
noted below.

I should note that I had a fair amount of input into this, as Peter and I 
discussed a number of problems with his original proposal, and that 
discussion led to what he has actually proposed to the community and 
implemented.

I have not yet reviewed the implementation.




> - the new volserver must always produce a compressed dump if compression is
> turned on - at "vos dump"

Ideally, there should be a way for clients to signal that they want an 
uncompressed dump, even if the volserver would normally send a compressed 
dump.  This could be accomplished via a flag to AFSVolDumpV2 (which did not 
exist at the time this proposal was drafted).



> - <length>: variable size length, 0x80=unknown (means that
>   this section must be parsed to proceed)
>   - If bit 7 is not set, bits 0-6 are the length
>   - If bit 7 is set, bits 0-6 tell you how many bytes long the
>     length is.  The length follows immediately in MSB-first order.
>   - The special value 0x80 means indefinite length.
>   - 0xfe and 0xff would indicate a single-bit value, with the
>     value stored in the low-order bit of the length.

It should be noted that this describes the form of the length portion of
all TLV-form tags; that is, all currently-undefined tags from 0x05 through
0x60, inclusive.  Similarly, the rules quoted later permanently establish a 
data size of 32 bits for currently-undefined tags in the range 0x61-0x7a, 
and a data size of 0 bits for tags 0x7b-0x7f.  The data size and format of
tags which are already defined do not change.  The goal here was to make
it possible to write a dump parser which can process dumps containing new 
tags it does not understand.
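For illustration only, here is a minimal Python sketch of how a parser could step over a currently-undefined tag using the quoted <length> rules and the tag ranges above. The function names are hypothetical, not OpenAFS code, and the caller is assumed to have already dispatched any tag it recognizes, since already-defined tags keep their existing formats. As discussed next, a TLV tag using the indefinite form (0x80) cannot be skipped this way.

```python
def decode_length(buf: bytes, pos: int):
    """Decode a <length> field at buf[pos] (hypothetical helper).

    Returns (kind, value, next_pos), where kind is "length",
    "indefinite" (0x80) or "bit" (0xfe/0xff).
    """
    first = buf[pos]
    pos += 1
    if first & 0x80 == 0:
        # Short form: bits 0-6 are the length itself.
        return "length", first, pos
    if first == 0x80:
        # Indefinite form: the end can only be found by parsing the data.
        return "indefinite", None, pos
    if first in (0xFE, 0xFF):
        # Single-bit value stored in the low-order bit; no data follows.
        return "bit", first & 0x01, pos
    # Long form: bits 0-6 give the number of length bytes, MSB first.
    count = first & 0x7F
    if pos + count > len(buf):
        raise ValueError("truncated length field")
    return "length", int.from_bytes(buf[pos:pos + count], "big"), pos + count


def skip_unknown_tag(buf: bytes, pos: int) -> int:
    """Return the offset just past a currently-undefined tag at buf[pos]."""
    tag = buf[pos]
    pos += 1
    if 0x05 <= tag <= 0x60:
        # TLV form: <tag><length><data>.
        kind, value, pos = decode_length(buf, pos)
        if kind == "indefinite":
            raise ValueError(f"tag 0x{tag:02x} has indefinite length; cannot skip")
        end = pos if kind == "bit" else pos + value
    elif 0x61 <= tag <= 0x7A:
        end = pos + 4  # fixed 32-bit data
    elif 0x7B <= tag <= 0x7F:
        end = pos      # no data
    else:
        raise ValueError(f"tag 0x{tag:02x} is outside the extensible ranges")
    if end > len(buf):
        raise ValueError("truncated tag data")
    return end
```

For example, walking bytes([0x10, 0x03, 1, 2, 3, 0x70, 0, 0, 0, 1]) from offset 0 returns 5 (a 3-byte TLV tag), and from offset 5 returns 10 (a 32-bit tag).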

Note that the <length> format as described here includes an indefinite form 
(0x80), which can be used when the data size is not known at the time the 
tag is emitted.  This is necessary for compressed dumps, but carries an 
important restriction -- such a tag cannot be skipped by a parser which 
does not understand its contents, because the data is self-describing and 
its end can be found only by parsing it.  This restriction could be removed,
at the expense of some overhead, by chunking indefinite-length data, with
each chunk prefixed by a length in a standard form.  I don't believe such a 
change is necessary for compressed volume dumps, since the parser would 
have to understand compressed data anyway in order to do anything with the 
dump.
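The chunking alternative mentioned above might look roughly like the following Python sketch. Because the framing is not specified here, it assumes that each chunk is prefixed by a definite <length> and that a zero-length chunk ends the data; encode_chunked, decode_chunked and skip_chunked are hypothetical names. The point is that skip_chunked can step over the payload without understanding it, at a cost of a few bytes per chunk.

```python
def _encode_length(n: int) -> bytes:
    """Definite <length> in the shortest form (short or long)."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    if len(body) > 0x7D:  # 0xfe/0xff are reserved for single-bit values
        raise ValueError("length too large")
    return bytes([0x80 | len(body)]) + body


def _decode_length(buf: bytes, pos: int):
    """Return (length, next_pos); chunk lengths must be definite."""
    first = buf[pos]
    if first & 0x80 == 0:
        return first, pos + 1
    if first in (0x80, 0xFE, 0xFF):
        raise ValueError("chunk length must be a definite length")
    count = first & 0x7F
    return int.from_bytes(buf[pos + 1:pos + 1 + count], "big"), pos + 1 + count


def encode_chunked(data: bytes, chunk_size: int = 4096) -> bytes:
    """Split data into length-prefixed chunks; assumed zero-length terminator."""
    out = bytearray()
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        out += _encode_length(len(chunk)) + chunk
    out += _encode_length(0)
    return bytes(out)


def decode_chunked(buf: bytes, pos: int = 0):
    """Reassemble the payload; returns (data, offset past the terminator)."""
    out = bytearray()
    while True:
        length, pos = _decode_length(buf, pos)
        if length == 0:
            return bytes(out), pos
        if pos + length > len(buf):
            raise ValueError("truncated chunk")
        out += buf[pos:pos + length]
        pos += length


def skip_chunked(buf: bytes, pos: int = 0) -> int:
    """Skip the payload without interpreting it (e.g. without decompressing)."""
    while True:
        length, pos = _decode_length(buf, pos)
        if length == 0:
            return pos
        pos += length
```

With chunk_size=256, a 600-byte payload becomes three chunks (256, 256 and 88 bytes) plus the terminator, 608 bytes in all, i.e. 8 bytes of overhead.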

> - "don't compress" flag must be transmitted only when the source
> volserver's compression is set, and each receiver volserver supports this
> flag (in case of "vos release"), or when "vos dump" was invoked

It appears that this proposal transmits the "don't compress" flag as part 
of the compressed volume dump, which is silly -- if the flag is set, a 
compressed dump will never be sent, and if a compressed dump is not sent,
you lose the flag information.

Unfortunately, we are sort of painted into a corner here -- this 
information should be carried in a volume header tag, but the existing 
volserver will reject dumps containing unknown tags.  So this new tag (and 
all new tags) must be sent only when the recipient is known to support new 
tags.  Fortunately, we've designed the dump format extensions such that any 
parser which supports new tags should be able to support _all_ new tags, if 
not the features they represent.  Thus, a single bit in a few places will 
do:

- volservers must separately advertise three capabilities:
  + support for new tags in volume dumps
  + support for gzip compressed dumps
  + support for bz2 compressed dumps

  Note that the first is purely a feature of the dump parser, but the
  others depend on external libraries, and so the same volserver may
  support different compression types depending on the libraries which
  were available when it was built.

- AFSVolDumpV2 needs a flag to indicate the caller supports new tags;
  vos dump should probably set this by default (eventually) but should
  include an option to control it.

- AFSVolDumpV2 needs a flag to indicate that an uncompressed dump is
  desired.  Again, vos dump needs a switch to control this.
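A rough Python sketch of how the three capability bits and two AFSVolDumpV2 flags described above might combine. Every name here (VolCap, DumpFlag, choose_dump_format, flags_for_release) is hypothetical; real OpenAFS constants and RPC signatures will differ. It also assumes, as the discussion above implies, that a compressed dump is itself carried in a new tag, so compression is only possible when the caller accepts new tags.

```python
from enum import Flag, auto
from typing import Iterable, Optional, Tuple


class VolCap(Flag):
    """Hypothetical volserver capability bits."""
    NONE = 0
    NEW_TAGS = auto()  # dump parser accepts new tags
    GZIP = auto()      # built with gzip support
    BZ2 = auto()       # built with bz2 support


class DumpFlag(Flag):
    """Hypothetical AFSVolDumpV2 flags."""
    NONE = 0
    NEW_TAGS_OK = auto()  # caller can parse new tags
    NO_COMPRESS = auto()  # caller wants an uncompressed dump


# Which capability a configured compression type depends on.
_CAP_FOR = {"gzip": VolCap.GZIP, "bz2": VolCap.BZ2}


def choose_dump_format(configured: Optional[str], server_caps: VolCap,
                       flags: DumpFlag) -> Tuple[bool, Optional[str]]:
    """Volserver side: return (use_new_tags, compression or None)."""
    new_tags = DumpFlag.NEW_TAGS_OK in flags
    needed = _CAP_FOR.get(configured)
    if (not new_tags or DumpFlag.NO_COMPRESS in flags
            or needed is None or needed not in server_caps):
        return new_tags, None
    return new_tags, configured


def flags_for_release(configured: Optional[str],
                      receivers: Iterable[VolCap]) -> DumpFlag:
    """vos side: pick flags so every receiving volserver can parse the dump."""
    receivers = list(receivers)
    flags = DumpFlag.NONE
    if all(VolCap.NEW_TAGS in caps for caps in receivers):
        flags |= DumpFlag.NEW_TAGS_OK
    needed = _CAP_FOR.get(configured)
    if needed is not None and not all(needed in caps for caps in receivers):
        flags |= DumpFlag.NO_COMPRESS
    return flags
```

In this sketch a volserver that lacks the library for its configured compression silently falls back to an uncompressed dump rather than failing; that choice is an assumption, not something the proposal specifies.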



This one is slightly off-topic for this list, and I may bring it up on 
openafs-devel later, but...

> - include zlib and bzip2 (src/zlib and src/bzip2)

No.  These directories are not actually present in the patch (probably 
because someone used the wrong arguments to diff), and shouldn't be 
included in OpenAFS anyway.  External dependencies should be resolved by 
depending on them, not by incorporating copies of them into OpenAFS.

We do _not_ want to get into the business of maintaining compression 
libraries.

-- Jeff