[AFS3-std] Compression support for AFS vos dump - Specification
Jeffrey Hutzelman
jhutz@cmu.edu
Thu, 01 Nov 2007 13:43:15 -0400
On Monday, October 29, 2007 07:42:09 PM -0400 Jeffrey Altman
<jaltman@secure-endpoints.com> wrote:
> Peter Somogyi back almost three years ago proposed the following
> protocol enhancement for creating dump files containing compressed
> data. An implementation of this proposal is available for review
> in OpenAFS RT Ticket 17947.
>
> Feedback is requested.
+1 on the design, as described here, but I do have a couple of comments,
noted below.
I should note that I had a fair amount of input into this, as Peter and I
discussed a number of problems with his original proposal, and that
discussion led to what he has actually proposed to the community and
implemented.
I have not yet reviewed the implementation.
> - new volserver must produce always compressed dump if the compression is
> turned on - at "vos dump"
Ideally, there should be a way for clients to signal that they want an
uncompressed dump, even if the volserver would normally send a compressed
dump. This could be accomplished via a flag to AFSVolDumpV2 (which did not
exist at the time this proposal was drafted).
> - <length>: variable size length, 0x80=unknown (means that
> this section must be parsed to proceed)
> - If bit 7 is not set, bits 0-6 are the length
> - If bit 7 is set, bits 0-6 tell you how many bytes long the
> length is. The length follows immediately in MSB-first order.
> - The special value 0x80 means indefinite length.
> - 0xfe and 0xff would indicate a single-bit value, with the
> value stored in the low-order bit of the length.
It should be noted that this describes the form of the length portion of
all TLV-form tags; that is, all currently-undefined tags from 0x05 through
0x60, inclusive. Similarly, the rules quoted later permanently establish a
data size of 32 bits for currently-undefined tags in the range 0x61-0x7a,
and a data size of 0 bits for tags 0x7b-0x7f. The data size and format of
tags which are already defined do not change. The goal here was to make
it possible to write a dump parser which can process dumps containing new
tags it does not understand.
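Here is a minimal Python sketch of the skipping rule: a parser that steps over tags it does not understand. It assumes the caller has already checked the tag against the set of defined tags, since those keep their existing formats and are not handled here. It also reads 0xfe/0xff as carrying their value in the length byte itself, with no data following. The function names are hypothetical and are not taken from the patch in RT 17947.

```python
INDEFINITE = object()  # sentinel for the 0x80 "indefinite length" form


def encode_length(n):
    """Encode a definite length in the quoted <length> form."""
    if n < 0x80:
        return bytes([n])                                # bit 7 clear: bits 0-6 are the length
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")  # length bytes, MSB-first
    return bytes([0x80 | len(body)]) + body              # bits 0-6: count of length bytes


def decode_length(buf, pos):
    """Return (length, single_bit_value_or_None, offset_after_length)."""
    first = buf[pos]
    pos += 1
    if first & 0x80 == 0:
        return first, None, pos
    if first == 0x80:
        return INDEFINITE, None, pos     # end can only be found by parsing the contents
    if first in (0xFE, 0xFF):
        return 0, first & 1, pos         # single-bit value held in the length byte
    nbytes = first & 0x7F
    return int.from_bytes(buf[pos:pos + nbytes], "big"), None, pos + nbytes


def skip_unknown_tag(tag, buf, pos):
    """Return the offset just past an undefined tag's data.

    pos points just after the tag byte. Already-defined tags keep their
    existing formats and must be recognized before calling this.
    """
    if 0x05 <= tag <= 0x60:              # TLV form: <length> then data
        length, _bit, pos = decode_length(buf, pos)
        if length is INDEFINITE:
            raise ValueError("tag 0x%02x has indefinite length; cannot skip" % tag)
        end = pos + length
    elif 0x61 <= tag <= 0x7A:            # fixed 32-bit data
        end = pos + 4
    elif 0x7B <= tag <= 0x7F:            # no data
        end = pos
    else:
        raise ValueError("tag 0x%02x is outside the extensible ranges" % tag)
    if end > len(buf):
        raise ValueError("truncated dump")
    return end


# Example: an unknown TLV tag 0x30 carrying 3 bytes, then tag 0x7b (no data).
example = bytes([0x30]) + encode_length(3) + b"abc" + bytes([0x7B])
assert skip_unknown_tag(example[0], example, 1) == 5
```

The only case the sketch cannot handle is the indefinite form, which is the restriction discussed next.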
Note that the <length> format as described here includes an indefinite form
(0x80), which can be used when the data size is not known at the time the
tag is emitted. This is necessary for compressed dumps, but carries an
important restriction -- such a tag cannot be skipped by a parser which
does not understand its contents, because the data is self-describing and
its end can be found only by parsing it. This restriction could be removed,
at the expense of some overhead, by chunking indefinite-length data, with
each chunk prefixed by a length in a standard form. I don't believe such a
change is necessary for compressed volume dumps, since the parser would
have to understand compressed data anyway in order to do anything with the
dump.
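For comparison, the chunked alternative might look something like the following Python sketch. It is hypothetical and not part of the proposal. Each chunk carries a definite <length>, and using a zero-length chunk to mark the end of the data is an assumption of the sketch. zlib merely stands in for whatever compressor produces the stream piecewise.

```python
import zlib


def encode_length(n):
    """Encode a definite length: short form below 0x80, else length-of-length."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")  # MSB-first
    return bytes([0x80 | len(body)]) + body


def decode_length(buf, pos):
    """Decode a definite <length>; return (length, new_pos)."""
    first = buf[pos]
    if first & 0x80 == 0:
        return first, pos + 1
    if first in (0x80, 0xFE, 0xFF):
        raise ValueError("chunk lengths must use a definite form")
    nbytes = first & 0x7F
    return int.from_bytes(buf[pos + 1:pos + 1 + nbytes], "big"), pos + 1 + nbytes


def write_chunked(pieces):
    """Frame a stream of byte strings whose total size is not known up front."""
    out = bytearray()
    for piece in pieces:
        if piece:                         # an empty chunk would look like the end marker
            out += encode_length(len(piece)) + piece
    out += encode_length(0)               # assumed end-of-data marker
    return bytes(out)


def skip_chunked(buf, pos):
    """Step over framed data without decompressing it; return the offset after it."""
    while True:
        length, pos = decode_length(buf, pos)
        if length == 0:
            return pos
        pos += length


# Example: frame a zlib stream emitted piecewise, then skip it blindly.
comp = zlib.compressobj()
framed = write_chunked([comp.compress(b"volume data " * 1000), comp.flush()])
assert skip_chunked(framed, 0) == len(framed)
```

The cost is one length prefix per chunk plus the terminator. That is the overhead referred to above, and it buys nothing if every parser that sees the tag has to decompress it anyway.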
> - "don't compress" flag must be transmitted only when the source
> volserver's compression is set, and each receiver volserver supports this
> flag (in case of "vos release"), or when "vos dump" was invoked
It appears that this proposal transmits the "don't compress" flag as part
of the compressed volume dump, which is silly -- if the flag is set, a
compressed dump will never be sent, and if a compressed dump is not sent,
you lose the flag information.
Unfortunately, we are sort of painted into a corner here -- this
information should be carried in a volume header tag, but the existing
volserver will reject dumps containing unknown tags. So this new tag (and
all new tags) must be sent only when the recipient is known to support new
tags. Fortunately, we've designed the dump format extensions such that any
parser which supports new tags should be able to support _all_ new tags, if
not the features they represent. Thus, a single bit in a few places will
do:
- volservers must separately advertise three capabilities:
+ support for new tags in volume dumps
+ support for gzip compressed dumps
+ support for bz2 compressed dumps
Note that the first is purely a feature of the dump parser, but the
others depend on external libraries, and so the same volserver may
support different compression types depending on the libraries which
were available when it was built.
- AFSVolDumpV2 needs a flag to indicate the caller supports new tags;
vos dump should probably set this by default (eventually) but should
include an option to control it.
- AFSVolDumpV2 needs a flag to indicate that an uncompressed dump is
desired. Again, vos dump needs a switch to control this.
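Here is a rough Python sketch of how a sending volserver might combine these bits when deciding what to emit. The flag names and bit values are hypothetical, since no such constants exist yet. The rule that compression is used only when the recipient accepts new tags is also an assumption, based on the idea that compressed data would be carried in a new tag.

```python
from enum import IntFlag


class VolCap(IntFlag):        # hypothetical advertised volserver capabilities
    NEW_TAGS = 0x1
    GZIP = 0x2
    BZ2 = 0x4


class DumpFlag(IntFlag):      # hypothetical AFSVolDumpV2 request flags
    NEW_TAGS_OK = 0x1
    NO_COMPRESS = 0x2


def choose_dump_format(preferred, local_caps, recipient_caps,
                       request_flags=DumpFlag(0)):
    """Return (emit_new_tags, compression), compression being None or a method name.

    preferred is the locally configured method ("gzip", "bz2") or None if
    compression is turned off on this volserver.
    """
    recipient_caps = VolCap(recipient_caps)
    request_flags = DumpFlag(request_flags)
    emit_new_tags = VolCap.NEW_TAGS in recipient_caps
    method_cap = {"gzip": VolCap.GZIP, "bz2": VolCap.BZ2}.get(preferred)
    compress = (
        method_cap is not None
        and emit_new_tags                          # assumed: compressed data rides in a new tag
        and DumpFlag.NO_COMPRESS not in request_flags
        and method_cap in local_caps               # built with the library locally
        and method_cap in recipient_caps           # and the recipient can decode it
    )
    return emit_new_tags, (preferred if compress else None)
```

Checking the library bit on both ends reflects the point above that two volservers may support different compression types, depending on the libraries available when each was built.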
This one is slightly off-topic for this list, and I may bring it up on
openafs-devel later, but...
> - include zlib and bzip2 (src/zlib and src/bzip2)
No. These directories are not actually present in the patch (probably
because someone used the wrong arguments to diff), and shouldn't be
included in OpenAFS anyway. External dependencies should be resolved by
depending on them, not by incorporating copies of them into OpenAFS.
We do _not_ want to get into the business of maintaining compression
libraries.
-- Jeff