[OpenAFS-devel] comerr and openafs

Marcus Watts mdw@umich.edu
Wed, 02 Aug 2006 16:54:44 -0400


error handling.

The problem:
AFS today has 3 special error ranges that don't map nicely into MIT
com_err tables.  They are
-8..-1		rx errors
-457..-450 	rpc errors
101..111	vol errors

afs has several other errors which aren't currently decoded
by com_err including -100, 120, etc. - all volume errors.

Just for fun, one other interesting property of AFS:  AFS
considers all negative error numbers to be communications failures.
All existing AFS communications errors fall in the range -1000..-1
(actually, -457..-1).  Many legitimate MIT error codes have negative
values, including obvious ones like krb5's.  It would be nice if error
numbers from MIT tables with lower-case names (which can come out
negative) did not need to be handled specially.  { jhutz points out that
rxkad has its own special error handling exceptions.  Need to explore
this. }
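
To make the negative-number problem concrete, here is a rough sketch of
how an MIT-style table name turns into a base, as I understand the
com_err encoding (6 bits per character, then shifted left 8 bits).
afs_treats_as_comm_failure() is a hypothetical stand-in for the AFS
convention described above, not an actual openafs function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Character set used to encode a table name, 6 bits per character. */
static const char et_charset[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_";

/* Compute the 32-bit base of an error table from its name (1..4 chars). */
int32_t et_base_from_name(const char *name)
{
    uint32_t num = 0;
    size_t i;

    assert(name != NULL && strlen(name) <= 4);
    for (i = 0; name[i] != '\0'; i++) {
        const char *p = strchr(et_charset, name[i]);
        assert(p != NULL);                  /* name must use the charset */
        num = (num << 6) + (uint32_t)(p - et_charset) + 1;
    }
    num <<= 8;      /* low 8 bits index the message within the table */

    /* Reinterpret the 32-bit pattern as signed without relying on
     * implementation-defined conversion. */
    if (num >= 0x80000000u)
        return (int32_t)(num - 0x80000000u) - INT32_MAX - 1;
    return (int32_t)num;
}

/* HYPOTHETICAL: the "negative means communications failure" rule. */
int afs_treats_as_comm_failure(int32_t code)
{
    return code < 0;
}
```

Feeding "krb5" through et_base_from_name() gives -1765328384, so a plain
"code < 0" test would classify every krb5 error as a communications
failure.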

The goal:
come up with a way to manage errors so that error_message() needs
no special custom hacks.  That way, an error_message() from some
other library could be used instead (such as heimdal, mit, e2fsprogs...)

Some approaches:

I. remap all AFS errors into standard error ranges.
	breaks compatibility with old clients.

II. have a special error code negotiation phase like uae.
	send new mit compatible codes to new clients.  Send
	the original codes for old clients.  Use new codes in core.

III. use old codes on the wire.  use new codes in core.
	clients that don't negotiate the use of uae can't send system
	errors that overlap volume errors.

IV. use old codes "just like now".  Preserves identical properties
	to existing code.  Requires error_message() handle non-standard
	bases.

I've been exploring option IV, which I believe is the most conservative
approach.  I think option II is the next most attractive.  III might
have value if it can be worked to "cost" no more than uae.

For any of approaches I-IV, it's necessary to modify openafs to
actually add its error tables.  II, III also require error
code munging logic between in-core & on-the-wire representations.
I'm not exactly sure how this will work, but provisionally I'm
expecting to see/do something like:
	afs_init_ets();
or
	initialize_afs_error_tables();
or some such at various random places.
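
For II and III, the in-core/on-the-wire munging might look something
like the following for the volume errors.  This is only a sketch:
VOL_CORE_BASE is an arbitrary 256-aligned placeholder, not a real table
base, and the function names are made up for illustration.

```c
#include <stdint.h>

/* Old on-the-wire volume error range, as listed above. */
#define VOL_WIRE_FIRST 101
#define VOL_WIRE_LAST  111

/* HYPOTHETICAL in-core base: arbitrary 256-aligned placeholder value. */
#define VOL_CORE_BASE  ((int32_t)0x00100000)

/* Translate a code received from the wire into its in-core form.
 * Codes outside the old volume range pass through unchanged. */
int32_t vol_wire_to_core(int32_t wire)
{
    if (wire >= VOL_WIRE_FIRST && wire <= VOL_WIRE_LAST)
        return VOL_CORE_BASE + (wire - 100);
    return wire;
}

/* Translate an in-core code back to the old wire value before sending.
 * Codes outside the in-core volume range pass through unchanged. */
int32_t vol_core_to_wire(int32_t core)
{
    if (core >= VOL_CORE_BASE + 1 && core <= VOL_CORE_BASE + 11)
        return core - VOL_CORE_BASE + 100;
    return core;
}
```

Note that vol_wire_to_core() has to assume any 101..111 on the wire is a
volume error, which is exactly why un-negotiated clients can't send
system errors that overlap that range.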

In order to handle errors with non-standard bases like -8, -457, & 100,
it's desirable to have the logic in error_message() handle "non-standard"
bases.  Heimdal happens to have done this for a long time, & I believe
I talked tytso into doing this with e2fsprogs as well.
This means the error code check becomes:
	base <= code < base + n_msgs
instead of the existing:
	base == (code & ~255)
this allows the use of more than 256 messages in one error table,
allows non-standard bases, and costs no more in CPU.  I believe doing
this is a no-brainer even if we decide not to actually take advantage
of it.  To be able to take advantage of non-standard bases, one needs
a compile_et that does this.  I had a perl script that did that; I now
have a version of openafs's compile_et that also does this.
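
As a rough illustration of the two checks, here is a minimal sketch.
The struct layout, function names, and message strings are placeholders
I made up for the example; they are not the actual com_err or openafs
definitions.

```c
#include <stddef.h>

/* HYPOTHETICAL table descriptor; field names are illustrative only. */
struct err_table {
    long base;                  /* first code covered by the table */
    int n_msgs;                 /* number of messages in the table */
    const char *const *msgs;    /* msgs[code - base] */
};

/* Existing style: code must sit in the table's 256-aligned block. */
int match_masked(const struct err_table *t, long code)
{
    return t->base == (code & ~255L);
}

/* Proposed style: base <= code < base + n_msgs, any base, any size. */
int match_range(const struct err_table *t, long code)
{
    return t->base <= code && code < t->base + t->n_msgs;
}

/* Find the message for a code using the range check; NULL if unknown. */
const char *lookup_message(const struct err_table *tables, size_t n_tables,
                           long code)
{
    size_t i;

    for (i = 0; i < n_tables; i++)
        if (match_range(&tables[i], code))
            return tables[i].msgs[code - tables[i].base];
    return NULL;
}

/* Placeholder strings for a table with the non-standard base -8. */
static const char *const rx_placeholder_msgs[8] = {
    "placeholder for -8", "placeholder for -7", "placeholder for -6",
    "placeholder for -5", "placeholder for -4", "placeholder for -3",
    "placeholder for -2", "placeholder for -1",
};

static const struct err_table sample_tables[1] = {
    { -8, 8, rx_placeholder_msgs },
};
```

With a base of -8 the masked check can never match (anything in -8..-1
masks to -256), while the range check finds every code in the table with
the same couple of comparisons.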

					-Marcus Watts