[OpenAFS-devel] question: binary interface to kernel module (RHEL6.2/6.3, openafs 1.6.1)?

Simon Wilkinson simonxwilkinson@gmail.com
Wed, 29 Aug 2012 20:11:55 +0100


On 29 Aug 2012, at 16:21, Stephan Wiesand wrote:

> Since SL6, we have been using "kABI tracking kmods" for installing
> the OpenAFS kernel module on clients. For full information on this
> mechanism, see
> http://people.redhat.com/jcm/el6/dup/docs/dup_book.pdf . In short,
> you only have to compile and install the module once, and it will be
> used with future kernels as long as it doesn't use parts of the ABI
> that changed.
>
> Trying this may have been stupid in the first place. If so, happy
> bashing :-)

Not trying to bash, but you've encountered the problem that is built
into this approach.

What Red Hat's kABI stuff guarantees is that a small whitelist of
function signatures will not change across all of the kernels which
claim to share the same ABI. They go to great lengths to make this the
case, radically modifying changes that they backport from the mainline
kernel so those changes can be incorporated without changing their
guaranteed kABI. Roughly speaking, the guarantee is that kernel modules
built against one GA kernel will work against all kernels in that major
version. Each minor version may add new functions, but they will never
change the signatures of existing whitelisted functions, or remove
whitelisted functions.

The critical thing to realise is that this guarantee only applies to
functions on Red Hat's whitelist. If you use non-whitelisted functions,
you can't rely on the ABI guarantees. These functions may go away, or
(worse) the number or nature of their arguments may change between any
two kernel releases. Because you aren't recompiling for each release,
you won't notice the change in arguments and so will quite possibly end
up calling something that expects a struct inode with a struct
nameidata, or something that needs 5 arguments with 2, and so on.

Needless to say, OpenAFS uses many symbols which aren't on Red Hat's
whitelist. So, you are pretty much sitting on a ticking time bomb, with
quite significant data integrity ramifications. To be honest, I'm
surprised it has taken so long for things to break.

> Thanks a lot in advance for any insights.

If you want to track this down, I'd advise building a list of all of
the symbols that OpenAFS uses that aren't on Red Hat's whitelist for EL6
(from memory, this is a fairly considerable set), and then looking at
whether the function signatures, or the structure definitions used by
those function signatures, have changed with the new kernel revision.
That should give you an idea of where to look.
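
As a rough illustration of that procedure, here is a minimal Python
sketch. It rests on several assumptions that are mine rather than
anything confirmed in this thread: that the module's imported symbols
are taken from `nm -u` style output, that the whitelist file lists one
symbol per line under a bracketed section header, and that each kernel
ships a Module.symvers whose lines start with a CRC followed by the
symbol name. On kernels built with modversions, that CRC is derived
from the prototype and the types it references, so a changed CRC is a
reasonable hint that a signature or structure changed. All function
names below are hypothetical.

```python
def parse_undefined_symbols(nm_output):
    """Collect undefined ('U') symbols from `nm -u` style output."""
    symbols = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols have no address column: just "U <name>".
        if len(parts) == 2 and parts[0] == "U":
            symbols.add(parts[1])
    return symbols


def parse_whitelist(text):
    """Assumed format: one symbol per line; '[section]' headers and
    '#' comment lines are skipped."""
    symbols = set()
    for line in text.splitlines():
        name = line.strip()
        if name and not name.startswith(("[", "#")):
            symbols.add(name)
    return symbols


def parse_symvers(text):
    """Map symbol -> CRC from Module.symvers style lines
    (CRC, symbol, module, export type)."""
    crcs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0].startswith("0x"):
            crcs[fields[1]] = fields[0]
    return crcs


def risky_symbols(used, whitelist, old_crcs, new_crcs):
    """Report non-whitelisted symbols that vanished or whose CRC
    differs between the old and the new kernel."""
    report = {}
    for sym in sorted(used - whitelist):
        old, new = old_crcs.get(sym), new_crcs.get(sym)
        if new is None:
            report[sym] = "removed"
        elif old is not None and old != new:
            report[sym] = "changed"
    return report
```

In practice the inputs would presumably come from running nm on the
built module and from the Module.symvers files of the kernel the module
was built against and the kernel that misbehaves. A symbol reported as
"changed" only says where to start reading the kernel source diff; it
does not by itself show which argument or structure member moved.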

However, I suspect that you're going to continue to encounter this
problem. Unless someone with a support contract can convince Red Hat to
include all of the symbols OpenAFS requires in their whitelist, it just
isn't safe to use the kABI stuff for OpenAFS.

Cheers,

Simon.