[OpenAFS] Re: Crazy DAFS problem (with log)

Andrew Deason adeason@sinenomine.net
Mon, 21 Mar 2011 23:25:36 -0500

On Mon, 21 Mar 2011 18:07:28 -0500
Andrew Deason <adeason@sinenomine.net> wrote:

> I'm not sure what is at fault yet, but I can get this to happen.

Looks like this is 124359-related of all things (not the same bug,
though, for which I am very thankful). The leak is an ihandle leak in
VCloseVnodeFiles_r, which was actually fixed in
b9816e12f7ed8213c9c4eaea09e992e69ce4ee05 but was reverted in
12e85227c5dbfdb1258718ee3360bffacc4f96ac. Reinstating
b9816e12f7ed8213c9c4eaea09e992e69ce4ee05 fixes this for me. I've been
looking at this for quite a bit today and am a bit tired of it, so some
checking of my reasoning here is welcome:

I wasn't around for the development discussion around that bug (though I
certainly did get to hear about it a lot), but I _believe_ that there
was some uncertainty about adding that IH_RELEASE call, since vnode handles
should get released when they are pulled from the free list (in
VGetFreeVnode_r). The thing is, VCloseVnodeFiles_r NULLs out the vnode
ihandle pointer (in VInvalidateVnodesByVolume_r) when it adds the vnode
ihandle to the returned array of ihandles, so I can't really see how the
IH_RELEASE call is not necessary, since we've gotten rid of all of the
pointer references we had via the vnode->handle.

For reference: 1.4.x, in this scenario, just doesn't NULL the vnode
ihandle ref, so it does get released when the vnode is pulled from the
free list.
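
To make the reference-counting argument above concrete, here is a toy
model in C. It is only a sketch of the reasoning, not OpenAFS code: every
type and function name below (toy_ihandle, toy_vnode, toy_ih_release,
toy_reuse_from_free_list, toy_close_then_reuse) is hypothetical, and the
model assumes the vnode holds exactly one reference on its ihandle.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the real OpenAFS structures. */
typedef struct toy_ihandle { int refcnt; } toy_ihandle;
typedef struct toy_vnode { toy_ihandle *handle; } toy_vnode;

/* Drop one reference (the role IH_RELEASE plays in the discussion). */
static void toy_ih_release(toy_ihandle *ih)
{
    if (ih != NULL)
        ih->refcnt--;
}

/* Models pulling a vnode off the free list: whatever handle the vnode
 * still points at is released before the vnode is reused. */
static void toy_reuse_from_free_list(toy_vnode *vn)
{
    if (vn->handle != NULL) {
        toy_ih_release(vn->handle);
        vn->handle = NULL;
    }
}

/* Models closing a volume's vnode files and later reusing the vnode.
 * null_pointer:      clear vn->handle when the handle is collected
 * release_collected: release the collected handle after using it
 * Returns the final reference count (0 means balanced). */
static int toy_close_then_reuse(int null_pointer, int release_collected)
{
    toy_ihandle ih = { 1 };              /* the vnode's single reference */
    toy_vnode vn = { &ih };
    toy_ihandle *collected = vn.handle;  /* goes into the returned array */

    if (null_pointer)
        vn.handle = NULL;
    /* ... the caller would close files through 'collected' here ... */
    if (release_collected)
        toy_ih_release(collected);

    toy_reuse_from_free_list(&vn);
    return ih.refcnt;
}

/* The four combinations, as assertions. */
static void toy_check_scenarios(void)
{
    assert(toy_close_then_reuse(0, 0) == 0);   /* pointer kept, freed on reuse */
    assert(toy_close_then_reuse(1, 0) == 1);   /* NULLed, never released: leak */
    assert(toy_close_then_reuse(1, 1) == 0);   /* NULLed and released: balanced */
    assert(toy_close_then_reuse(0, 1) == -1);  /* kept and released: double release */
}
```

In this model the explicit release is only redundant (and in fact
harmful) when the vnode pointer is left in place; once the pointer is
NULLed, nothing else can drop the reference.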

However, I don't think this fully explains the behavior. I haven't
checked it out yet (and will not today), but this shouldn't really be
causing the fd to be left open. We are calling IH_REALLYCLOSE when the
volume goes offline, and we do appear to be going through the code path
that closes all relevant file descriptors. But after that is done, I
still see an FdHandle_t with a refcount of 0, in the _OPEN state holding
open the problematic inode (this is without the fix mentioned above).
So, it seems like ih_fdclose or whatever isn't doing its job, and it
seems like that warrants investigation as well.
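
The condition described above can be phrased as an invariant to check
after the close path has run. The following is a toy sketch in C, not the
real fd handle cache: the names (toy_fdhandle, toy_fd_state,
toy_really_close, toy_count_idle_open) and the three-state layout are
assumptions made for illustration, and the "really close" shown is only
the behavior one would expect, not what the fileserver actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states for a cached file descriptor handle. */
typedef enum { TOY_FD_AVAIL, TOY_FD_OPEN, TOY_FD_INUSE } toy_fd_state;

typedef struct toy_fdhandle {
    int inode;           /* which inode the descriptor refers to */
    int refcnt;          /* active users of this descriptor */
    toy_fd_state state;
} toy_fdhandle;

/* Expected effect of a "really close" on one inode: every idle cached
 * descriptor for that inode is closed and returned to the AVAIL state. */
static void toy_really_close(toy_fdhandle *tbl, size_t n, int inode)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].inode == inode &&
            tbl[i].state == TOY_FD_OPEN && tbl[i].refcnt == 0)
            tbl[i].state = TOY_FD_AVAIL;   /* close(2) would happen here */
    }
}

/* Invariant check: how many descriptors for this inode are still cached
 * open with no users?  After a really-close this should be zero. */
static size_t toy_count_idle_open(const toy_fdhandle *tbl, size_t n, int inode)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].inode == inode &&
            tbl[i].state == TOY_FD_OPEN && tbl[i].refcnt == 0)
            count++;
    }
    return count;
}
```

A nonzero result from the counting function after the volume has gone
offline corresponds to the leftover handle seen here; a check of that
shape placed after the close path might help narrow down where the
descriptor is being skipped.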

Andrew Deason