[OpenAFS] Re: Crazy DAFS problem (with log)

Andrew Deason adeason@sinenomine.net
Mon, 21 Mar 2011 18:07:28 -0500


On Mon, 21 Mar 2011 14:05:58 -0500
Andrew Deason <adeason@sinenomine.net> wrote:

> It _could_ be something like ihandle having a file handle open to the
> wrong file, I think.

Yes, I can see what looks like an ihandle reference leak (or something
similar), which would do this.  When a release does a reclone and an
inode is deleted (since the old RO copy goes away), the fileserver keeps
the deleted inode open.  So when we do the CoW later and need to reopen
that same inode number, we get the cached IHandle_t and the cached
FDHandle_t for it, and we read and write to the deleted file, which is
already unlinked on disk.

I'm not sure what is at fault yet, but I can get this to happen. The
order of operations is a little weird; it's not as simple as "CoW is
broken" like I first thought. The deleted inode needs to stay in the
ihandle cache based on a release, and then you need another CoW and then
another release, or something like that. The thing is, you tend to not
really notice that the file on disk isn't getting updated for a while, since
all of the fileserver I/O goes through the cached FD, so you can read
and write the data fine (until the fileserver gets restarted, or the fd
gets kicked out of the cache, or something causes the handle to
close...)
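
To illustrate the general effect (this is not OpenAFS code, just a toy
sketch on a POSIX system): a cache that maps an id to an open fd and
doesn't evict the entry when the file is unlinked will keep serving I/O
against the deleted file. The names HandleCache and demo and the id 42
are made up for the example; the real ihandle code is more involved.

```python
import os
import tempfile


class HandleCache:
    """Toy stand-in for an fd cache keyed by an id (hypothetical, not OpenAFS)."""

    def __init__(self, root):
        self.root = root
        self.fds = {}

    def path(self, ino):
        return os.path.join(self.root, str(ino))

    def open(self, ino):
        # Return the cached fd if there is one; only hit the disk on a miss.
        if ino not in self.fds:
            self.fds[ino] = os.open(self.path(ino), os.O_RDWR | os.O_CREAT, 0o600)
        return self.fds[ino]

    def close_all(self):
        for fd in self.fds.values():
            os.close(fd)
        self.fds.clear()


def demo():
    with tempfile.TemporaryDirectory() as root:
        cache = HandleCache(root)

        # Open id 42 and write the "old" contents through the cache.
        fd = cache.open(42)
        os.pwrite(fd, b"old RO data", 0)

        # The file is deleted, but the cache entry is never evicted.
        os.unlink(cache.path(42))

        # Later, a new file is created on disk under the same id.
        with open(cache.path(42), "wb"):
            pass

        # Reopening id 42 hits the cache and returns the stale fd.
        stale = cache.open(42)
        os.pwrite(stale, b"new RW data", 0)

        # I/O through the cached fd looks perfectly fine...
        via_cache = os.pread(stale, 64, 0)
        # ...but the file that is actually on disk never saw the write.
        with open(cache.path(42), "rb") as f:
            on_disk = f.read()

        cache.close_all()
        return via_cache, on_disk


if __name__ == "__main__":
    print(demo())
```

Once the stale fd is closed, the data written through it is gone, which
is roughly why nothing looks wrong until a restart or a cache eviction.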

-- 
Andrew Deason
adeason@sinenomine.net