[OpenAFS] Re: Crazy DAFS problem (with log)

Ryan C. Underwood nemesis@icequake.net
Mon, 21 Mar 2011 18:30:29 -0500


On Mon, Mar 21, 2011 at 06:07:28PM -0500, Andrew Deason wrote:
> 
> Yes, I can see what looks like some ihandle reference leak or something,
> which would do this.  When a release reclone is done, and an inode is
> deleted (since the old RO copy goes away), the fileserver is keeping the
> deleted inode open.  So, when we do the CoW later, and we need to reopen
> that same inode number, we get the cached IHandle_t and the cached
> FDHandle_t for it, so we read and write to the deleted file, but it's
> unlinked on disk.
> 
> I'm not sure what is at fault yet, but I can get this to happen. The
> order of operations is a little weird; it's not as simple as "CoW is
> broken" like I first thought. The deleted inode needs to stay in the
> ihandle cache based on a release, and then you need another CoW and then
> another release, or something like that. The thing is, you tend to not
> really notice that the file on disk isn't getting updated for a while, since
> all of the fileserver I/O goes through the cached FD, so you can read
> and write the data fine (until the fileserver gets restarted, or the fd
> gets kicked out of the cache, or something causes the handle to
> close...)
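
The effect described above (I/O through a cached descriptor keeps working after the file has been unlinked) is ordinary POSIX behaviour and can be reproduced outside the fileserver. The following is only a minimal sketch of that general mechanism, not of the OpenAFS ihandle code: the function name and the file name "inode_file" are made up for illustration, and it assumes a POSIX system where an open file may be unlinked.

```python
import os
import tempfile


def demo_unlinked_fd():
    """Show that an open descriptor outlives the unlink of its path (POSIX)."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "inode_file")  # hypothetical name

    # Open the file and hold on to the descriptor, as a handle cache would.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    os.write(fd, b"before unlink\n")

    # Remove the directory entry; the descriptor itself stays valid.
    os.unlink(path)
    exists_after_unlink = os.path.exists(path)

    # Writes and reads through the stale descriptor still succeed.
    os.write(fd, b"after unlink\n")
    os.lseek(fd, 0, os.SEEK_SET)
    stale_data = os.read(fd, 4096)

    # A fresh open of the same name gets a new, empty file on disk.
    new_fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    fresh_data = os.read(new_fd, 4096)

    # Clean up both descriptors and the temporary directory.
    os.close(fd)
    os.close(new_fd)
    os.unlink(path)
    os.rmdir(workdir)
    return exists_after_unlink, stale_data, fresh_data
```

Anything still holding the old descriptor sees consistent data, while anyone opening the path afresh sees none of it, which matches the "fine until the handle closes" symptom.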

Don't know if it helps your investigation, but my basic use case is that
I have a script which monitors system load and when the master is
basically idle, it goes around and looks for out of date volumes to
release.
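
For anyone trying to reproduce this, the general shape of such a script is roughly as follows. This is a sketch and not the actual script: the load threshold, the function names, and the needs_release callback (which stands in for whatever check decides that a volume is out of date) are all assumptions for illustration. The only external command it runs is "vos release <volume>".

```python
import os
import subprocess

IDLE_LOAD_THRESHOLD = 0.5  # hypothetical cutoff for "basically idle"


def is_idle(threshold=IDLE_LOAD_THRESHOLD, loadavg=None):
    """Return True when the 1-minute load average is below the threshold."""
    if loadavg is None:
        loadavg = os.getloadavg()
    return loadavg[0] < threshold


def release_stale_volumes(volumes, needs_release, run=subprocess.run, idle=is_idle):
    """Release every volume flagged by needs_release(), but only while idle.

    needs_release is a caller-supplied (hypothetical) predicate; run and idle
    are injectable so the loop can be exercised without a real cell.
    Returns the names of the volumes that were released.
    """
    released = []
    for volume in volumes:
        if not idle():
            break  # stop as soon as the machine gets busy again
        if needs_release(volume):
            # check=True raises CalledProcessError if vos exits non-zero.
            run(["vos", "release", volume], check=True)
            released.append(volume)
    return released
```

Called from cron or a loop, this releases a frequently appended log volume on nearly every pass, which is the pattern that leads to a release every couple of minutes.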

In this case the system load is basically zero, and I have a
small volume which really shouldn't be cloned, since it contains logs
which are being appended to frequently these days; as a result it's
getting released every couple of minutes by my script.  The files are always
appended and not random-accessed.  Wasn't a problem under 1.4.x, and I
don't think I saw any problems under 1.6.0-pre2 even, but my server
seems to be tripping over itself ever since installing 1.6.0-pre3.

I also have no idea what is writing to root.cell and triggering those
releases.  I know it's being updated based on vos examine output.  How
do I monitor file activity in real time?

-- 
Ryan C. Underwood, <nemesis@icequake.net>