[OpenAFS] Re: busy .backup volumes

Andrew Deason adeason@sinenomine.net
Fri, 3 May 2013 12:08:07 -0500


On Thu, 02 May 2013 13:03:30 -0600
"Kristen J. Webb" <kwebb@teradactyl.com> wrote:

> I'm not sure if this is enough info to be useful:

Well, it shows that there's not a running RPC keeping the transaction
around. So I'd assume it's a reference count leak on the transaction.
In the core, you can find the relevant transactions through the
'allTrans' global and check the 'refCount' field of each one, or just
look over the structure in general to see if anything looks weird. I
assume the refcount is just stuck at 1 (or maybe -1).

As I mentioned before, there have been some known issues in 1.4 with
threads accessing transactions without proper locking, which I think
could conceivably cause something like that. Or maybe there's a code
path that is just a regular refcount leak, but I don't remember seeing
it. Either way, if that's it, it's pretty difficult to determine that
after the fact. If you really want to know why it's happening, you could
instrument the acquisition and release of transactions (THOLD and
TRELE), either by modifying the code or by running under a debugger,
and track where the imbalance happens.

-- 
Andrew Deason
adeason@sinenomine.net