[OpenAFS] Re: busy .backup volumes

Andrew Deason adeason@sinenomine.net
Sat, 20 Apr 2013 18:01:51 -0500


On Fri, 19 Apr 2013 18:51:12 -0600
"Kristen J. Webb" <kwebb@teradactyl.com> wrote:

> I can say for sure that server that issued the vos dump has been
> rebooted since the transaction started.  The other thing I am
> observing is that repeated vos status on the fileserver shows the
> lastActiveTime as current (increasing).

It would be set to the current time when you ran the 'vos endtrans'
command. I assume you just saw it increase once, and not increasing
constantly.

Again, everything you've said suggests there is an RPC holding a
reference to that transaction, which is why it's not going away. One of
the following must be true:

 - The Rx call is still alive. Even if the client is gone, for some
   reason the Rx call is not dying (i.e. some bug in Rx; a timer not
   going off or something).

 - The Rx call died, but the RPC is still running. Maybe the volser RPC
   is hanging on some lock or some other thing.

 - There is no RPC still running, but the transaction still says someone
   is using it. We have a bug with a reference overcount on that
   transaction.
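
To make the three cases above concrete, here is a rough sketch of how a
reference count can keep a transaction around after it has been asked to
end. This is not the actual volserver code: the struct, the field names,
and the functions are all made up for illustration, and the real locking
is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a volserver transaction;
 * this is NOT the real OpenAFS structure. */
struct trans_sketch {
    int refs;           /* RPCs currently holding the transaction */
    bool end_requested; /* someone asked for the transaction to end */
    bool deleted;       /* the transaction has actually gone away */
};

/* An RPC takes a reference when it starts using the transaction. */
void trans_hold(struct trans_sketch *t)
{
    t->refs++;
}

/* An RPC drops its reference when it returns. If an end was already
 * requested, the last reference to drop deletes the transaction. */
void trans_put(struct trans_sketch *t)
{
    assert(t->refs > 0);
    t->refs--;
    if (t->refs == 0 && t->end_requested)
        t->deleted = true;
}

/* Ending deletes right away only if nobody else holds a reference. */
void trans_end(struct trans_sketch *t)
{
    t->end_requested = true;
    if (t->refs == 0)
        t->deleted = true;
}

/* Normal case: the RPC returns, so the transaction goes away. */
bool scenario_clean(void)
{
    struct trans_sketch t = {0};
    trans_hold(&t);
    trans_end(&t);
    trans_put(&t);
    return t.deleted;
}

/* Cases 1 and 2: the Rx call or the RPC never finishes, so its
 * reference is never dropped. */
bool scenario_stuck_rpc(void)
{
    struct trans_sketch t = {0};
    trans_hold(&t);
    trans_end(&t);
    /* no trans_put(): the holder is stuck */
    return t.deleted;
}

/* Case 3: nothing is running, but a buggy extra hold was never
 * matched by a put (reference overcount). */
bool scenario_overcount(void)
{
    struct trans_sketch t = {0};
    trans_hold(&t);
    trans_hold(&t); /* the bug: one hold too many */
    trans_put(&t);
    trans_end(&t);
    return t.deleted;
}
```

In this sketch only scenario_clean() ends with the transaction deleted.
The other two leave it in place with a nonzero count, and from the
outside they look the same, which is why telling them apart takes a
look inside the running process.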

The only way to tell which of these it is would be to look at a stack
trace or core of the volserver process. 'rxdebug' might also show a
stuck call, if the problem is a stuck Rx call.

There have been some bugs in the past with the volserver not accessing
transactions from multiple threads correctly (not locking things right).
It could be something like that (though I don't think I've seen that
_specific_ manifestation), or it could be something else entirely.

-- 
Andrew Deason
adeason@sinenomine.net