[OpenAFS] Re: busy .backup volumes

Kristen J. Webb kwebb@teradactyl.com
Mon, 22 Apr 2013 10:11:30 -0600


Hi Andrew,
The filer was restarted early this a.m.  We've seen
this come up semi-frequently in the past 6 months,
so I expect to see it again.

As for the lastActiveTime from vos status on the
fileserver: after I ran vos endtrans, it increased
continuously. I cannot say whether it was unchanging
before I ran endtrans. I'll try to gather this on
the next volume.
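A small helper can make that check less error-prone. You give it lastActiveTime values sampled from successive vos status runs, and it reports whether the value stayed put, jumped once (as a single endtrans would cause), or kept climbing. This is a hypothetical Python sketch and the function name is made up. It deliberately leaves out parsing the values from vos status output, because that output format is not shown in this thread.

```python
from typing import Sequence


def classify_last_active(samples: Sequence[int]) -> str:
    """Classify lastActiveTime samples (epoch seconds) taken at successive polls.

    Hypothetical helper: the caller is assumed to have already extracted the
    values from repeated `vos status` runs, in polling order.
    """
    if len(samples) < 2:
        return "insufficient data"
    # Count how many consecutive polls saw the timestamp move forward.
    increases = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    if increases == 0:
        return "unchanging"
    if increases == 1:
        return "bumped once"
    return "increasing continuously"
```

If the value bumps once when endtrans runs and then holds steady, that matches Andrew's expectation. A value that keeps climbing suggests something is still touching the transaction.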

I'll also see what I can do about generating
traces to determine whether this is some sort
of Rx timeout, an RPC hang, a reference overcount
bug, or something else.

Thanks for all the help everyone!
Kris

On 4/20/13 5:01 PM, Andrew Deason wrote:
> On Fri, 19 Apr 2013 18:51:12 -0600
> "Kristen J. Webb" <kwebb@teradactyl.com> wrote:
>
>> I can say for sure that the server that issued the vos dump has been
>> rebooted since the transaction started.  The other thing I am
>> observing is that repeated vos status on the fileserver shows the
>> lastActiveTime as current (increasing).
>
> It would be set to the current time when you ran the 'vos endtrans'
> command. I assume you just saw it increase once, and not increase
> constantly.
>
> So again, everything you've said suggests there is an RPC holding a
> reference to that transaction, which is why it's not going away. So
> either:
>
>   - The Rx call is still alive. Even if the client is gone, for some
>     reason the Rx call is not dying (i.e. some bug in Rx; a timer not
>     going off or something).
>
>   - The Rx call died, but the RPC is still running. Maybe the volser RPC
>     is hanging on some lock or some other thing.
>
>   - There is no RPC still running, but the transaction still says someone
>     is using it. We have a bug with a reference overcount on that
>     transaction.
>
> The only way to know which it is is to look at a stack trace or core of
> the volserver process, or maybe 'rxdebug' would show a stuck call if the
> problem is a stuck Rx call.
>
> There have been some bugs in the past with the volserver not accessing
> transactions from multiple threads correctly (not locking things right).
> It could be something like that (though I don't think I've seen that
> _specific_ manifestation), or it could be something else entirely.
>
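The third possibility, a reference overcount, is easier to picture with a toy model. An endtrans request marks the transaction deleted, but the entry is only freed once every holder has released its reference. A hung RPC therefore keeps the entry alive until it finishes, and a leaked reference keeps it alive forever. This is a hypothetical Python sketch. The class and method names are invented for illustration and are not the volserver's actual data structures or code.

```python
from dataclasses import dataclass


@dataclass
class Trans:
    """Toy volume-server transaction (hypothetical model, not OpenAFS code)."""
    tid: int
    ref_count: int = 1     # reference held by the call that created it
    deleted: bool = False  # set once an endtrans-style request arrives


class TransTable:
    def __init__(self) -> None:
        self.active: dict[int, Trans] = {}

    def create(self, tid: int) -> Trans:
        t = Trans(tid)
        self.active[tid] = t
        return t

    def hold(self, tid: int) -> Trans:
        # An RPC working on the transaction takes a reference.
        t = self.active[tid]
        t.ref_count += 1
        return t

    def release(self, t: Trans) -> None:
        # Drop a reference; free the entry if it is deleted and now unused.
        t.ref_count -= 1
        self._maybe_free(t)

    def end_trans(self, tid: int) -> bool:
        """Request deletion; return True only if the entry was actually freed."""
        t = self.active[tid]
        if t.deleted:
            return False          # already requested; don't drop a second ref
        t.deleted = True
        t.ref_count -= 1          # drop the creator's reference
        return self._maybe_free(t)

    def _maybe_free(self, t: Trans) -> bool:
        if t.deleted and t.ref_count <= 0:
            self.active.pop(t.tid, None)
            return True
        return False
```

For example, if a caller does `hold(1)` and never calls `release`, then `end_trans(1)` returns False and transaction 1 stays in `active` indefinitely. That is the "someone is still using it" symptom without any RPC actually running.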

-- 
--------------------------------
Mr. Kristen J. Webb
Chief Technology Officer
Teradactyl LLC.
2450 Baylor Dr. S.E.
Albuquerque, New Mexico 87106
Phone: 1-505-338-6000
Email: kwebb@teradactyl.com
Web: http://www.teradactyl.com

Providers of Scalable Backup Solutions
    for Unique Data Environments
