[OpenAFS] Re: busy .backup volumes
Kristen J. Webb
kwebb@teradactyl.com
Mon, 22 Apr 2013 10:11:30 -0600
Hi Andrew,
The filer was restarted early this a.m. We've seen
this come up semi-frequently in the past 6 months,
so I expect to see it again.
As for the lastActiveTime timestamp from vos status on
the fileserver: after I ran vos endtrans, it increased
continuously. I cannot say whether it was unchanging
before I ran endtrans. I'll try to gather this on the
next volume.
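
To tell a single bump (the one vos endtrans itself would cause)
apart from a value that keeps climbing, a few samples taken some
seconds apart are enough. A rough sketch, assuming the
lastActiveTime values are copied by hand from repeated vos status
runs and converted to seconds; the function name is made up for
illustration:

```python
def classify_last_active(samples):
    """Classify lastActiveTime samples (seconds, oldest first)."""
    if len(samples) < 3:
        raise ValueError("need at least three samples to tell a jump from a trend")
    # Count how many consecutive pairs show an increase.
    increases = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    if increases == 0:
        return "unchanging"
    if increases == 1:
        return "jumped once"
    return "increasing"


print(classify_last_active([100, 100, 100]))       # unchanging
print(classify_last_active([100, 160, 160, 160]))  # jumped once
print(classify_last_active([100, 110, 120, 130]))  # increasing
```

Sampling once before and several times after endtrans would cover
both questions.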
I'll also see what I can do about generating
any traces to determine if this is some
sort of Rx timeout, RPC hang, an overcount
error, or something else.
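
A toy model of why all three cases look the same from the outside.
This is not volserver code: the class, the field names, and the
600-second idle limit are assumptions made for the example. It
only assumes what Andrew describes below, that a transaction stays
around for as long as something holds a reference to it.

```python
from dataclasses import dataclass

IDLE_LIMIT = 600  # hypothetical idle timeout in seconds


@dataclass
class Transaction:
    ref_count: int = 0    # RPCs currently holding the transaction
    last_active: int = 0  # time of last activity, in seconds


def collectable(tx, now):
    """True if an idle sweep could remove the transaction."""
    return tx.ref_count == 0 and now - tx.last_active > IDLE_LIMIT


def run_rpc(tx, now, releases=True):
    """Simulate one RPC: take a reference, touch the transaction, release."""
    tx.ref_count += 1
    tx.last_active = now
    if releases:  # a stuck Rx call, hung RPC, or overcount never gets here
        tx.ref_count -= 1


healthy = Transaction()
run_rpc(healthy, now=0)

stuck = Transaction()
run_rpc(stuck, now=0, releases=False)  # the reference is never dropped

print(collectable(healthy, now=10_000))  # True
print(collectable(stuck, now=10_000))    # False
```

In this model a live Rx call, a hung RPC, and a leaked reference
all leave ref_count above zero, so the transaction state alone
cannot say which one it is. That is why a stack trace or rxdebug
output is needed.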
Thanks for all the help everyone!
Kris
On 4/20/13 5:01 PM, Andrew Deason wrote:
> On Fri, 19 Apr 2013 18:51:12 -0600
> "Kristen J. Webb" <kwebb@teradactyl.com> wrote:
>
>> I can say for sure that the server that issued the vos dump has been
>> rebooted since the transaction started. The other thing I am
>> observing is that repeated vos status on the fileserver shows the
>> lastActiveTime as current (increasing).
>
> It would be set to the current time when you ran the 'vos endtrans'
> command. I assume you just saw it increase once, and not increasing
> constantly.
>
> So again, everything you've said suggests there is an RPC holding a
> reference to that transaction, which is why it's not going away. So
> either:
>
> - The Rx call is still alive. Even if the client is gone, for some
> reason the Rx call is not dying (i.e. some bug in Rx; a timer not
> going off or something).
>
> - The Rx call died, but the RPC is still running. Maybe the volser RPC
> is hanging on some lock or some other thing.
>
> - There is no RPC still running, but the transaction still says someone
> is using it. We have a bug with a reference overcount on that
> transaction.
>
> The only way to know which it is is to look at a stack trace or core of
> the volserver process, or maybe 'rxdebug' would show a stuck call if the
> problem is a stuck Rx call.
>
> There have been some bugs in the past with the volserver not accessing
> transactions from multiple threads correctly (not locking things right).
> It could be something like that (though I don't think I've seen that
> _specific_ manifestation), or it could be something else entirely.
>
--
This message is NOT encrypted
--------------------------------
Mr. Kristen J. Webb
Chief Technology Officer
Teradactyl LLC.
2450 Baylor Dr. S.E.
Albuquerque, New Mexico 87106
Phone: 1-505-338-6000
Email: kwebb@teradactyl.com
Web: http://www.teradactyl.com
Providers of Scalable Backup Solutions
for Unique Data Environments
--------------------------------