[OpenAFS] Re: busy .backup volumes

Kristen J. Webb kwebb@teradactyl.com
Sat, 20 Apr 2013 10:40:47 -0600


I had tried to unlock but forgot to mention it.
The behavior is repeatable:

# vos unlock 1953047510 -localauth
Released lock on vldb entry for volume 1953047510

# vos ex 1953047510
**** Volume 1953047510 is busy ****
...
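
For anyone who wants to script the check above, here is a minimal sketch. It only scans already-captured `vos examine` text for the busy banner and does not run `vos` itself. The function name `busy_volume_ids` is hypothetical, and the pattern is inferred solely from the banner shown in this thread, so other OpenAFS versions may word it differently.

```python
import re
from typing import List

# Matches the banner that `vos examine` printed in this thread, e.g.
# "**** Volume 1953047510 is busy ****". Assumption: the pattern is inferred
# from that single sample and may not cover every OpenAFS release.
BUSY_RE = re.compile(r"\*{4} Volume (\d+) is busy \*{4}")


def busy_volume_ids(vos_examine_output: str) -> List[int]:
    """Return the volume IDs reported busy in captured `vos examine` text."""
    return [int(m.group(1)) for m in BUSY_RE.finditer(vos_examine_output)]


if __name__ == "__main__":
    sample = "**** Volume 1953047510 is busy ****\n"
    print(busy_volume_ids(sample))
```

Feeding it the saved output of repeated `vos ex` runs would show whether the volume ever leaves the busy state after an unlock.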


Kris

On 4/20/13 7:23 AM, Jason Edgecombe wrote:
> Try "vos unlock 1953047510"
>
> On 04/19/2013 08:51 PM, Kristen J. Webb wrote:
>> I was able to find a newer version of vos on a new system:
>>
>> # vos --version
>> openafs 1.6.2.1
>>
>> I reported bos --version (1.4.14), since I figured it would
>> represent the version of the volserver ;)
>>
>> So I tried:
>>
>> # vos endtrans fileserver 22 -localauth
>> # vos endtrans fileserver 58191 -localauth
>>
>> Interestingly, the messages have stopped posting
>> to the Volserver log, but I still get:
>>
>> # vos status fileserver
>> Total transactions: 2
>> --------------------------------------
>> transaction: 58191  created: Fri Apr 19 07:27:14 2013
>> lastActiveTime: Fri Apr 19 20:28:48 2013
>> attachFlags:  busy
>> transactionFlags: delete
>> volume: 1953047510  partition: /vicepc  procedure: Dump -> basically a failed
>> vos dump of a .backup volume
>> packetRead: 1  lastReceiveTime: Fri Apr 19 10:49:04 2013
>> packetSend: 1  lastSendTime: Fri Apr 19 10:56:04 2013
>> --------------------------------------
>>
>> --------------------------------------
>> transaction: 22  created: Sun Apr 14 04:43:55 2013
>> lastActiveTime: Fri Apr 19 20:28:48 2013
>> attachFlags:  busy
>> transactionFlags: delete
>> volume: 1953054615  partition: /vicepc  procedure: Dump -> basically a failed
>> vos dump of a .backup volume
>> packetRead: 1  lastReceiveTime: Tue Apr 16 09:52:25 2013
>> packetSend: 1  lastSendTime: Tue Apr 16 09:59:25 2013
>> --------------------------------------
>>
>> I can say for sure that the server that issued the vos dump has been rebooted
>> since the transaction started.  The other thing I am observing is that
>> repeated vos status calls against the fileserver show the lastActiveTime
>> as current (increasing).
>>
>> Also, still getting:
>>
>> # vos ex 1953047510
>> **** Volume 1953047510 is busy ****
>>
>> # vos ex 1953054615
>> **** Volume 1953054615 is busy ****
>>
>> Since they are .backup volumes, this is not impacting
>> the users' live data yet, but backups are not completing....
>>
>> I'm going to check later to see if some other timeout takes effect. I'm
>> curious if there is something more to try in this configuration. Still trying
>> to avoid a bos restart for now.  Oh, wait, after about 20 minutes, the Volserver
>> log started reporting again, very interesting:
>>
>> Fri Apr 19 20:25:06 2013 trans 58191 on volume 1953047510 is older than 46650 seconds
>> Fri Apr 19 20:25:06 2013 trans 22 on volume 1953054615 is older than 488460 seconds
>> Fri Apr 19 20:25:36 2013 trans 22 on volume 1953054615 is older than 488490 seconds
>> Fri Apr 19 20:26:06 2013 trans 22 on volume 1953054615 is older than 488520 seconds
>> Fri Apr 19 20:48:06 2013 trans 58191 on volume 1953047510 is older than 300 seconds
>> Fri Apr 19 20:48:06 2013 trans 22 on volume 1953054615 is older than 300 seconds
>> Fri Apr 19 20:48:36 2013 trans 58191 on volume 1953047510 is older than 330 seconds
>>
>>
>> Thanks for the tips Andrew!
>>
>> Kris
>> On 4/19/13 4:11 PM, Andrew Deason wrote:
>>> On Fri, 19 Apr 2013 13:06:20 -0600
>>> "Kristen J. Webb" <kwebb@teradactyl.com> wrote:
>>>
>>>> # bos --version
>>>> openafs 1.4.14
>>>
>>> Er, you mean 'vos'? But I assume they're all the same :)
>>>
>>>> Fri Apr 19 15:02:36 2013 trans 22 on volume 1953054615 is older than 469110 seconds
>>>>
>>>> We've been just restarting the server.  Is there a better way to kill
>>>> these hung transactions without a sledge hammer?
>>>
>>> You can try <http://docs.openafs.org/Reference/1/vos_endtrans.html>. It
>>> doesn't exist in 1.4, but using the 'vos' from 1.6+ would work fine.
>>>
>>> However, I thought that specific message indicates that there's still an
>>> RPC holding a reference to that transaction (otherwise it would say it's
>>> idle). I might not be remembering the details correctly at the moment,
>>> but if so, you can't do anything to kill the transaction until you find
>>> the executing RPC that's using it and get it to stop. So, you'd need to
>>> look at 'vos status', rxdebug, or maybe a stack/core dump to see who's
>>> doing that.
>>>
>>
>
>
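
Since the thread leans on `vos status` to identify stuck transactions, a rough sketch of pulling the transaction ID, creation time, volume, partition, and procedure out of that output may help when tracking these over time. The function name `parse_vos_status` is hypothetical, and the field layout is inferred only from the output quoted above; other OpenAFS versions may format it differently. It works on captured text and does not invoke `vos`.

```python
import re
from datetime import datetime
from typing import Any, Dict, List

# Field layout assumed from the `vos status` output quoted in this thread.
TRANS_RE = re.compile(r"^transaction:\s+(\d+)\s+created:\s+(.+)$")
VOLUME_RE = re.compile(r"^volume:\s+(\d+)\s+partition:\s+(\S+)\s+procedure:\s+(\S+)")

# ctime-style timestamps such as "Fri Apr 19 07:27:14 2013".
# Assumption: day and month names are English (C locale).
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_vos_status(text: str) -> List[Dict[str, Any]]:
    """Return one dict per transaction found in captured `vos status` text."""
    transactions: List[Dict[str, Any]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        m = TRANS_RE.match(line)
        if m:
            # A "transaction:" line starts a new record.
            current = {
                "transaction": int(m.group(1)),
                "created": datetime.strptime(m.group(2).strip(), TIME_FORMAT),
            }
            transactions.append(current)
            continue
        m = VOLUME_RE.match(line)
        if m and current is not None:
            # Only the first token after "procedure:" is kept, so trailing
            # notes on the same line are ignored.
            current["volume"] = int(m.group(1))
            current["partition"] = m.group(2)
            current["procedure"] = m.group(3)
    return transactions


if __name__ == "__main__":
    sample = (
        "Total transactions: 1\n"
        "--------------------------------------\n"
        "transaction: 22  created: Sun Apr 14 04:43:55 2013\n"
        "attachFlags:  busy\n"
        "volume: 1953054615  partition: /vicepc  procedure: Dump\n"
        "--------------------------------------\n"
    )
    for tx in parse_vos_status(sample):
        print(tx["transaction"], tx["volume"], tx["created"].isoformat())
```

Subtracting `created` from the current time gives each transaction's age, which can be compared against the "is older than N seconds" lines in the Volserver log.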

-- 
This message is NOT encrypted
--------------------------------
Mr. Kristen J. Webb
Chief Technology Officer
Teradactyl LLC.
2450 Baylor Dr. S.E.
Albuquerque, New Mexico 87106
Phone: 1-505-338-6000
Email: kwebb@teradactyl.com
Web: http://www.teradactyl.com

Providers of Scalable Backup Solutions
    for Unique Data Environments
