[OpenAFS] Re: 1.6.2 buserver + butc

Derrick Brashear shadow@gmail.com
Wed, 27 Mar 2013 14:53:26 -0400


On Wed, Mar 27, 2013 at 2:44 PM, Andrew Deason <adeason@sinenomine.net> wrote:
> On Tue, 26 Mar 2013 20:04:15 -0400 (EDT)
> Prasad Dharmasena <pkd@glue.umd.edu> wrote:
>
>> The vicep* partitions (or volsets), for which the backup dump/butc
>> hang, are not consistent.  If we kill and restart the dump process,
>> some of the previously hung volsets finish while others hang.
>>
>> What info do we need to grab from butc and buserver in order to
>> track the problem?
>
> I assume there's nothing helpful in BackupLog?
>
> I haven't worked with butc/buserver for a long time, so I don't remember
> if there are ways to get more information out of them specifically.
> However, just going by what works in general:
>
> One pretty surefire way of getting to know what's happening is to grab a
> core from the butc and buserver processes while they are hanging ('gcore
> <pid>'). You'll need a developer to look at that to say what's going on,
> and those cores will contain sensitive information. But if there is
> someone you trust enough with it, that will let you know what's
> happening.

There won't be anything sensitive in just a stack trace, and on Solaris
one is easily generated by running 'pstack <pid>' against the hung butc
and buserver processes. That output can be shared freely. It may or may
not be enough information, but it's certainly a place to start.
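
A rough sketch of collecting those traces, in case it helps. It takes a
few samples per process so you can tell whether the threads are really
stuck in one place. The function name and the TRACE_CMD, OUTDIR, SAMPLES
and PAUSE variables are made up for illustration and are not OpenAFS
settings. It assumes a POSIX shell (ksh93 or bash, not the old Solaris
/bin/sh) and that the processes are named butc and buserver on your
machines.

```shell
# Sketch only: save a few stack traces per hung process for comparison.
# TRACE_CMD, OUTDIR, SAMPLES and PAUSE are hypothetical knobs.
TRACE_CMD=${TRACE_CMD:-pstack}        # Solaris pstack accepts a pid
OUTDIR=${OUTDIR:-/tmp/backup-traces}  # where the trace files are written
SAMPLES=${SAMPLES:-3}                 # traces to take per process
PAUSE=${PAUSE:-5}                     # seconds to wait after each sample

dump_stacks() {
    # usage: dump_stacks <label> <pid>...
    label=$1
    shift
    mkdir -p "$OUTDIR" || return 1
    for pid in "$@"; do
        n=1
        while [ "$n" -le "$SAMPLES" ]; do
            # One file per process per sample: <label>.<pid>.<n>.txt
            "$TRACE_CMD" "$pid" > "$OUTDIR/$label.$pid.$n.txt" 2>&1
            n=$((n + 1))
            sleep "$PAUSE"
        done
    done
}

# Example, run as a user allowed to inspect the processes:
#   dump_stacks butc $(pgrep -x butc)
#   dump_stacks buserver $(pgrep -x buserver)
```

If the samples for a given pid are identical, the process is most likely
blocked rather than just slow, which is worth mentioning when you post
them.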

-- 
Derrick