[OpenAFS] Re: 1.6.0pre2 - more vos issues, possible bug

Andy Cobaugh phalenor@gmail.com
Wed, 2 Mar 2011 00:24:15 -0500 (EST)

On 2011-03-01 at 22:23, Andrew Deason ( adeason@sinenomine.net ) said:
> On Tue, 1 Mar 2011 22:38:07 -0500 (EST)
> Andy Cobaugh <phalenor@gmail.com> wrote:
>> (and I think you meant dafssync-debug. I may not have mentioned that.)
> fssync-debug should detect a DAFS fileserver and execute dafssync-debug
> for you.

If I just do fssync-debug, it tells me this:

*** server asserted demand attach extensions. fssync-debug not built to
*** recognize those extensions. please recompile fssync-debug if you need
*** to dump dafs extended state

> Have you done successful 'vos backup's of that volume after the
> 1.6.0pre2 upgrade? Or did you upgrade and it broke?

Oh yes, definitely. It was upgraded on Feb 19.

> Hmm, well, I interpreted "turned debugging up" to mean "up all the way",
> which actually probably isn't true. The messages I'm looking for are at
> level 125, and there's a lot of them (they log every FSSYNC request and
> response).

Yeah, only running at 5 right now.

>> If I look in FileLog.old (I restarted at some point to up the debug
>> level), I see these lines:
> You can change that with SIGHUP/SIGTSTP (unless you're doing that for a
> permanent change).

Is that to increase/decrease logging level, respectively?

>> Tue Mar  1 16:11:34 2011 FSYNC_com:  read failed; dropping connection (cnt=94804)
>> Tue Mar  1 16:11:34 2011 FSYNC_com:  read failed; dropping connection (cnt=94805)
> There should be a SYNC_getCom right before these (though it probably
> just says "error receiving command"). Just to be sure, there aren't any
> processes dying/respawning in BosLog{,.old}, are there?

No processes dying, fortunately.
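
In case it helps to see how often those FSYNC_com drops are happening, here is a rough sketch that tallies them from a FileLog. It assumes the line format is exactly what is quoted in this thread; the function name and the regular expression are made up for illustration and are not part of OpenAFS.

```python
import re
from collections import Counter

# Assumed line format, taken from the FileLog excerpt quoted in this thread:
# "Tue Mar  1 16:11:34 2011 FSYNC_com:  read failed; dropping connection (cnt=94804)"
FSYNC_DROP = re.compile(
    r"(?P<stamp>\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4}) "
    r"FSYNC_com:\s+read failed; dropping connection \(cnt=(?P<cnt>\d+)\)"
)


def summarize_fsync_drops(lines):
    """Return (drops per timestamp, lowest cnt seen, highest cnt seen).

    Hypothetical helper: lines that do not match the assumed format are skipped.
    """
    per_stamp = Counter()
    cnts = []
    for line in lines:
        m = FSYNC_DROP.search(line)
        if m:
            per_stamp[m.group("stamp")] += 1
            cnts.append(int(m.group("cnt")))
    if not cnts:
        return per_stamp, None, None
    return per_stamp, min(cnts), max(cnts)
```

Passing it an open file object for FileLog or FileLog.old (wherever those live on the server in question) gives the number of drops per second and the range of cnt values, which should show whether the drops come in bursts around each vos backup.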

>> Failed to end the transaction on the rw volume 536871059
>> ____: server not responding promptly
>> Error in vos backup command.
>> ____: server not responding promptly
> That's RX_CALL_TIMEOUT, which I'm not used to seeing on volserver
> RPCs... Do you know how long it took to error out with that? If it takes
> a while, a core of the volserver/fileserver while it's hanging would be
> ideal. It might just be the fileserver trying to salvage the volume a
> bunch of times or something, though, and that takes too long.

From the start of the vos backup command until it returned was 16s
according to our logs.