[OpenAFS] Re: 1.6.0pre2 - more vos issues, possible bug

Andy Cobaugh phalenor@gmail.com
Fri, 4 Mar 2011 17:20:34 -0500 (EST)

On 2011-03-04 at 15:59, Andrew Deason ( adeason@sinenomine.net ) said:
> What about the command immediately preceding this? Anything odd about
> it; time it took to execute, or any warnings/errors/etc?

The commands before that all completed in 30 seconds or less. No messages 
other than that.

>> I'm not sure how related this is to the other issue I saw, where the
>> backup clone was left in a much worse state.
> I don't think it is; that error above isn't even really much of a
> problem; we just failed to end the transaction, but the transaction
> is idle by that point and will be ended automatically after 5 minutes
> (as you see in the VolserLog).
> The first issue you reported had problems much earlier than the log
> messages you gave. Did anything happen to the backup volume before that?
> No messages referencing that volume id? Did you or someone/thing else
> remove the backup clone or anything?

Nope. We don't even access the backup volume when doing the file-level 
backups anymore.

> The first messages around Tue Mar  1 00:02:12 2011 look like what would
> happen if you tried to recreate the BK after it was deleted with that
> code (fixed in the patches I mentioned before). The subsequent salvages
> are from an error reading some header data, which could be explained by
> the attempted 'zap's and such, assuming those messages were during/after
> you noticed the volume being inaccessible and tried forcefully deleting
> it.

Yes, the zaps were me trying to get the .backup into a usable state. 
Though, the first string of salvages started in the middle of the 
afternoon without any intervention - I think the event that caused them 
is what's missing from the picture.

I'm still a little hesitant to bos salvage that server - the whole reason 
we're trying to switch to DAFS is to avoid the multi-hour fileserver 

I'm going to take some time either later tonight, or early next week to go 
back through the logs and try to make more sense of them from a 
chronological standpoint, and see if there's anything I missed.

There's still a bug somewhere that causes a .backup volume to go off-line 
after being created. I have a test volume on one of the problem 
fileservers right now, that's been vos backup'd once a minute since 
yesterday without a problem. So something else must have to happen to 
trigger this; I'm just not sure what.

