[OpenAFS] Re: 1.6.0pre2 - more vos issues, possible bug

Andy Cobaugh phalenor@gmail.com
Fri, 4 Mar 2011 17:20:34 -0500 (EST)


On 2011-03-04 at 15:59, Andrew Deason ( adeason@sinenomine.net ) said:
>
> What about the command immediately preceding this? Anything odd about
> it; time it took to execute, or any warnings/errors/etc?

The commands before that all completed in 30 seconds or less. No messages 
other than that.

>> I'm not sure how related this is to the other issue I saw, where the
>> backup clone was left in a much worse state.
>
> I don't think it is; that error above isn't even really much of a
> problem; we just failed to end the transaction, but the transaction
> is idle by that point and will be ended automatically after 5 minutes
> (as you see in the VolserLog).
>
> The first issue you reported had problems much earlier before the log
> messages you gave. Did anything happen to the backup volume before that?
> No messages referencing that volume id? Did you or someone/thing else
> remove the backup clone or anything?

Nope. We don't even access the backup volume when doing the file-level 
backups anymore.

> The first messages around Tue Mar  1 00:02:12 2011 look like what would
> happen if you tried to recreate the BK after it was deleted with that
> code (fixed in the patches I mentioned before). The subsequent salvages
> are from an error to read some header data, which could be explained by
> the attempted 'zap's and such, assuming those messages were during/after
> you noticed the volume being inaccessible and tried forcefully deleting
> it.

Yes, the zaps were me trying to get the .backup into a usable state. 
Though, the first string of salvages started in the middle of the 
afternoon without any intervention - I think the event that caused them 
is what's missing from the picture.

I'm still a little hesitant to bos salvage that server - the whole reason
we're trying to switch to DAFS is to avoid the multi-hour fileserver
outages.

I'm going to take some time either later tonight or early next week to go
back through the logs and try to make more sense of them from a
chronological standpoint, and see if there's anything I missed.

There's still a bug somewhere that causes a .backup volume to go off-line
after being created. I have a test volume on one of the problem
fileservers right now that's been vos backup'd once a minute since
yesterday without a problem. So something else must have to happen to
cause this; I'm just not sure what.
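
For anyone wanting to run the same kind of once-a-minute test, here is a
rough sketch in Python. It is not the script actually used here; the
function names and the volume name "test.vol" are made up for illustration,
and it assumes 'vos' is on the PATH with valid tokens (or that you add
-localauth yourself when running on the server). It only records failed
runs so they can be lined up against the VolserLog afterwards.

```python
import subprocess
import time


def run_vos_backup(volume, runner=subprocess.run):
    """Run 'vos backup -id <volume>' once and return (ok, combined output)."""
    result = runner(
        ["vos", "backup", "-id", volume],
        capture_output=True,
        text=True,
    )
    output = (result.stdout + result.stderr).strip()
    return result.returncode == 0, output


def backup_loop(volume, iterations, interval=60,
                runner=subprocess.run, sleep=time.sleep):
    """Re-create the backup clone repeatedly; return a list of failures.

    Each failure is (iteration index, output of the failed command).
    'runner' and 'sleep' are injectable so the loop can be dry-run
    without touching a real cell.
    """
    failures = []
    for i in range(iterations):
        ok, output = run_vos_backup(volume, runner)
        if not ok:
            failures.append((i, output))
        # Wait between runs, but not after the last one.
        if i + 1 < iterations:
            sleep(interval)
    return failures


if __name__ == "__main__":
    # "test.vol" is a placeholder volume name; 1440 runs is one day.
    for index, message in backup_loop("test.vol", iterations=1440):
        print(f"run {index} failed: {message}")
```

A clean run of this loop only says that repeated vos backup alone does not
trigger the problem, which matches what I'm seeing so far.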

> -- 
> Andrew Deason
> adeason@sinenomine.net

--andy