[OpenAFS] Re: - Locked volumes

Andrew Deason adeason@sinenomine.net
Wed, 7 Mar 2012 11:33:24 -0600


On Wed, 07 Mar 2012 10:10:51 +0100
ProbaNet <info@probanet.it> wrote:

> > You get absolutely no output? Not even "Binding to the VLDB server" ?
> 
> Ops, you're right: with 'unlockvldb' there was that line of output,
> "Binding to the VLDB server", but then nothing else

Okay, then don't bother with the other steps I mentioned.

Can you make any modifications to the vldb or ptdb from this machine?
Just as an example... 'vos addsite' or 'pts createuser' ?

> (also in the logs, only messages about elections).
> 
[...]
> After a while we found the real problem with "udebug afsmn1 vlserver".
> Quorum OK (all servers vote yes for afsmn1), but different db version
> for server afsrm1 (dbcurrent=0, up=1 beaconSince=1). Recovery state "f".
> No propagation triggered.. We don't understand why..

One possible cause is packets not getting through. For an
operation like 'vos lock'/'vos unlock' we should only need to contact
the sync site, so the db being out of sync should not matter. However,
both dbcurrent=0 and a hang could be symptoms of not being able to
communicate with the sync site.

If you want to quickly test this, you can run udebug against each of the
dbservers from each of the other dbservers. If beacons are getting
through but database updates are not, though, maybe only packets over a
certain size are being dropped, or something like that, hmm...
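
A rough sketch of that cross-check, in case it helps. It only prints the
commands to run, one for each (source, target) pair. The host list is an
assumption taken from the names mentioned in this thread, the use of ssh
to reach each dbserver is an assumption as well, and udebug_pairs is just
a made-up helper name:

```shell
#!/bin/sh
# Print one udebug command per ordered (source, target) pair of dbservers.
# Assumption: each dbserver is reachable over ssh from where you run this.
udebug_pairs() {
    for src in "$@"; do
        for dst in "$@"; do
            # Skip a server checking itself.
            [ "$src" = "$dst" ] && continue
            echo "ssh $src udebug $dst vlserver"
        done
    done
}

# Host names as mentioned in this thread; adjust to your actual dbservers.
udebug_pairs afsmn1 afsrm1 afsor1
```

With three dbservers this prints six commands. Comparing the sync site
and db version each one reports should show which direction, if any, is
not getting through.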

[...]
> At this point we expected a db propagation to afsrm1.. But nothing
> happened in 1 hour, nothing in the logs, dbcurrent=0, different db
> versions and vldb frozen again.. (solved again with the scp method
> described above).
> Any suggestion? :) Thank you very much for your help!

You mention above there are at least _some_ messages in the log. What
are they?

Also, who is the sync site, according to 'udebug' ? Would you be willing
to provide the 'udebug' output?

> P.S.: we are planning to turn afsrm1 and afsor1 (actually regular voting
> dbservers) into non-voting clone-servers: is that a simple task? Any
> suggestion to do that?

It's not that hard. You just stop the database servers, update the
server-side CellServDB and start them all up again.
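
As a rough illustration of that edit: as far as I recall, the server-side
CellServDB marks a non-voting clone by putting its address in square
brackets. The cell name and addresses below are placeholders, not your
real ones, so check the CellServDB documentation for your version before
relying on this:

```
>example.org     #Cell name (placeholder)
192.0.2.1        #afsmn1
[192.0.2.2]      #afsrm1
[192.0.2.3]      #afsor1
```

The file should end up the same on every database server before you
start them again.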

-- 
Andrew Deason
adeason@sinenomine.net