[OpenAFS] Fail over to replica sites

Nathan Neulinger nneul@umr.edu
08 Aug 2002 21:31:19 -0500


On Thu, 2002-08-08 at 21:01, Russ Allbery wrote:
> Nathan Neulinger <nneul@umr.edu> writes:
> 
> > Yes. It's not reproducible though. I have yet to be able to "do"
> > anything to the file/vol servers to trigger the symptom.
> 
> > Note - I have not seen it when the server really cleanly goes down. In
> > those cases, it fairly reliably switches. I have however seen the
> > problem numerous times when a file server starts to not respond for some
> > reason. However, it must still be responding to some requests, because it
> > never goes down completely. If I kill -STOP the fileserver, the clients see
> > it instantaneously. (Quicker in my case with the RX_DEADTIME being
> > small.) Immediate response on most clients to the -CONT as well.
> 
> In this case, the server just went away completely without any warning.
> (Basically, the machine was powered off by accident.)  Many of our clients
> didn't recover and see the replicated volumes located on that server until
> the server came back up (and they were pointing to the read-only path and
> should have been able to find one of the other two replicas).

I'll have to try that with one of our test servers and see if yanking
the ethernet cable results in a similar response. I figured that a -STOP
would yield that result, but apparently not.

What's your networking environment? All switched? All clients or just
some of them?

> > In our cases though, it sometimes doesn't ever get to the 'connection
> > timed out' point... It just hangs forever.
> 
> I've not seen that myself.  This was more what I'd expect when a
> read/write server was down.  When you tried to access something that was
> replicated on that server, the system would respond "connection timed out"
> immediately.  There was no delay; it was obvious that it had cached that
> the system was down and wasn't retrying network access.

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216