[OpenAFS] unresponsive clients after lowest ip database server went down

Benjamin Kaduk kaduk@MIT.EDU
Thu, 27 Aug 2015 20:06:16 -0400 (EDT)


Hi Jonathan,

On Thu, 27 Aug 2015, Jonathan Leung-Nilsson wrote:

> So I am mainly wondering if this is expected - if OpenAFS depends on having
> its lowest IP address server online all the time - or if it's likely that
> we have a configuration issue in our cell. I setup our cell about 5 years
> ago as a complete newbie to OpenAFS, and while I've gained a lot of
> insights and experience since, I still don't understand all the nuances.

The short answer is that clients are expected to continue functioning even
if the lowest-IP db server is offline. The remaining N-1 are supposed to
elect a new coordinator, and read-write access should resume within a couple
of election cycles; clients might experience full hangs or just an inability
to make database changes for a couple of minutes as things recover.
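
As a rough illustration of that expectation, here is a toy model in Python.
It is only a sketch, not the actual Ubik code: the function name
elect_coordinator and the addresses are hypothetical, and it assumes the
coordinator is simply the lowest-IP server still reachable once a strict
majority of the configured db servers is up. Timing, vote expiry, and any
tie-breaking rules are ignored.

```python
import ipaddress

def elect_coordinator(configured, reachable):
    """Return the lowest-IP reachable server if a strict majority of the
    configured servers is up; otherwise return None (no quorum)."""
    up = [s for s in configured if s in reachable]
    # A quorum needs strictly more than half of the configured servers.
    if 2 * len(up) <= len(configured):
        return None
    # Compare addresses numerically, not as strings.
    return min(up, key=ipaddress.ip_address)

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Lowest-IP server down: the remaining two still form a quorum.
print(elect_coordinator(servers, {"10.0.0.2", "10.0.0.3"}))  # 10.0.0.2

# Only one of three left: no quorum, so no coordinator in this model.
print(elect_coordinator(servers, {"10.0.0.3"}))  # None
```

In this model, losing 10.0.0.1 out of three servers still leaves a
coordinator, which matches the expectation that writes resume after a short
recovery period.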

The long answer requires more research and a discussion of edge cases such
as network partitions and timeouts, which I am not prepared to take on
right now.

-Ben