[OpenAFS-devel] Re: Callbacks on shutdown -- release-team minutes 2014-02-12

Stephan Wiesand stephan.wiesand@desy.de
Thu, 20 Feb 2014 19:07:35 +0100


On Feb 20, 2014, at 10:06 , Jeffrey Hutzelman wrote:

> On Wed, 2014-02-19 at 19:33 -0600, Andrew Deason wrote:
>> On Wed, 19 Feb 2014 20:25:14 -0500
>> drosih@rpi.edu wrote:
>>
>>> I'm sorry, but I didn't notice this topic come up before.  What
>>> problems would be seen when these clients connect/disconnect to those
>>> ancient versions of file servers?  I'm not asking that the change be
>>> skipped, but just wondering what behavior would be seen.
>>
>> "Undefined behavior". In theory, anything could happen, but the most
>> likely result is that the fileserver just crashes (SIGSEGV, SIGBUS,
>> SIGABRT, etc). If I recall correctly, the busier the server is, the more
>> likely it will have a problem.
>
> Pushing out a client change that causes fileservers -- especially
> pre-DAFS fileservers -- to mysteriously crash is kind of poor.
> Announcements to people who are actively following things and likely to
> install new clients won't help server operators whose fileservers
> suddenly start crashing with little or no warning.  I certainly wouldn't
> want to be forced into a "surprise" upgrade.

Who would? But then, upgrading your 1.4.<=5 fileservers to at least
1.4.6 isn't that much of an adventure. I ran 1.4.7 servers for a long
time, and they were pretty good. And if you're running 1.3 or 1.5
servers, you should like surprises.

And: such sites have been ignoring an OpenAFS security advisory for more
than six years, and have been at risk for almost seven, because the
Windows client has been giving up callbacks since then. All it takes for
the problem to strike is introducing more Windows clients, or teaching
the existing ones new tricks - like having a sizable cluster run
maintenance scripts from AFS and then reboot every night, which is how
the problem was initially found. Presumably, introducing YFS clients
(including the iOS one) would trigger it as well. And I'm told Arla
clients will bite you too.

I figure quite a few sites with such old servers may, in the near
future, migrate from Windows XP and old clients to new ones which do
give up callbacks. Those are in for a surprise, clearly with no warning.
I'm not convinced that "protecting" them so far was doing them a favor,
nor that continuing to do so would in the future.

But I admit it's a tough decision.

> It seems like the right way to handle this is to define a capability
> flag to indicate that RXAFS_GiveUpAllCallBacks() is safe, and make the
> call only when the fileserver advertises that flag.  Of course, ideally
> the flag would have been introduced back when the bug was fixed, but
> that ship sailed years ago.

Yes, but it may still be an option. Any estimate of what it would take
to implement it, and to maintain it forever?
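
To make the proposal concrete, here is a rough sketch of what the
client-side gate might look like. It is an illustration only, not
OpenAFS code: the flag name and value (CAP_SAFE_GIVEUPALLCALLBACKS), the
server_info struct and both helper functions are hypothetical, and the
actual RPC is abstracted behind a caller-supplied function pointer. The
only idea taken from the proposal above is "make the call only when the
fileserver advertises that flag".

```c
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL capability bit; not an existing OpenAFS constant. */
#define CAP_SAFE_GIVEUPALLCALLBACKS 0x80000000u

/* Illustrative per-server state a client might cache (assumed layout). */
struct server_info {
    int      have_caps;  /* nonzero once capabilities have been fetched */
    uint32_t caps;       /* capability word as advertised by the server */
};

/* Returns 1 only if the server has advertised the hypothetical flag.
 * Servers with unknown capabilities are treated as unsafe. */
int server_allows_giveup_all(const struct server_info *srv)
{
    if (srv == NULL || !srv->have_caps)
        return 0;
    return (srv->caps & CAP_SAFE_GIVEUPALLCALLBACKS) != 0;
}

/* Shutdown path: invoke giveup() (a stand-in for the real RPC) only for
 * servers that advertise the flag. Returns how many servers qualified. */
size_t giveup_callbacks_on_shutdown(const struct server_info *servers,
                                    size_t n,
                                    void (*giveup)(const struct server_info *))
{
    size_t sent = 0;
    for (size_t i = 0; i < n; i++) {
        if (!server_allows_giveup_all(&servers[i]))
            continue;            /* old or unknown server: skip it */
        if (giveup != NULL)
            giveup(&servers[i]);
        sent++;
    }
    return sent;
}
```

The gate itself is the cheap part. The maintenance cost is on the server
side: every fixed fileserver would have to advertise the bit, and
already-fixed servers that predate the flag would be skipped along with
the broken ones.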

> I'm also a little concerned at the insistence on introducing a
> potentially disruptive, backward-incompatible behavior into what's
> supposed to be a stable release series with no mechanism to turn it off.
> Did we become GNOME when I wasn't looking?

With my site admin hat on: I'd like to have this feature. And I wouldn't
like to wait another couple of years. I do sympathize with admins bitten
by it. But not enough not to want it, especially for the reasons
outlined above.

Changing hats. As the "stable series" release manager: This has been a
controversial issue for years. And that's not going to change. We can
postpone it once more, but will then have the same decision to make, and
the same discussion, and no solution making everyone happy, when we
create the 1.9 branch and/or when we promote 1.9 to "stable" (1.10).
We'll still have those who want the feature, those who accept it if
there's a knob, those who want the knob to default to on, those who want
it to default to off, those who are willing to implement the knob, but
only a more complex one allowing per-site configuration rather than a
simple on/off one (and that's not feasible anytime soon), and those who
object to one more knob altogether (especially if it's complex).

What I do insist on in my release manager role is to get such issues off
the table, one way or the other, and not let such deadlocks happen.
Postponing such decisions is fine as long as there's hope for a much
better solution. But once it's clear that the situation is not going to
change, any decision is better than none (once again).

And when in doubt, I'm for progress rather than stagnation.

If this discussion, or the one following the planned announcements,
turns up a killer argument for not introducing the feature ever, fine.
In that case, IMO it should be removed from the master and 1.7 branches
too. If not, I believe having it in a stable release after due
announcement is the right thing to do.

-- Stephan