[OpenAFS-devel] Some low-hanging RPC fruit vs long-term needs

Steve Simmons scs@umich.edu
Thu, 16 Dec 2010 14:13:14 -0500


This was originally under the subject "[OpenAFS] GiveUpAllCallBacks
callers." I've moved it over to the developers list and kicked off a new
thread.

On Dec 14, 2010, at 6:50 PM, Simon Wilkinson wrote:

> I think that, as responsible developers and vendors, we should not be
> knowingly shipping new code which can crash previous stable releases.
> However, I also find myself agreeing with the various objections that
> have been raised to creating new capability bits, tying together
> unrelated RPCs, and replacing RPCs because of implementation faults.
>
> At this point, I think we should take a look at how other protocols
> deal with the problem of avoiding triggering bugs in badly written
> peers. The use of version string matching is pretty common in this
> area - witness OpenSSH's use of the protocol version information to
> keep their client from crashing others' servers, Apache's use of
> header matching to avoid breaking non-conformant HTTP clients, and so
> on.
>
> If we are going to have a richer AFS ecosystem, then we're going to
> have to gain the ability to deal with these problems. I think that
> this means that in the future, we're going to have to produce a new
> versioning RPC which allows the distribution of structured vendor and
> version information. However, we don't have that RPC now, and it
> doesn't help us with already deployed servers. In the short term, I
> think it would be appropriate to use the RX version identifier.

What he said. This is a problem, and solving it in any near-optimum way
isn't going to be easy. We should address it, write RFCs as needed,
yadda, yadda. But we should not let the perfect be the enemy of the good
here. We're not going to have that ready as part of 1.6, maybe not 1.8.
In the interim, I'd rather see us do a few things that I think are
small/easy to implement but will ease the current pain.

One is a ping equivalent. The RPC suite that Sun developed for NFS,
netinfo, etc, uses this as a core feature. It's an effective workaround
for some limitations of UDP-based services, but it does more than just
work around that issue. In Sun terminology (well, assuming I recall Sun
terminology correctly) for every RPC type there is a reserved call 0
that's just a ping. Make that procedure call, and it should return true
immediately.
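
As a rough illustration of that convention, here is a minimal sketch in
Python. It is a toy dispatch table, not real Sun RPC or OpenAFS code, and
every name in it (PROC_NULL, Service, the "fileserver" label) is
hypothetical. In ONC RPC proper, procedure 0 takes no arguments and
returns no data; the sketch returns True only to match the description
above.

```python
PROC_NULL = 0  # reserved by convention: the "ping" procedure


class Service:
    """Toy RPC service: a table mapping procedure numbers to handlers."""

    def __init__(self, name):
        self.name = name
        # Every service gets procedure 0 for free; it does no work.
        self._procs = {PROC_NULL: lambda: True}

    def register(self, number, handler):
        """Add a real procedure; number 0 cannot be overridden."""
        if number == PROC_NULL:
            raise ValueError("procedure 0 is reserved for ping")
        self._procs[number] = handler

    def dispatch(self, number, *args):
        """Look up the procedure by number and call it."""
        try:
            handler = self._procs[number]
        except KeyError:
            raise LookupError(f"{self.name}: no procedure {number}") from None
        return handler(*args)


fileserver = Service("fileserver")          # hypothetical service name
fileserver.register(1, lambda a, b: a + b)  # stand-in for a real call
```

Calling fileserver.dispatch(PROC_NULL) succeeds on any Service without
the service author having to write anything, which is the point of
reserving the number.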

Like a ping, when you get the response, you know you have a server
running to provide that particular service. It doesn't even tell you
that the service is running correctly. But it eliminates the process of
trying to figure out if a given host *believes* it is running the
service. It has all the limitations of ping as well, and will need a lot
of the same options on the client side that ping clients have (max
counts, timer setting, etc).
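
The client-side options mentioned above (a maximum count and a timer)
might look something like the following sketch. It assumes a plain UDP
service that echoes the probe datagram back, which is a stand-in and not
the Rx protocol; the function names probe and is_alive and their
parameters are made up for illustration.

```python
import socket
import time


def probe(host, port, count=3, timeout=1.0, payload=b"ping"):
    """Send up to `count` probes; return a list of round-trip times.

    Each entry is the RTT in seconds, or None if that probe got no
    matching reply within `timeout` seconds.
    """
    results = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for seq in range(count):
            # Tag each probe with a sequence number so replies can be matched.
            message = payload + b" %d" % seq
            start = time.monotonic()
            try:
                sock.sendto(message, (host, port))
                reply, _addr = sock.recvfrom(1024)
            except OSError:
                # Timeout, or a network error reported by the OS.
                results.append(None)
                continue
            if reply == message:
                results.append(time.monotonic() - start)
            else:
                results.append(None)  # stray or stale datagram
    return results


def is_alive(host, port, **options):
    """True if at least one probe was answered."""
    return any(rtt is not None for rtt in probe(host, port, **options))
```

As with ping, a True from is_alive only says that something answered on
that port; it says nothing about whether the service behind it is
working correctly.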

In Sun's RPC, this is a predefined reserved part of every service. For
the long term this may or may not be doable for our work. If it is not,
I'd like to see it be a convention we follow for every service class -
that there's one RPC that's essentially a ping of the service.

The version report we get from rxdebug -v is useful, but as others note,
it can be, and is, set by various sites to reflect their local needs. A
'core version' flavor that strictly follows AFS releases would be
useful, e.g., rxdebug -coreversion would only report a string the user
shouldn't muck with. And yes, I know that git-ish things have to be
considered, yadda, yadda. The devil is always in the details.

Another useful thing is the config reporter. Given that various features
are determined at compile time, an RPC that reports these (think of it
as version++) is useful. As a quick and dirty example of uses in other
circumstances, take a look at the output of perl -V or vim --version. To
be more afs-specific, it'd be great to know by simple probe if a system
supports supergroups, dynamic attach, etc. Even if all the conf report
probe does is dump some plaintext as perl and vim do, it's a help.
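
To make the plaintext idea concrete, here is a small sketch in Python of
what a server might emit and how a client might read it. The report
layout (a "core-version" line plus vim-style +feature/-feature flags),
the function names, and the values in BUILD_CONFIG are all assumptions
for illustration; no such RPC or format exists in OpenAFS today.

```python
# Hypothetical compile-time settings; the values are made up.
BUILD_CONFIG = {
    "core_version": "1.6.0",
    "features": {"supergroups": True, "dynamic_attach": False},
}


def render_report(config):
    """Render the build configuration as vim-style plain text."""
    flags = " ".join(
        ("+" if enabled else "-") + name
        for name, enabled in sorted(config["features"].items())
    )
    return f"core-version: {config['core_version']}\nfeatures: {flags}\n"


def parse_report(text):
    """Parse render_report() output back into a dict on the client."""
    fields = dict(line.split(": ", 1) for line in text.splitlines() if line)
    features = {
        flag[1:]: flag[0] == "+" for flag in fields.get("features", "").split()
    }
    return {"core_version": fields["core-version"], "features": features}


def supports(text, feature):
    """True only if the report explicitly lists +feature."""
    return parse_report(text)["features"].get(feature, False)
```

A client that gets this text back from a probe can call
supports(report, "supergroups") before relying on the feature, and a
feature the server has never heard of simply reads as unsupported.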

Anyway, my general stance is that we not let the lack of a perfect
solution get in the way of some quick but sensibly-implemented interim
solutions. It's hard to get developer time for any of the concentrated
long-term solutions, but people with specific itches to scratch might be
available for the interim stuff.

Steve