[OpenAFS] Re: [AFS3-std] Re: IBM will not re-license OpenAFS .xg files

Russ Allbery rra@stanford.edu
Thu, 30 Aug 2012 18:44:37 -0700


Jeffrey Altman <jaltman@your-file-system.com> writes:

> For any given protocol standardization proposal, substantial expertise
> and time is required to perform proper analysis and review let alone
> write a document in the first place.  On top of that it is very
> difficult to anticipate all of the needs of a protocol and its viability
> without implementation experience.  To ensure that the documentation of
> a protocol is correct, multiple independent implementations are
> required.

I want to highlight this.  The primary purpose of protocol specifications
is to enable interoperability between multiple implementations of the
protocol.  There are other benefits, of course, such as documentation for
what guarantees are being made, which helps the future maintainability of
the code, but those can be addressed in ways other than a full
standardization process and third-party review.  The primary benefit that
third-party review and a full standards process adds is that it ensures
that the protocol specification is sufficiently complete that someone can
write an interoperating implementation from nothing but the protocol
specification.

Given unlimited resources, I absolutely think we should write protocol
specifications.  I love having firm protocol specifications.  I tend to
write them even for software for which there's only one implementation
(remctl, originally, or WebAuth).  They offer substantial benefits for
development and aid in thinking through all the complexities of a problem.

But AFS is currently operating in a seriously resource-starved mode.  I
completely understand why people have put those resources into things
other than detailed protocol documentation and review, particularly since
there is a huge backlog of work that would be required to achieve a
fully-documented protocol.  It's very similar to the reason why we don't
have a completely updated manual, except that protocol work is even
harder than user documentation.

> The fact is that the first three categories have rarely provided the
> resources necessary to produce substantial change.  Individuals unless
> they are independently wealthy simply do not have the time or resources.
> End user organizations are willing to fund small focused projects that
> can be achieved on the order of months not years.  End user
> organizations are unwilling to fund partial work that is reliant upon
> other organizations to independently fund subsequent segments.

When this comes up, folks from time to time observe that other open source
projects manage under those circumstances.  I want to be sure that people
realize the level of complexity that's involved in OpenAFS, and why
comparisons to other open source projects don't always work.

Maintaining OpenAFS involves, among other things:

* Kernel code as well as userspace code for various UNIXes.
* Mac OS X development (with quite a bit of OS internals involvement).
* Windows kernel file system development.
* High-performance threaded code with a complex lock model.
* A custom network protocol with substantial complexity.
* Cryptographic network security models.

Some of those things (such as the Windows kernel file system work) no
other project, open source *or* commercial, does at the level that OpenAFS
does.  This is a level of complexity *far* beyond the typical open source
project.  The only open source projects I can think of with equivalent
complexity are primarily maintained by full-time, professional developers
whose job is to work on that software, and whose salaries are paid by
companies like Red Hat, Novell, IBM, Google, or Oracle.

I love volunteer open source, and open source that's produced as an
incidental side effect of solving problems for one's regular job.  I
maintain a bunch of it myself.  OpenAFS is not at the level of complexity
that can be maintained in that model.  That degree of investment might be
enough to keep it basically running (although I'm dubious about even that
for Mac OS X and Windows), but not enough to do any substantial new
development at any sort of reasonable rate.

> I am stating this here and now because it has a direct impact on the
> availability of resources to the AFS3 standardization process.  Given
> the status quo it will be very difficult for YFSI to make its resources
> available to the AFS3 standardization process until after it has
established its product on the market.  As Harald pointed out in this
> thread and others have at various times on openafs-info@openafs.org, if
> certain critical features are not made available to the end user
> community real soon now, large organizations that require that
> functionality will have no choice but to move their IT services off of
> AFS3 to another file system protocol.  Many already have and others are
> known to be evaluating their options.

Let me give folks a bit of a snapshot of the situation at Stanford, for
example.

The general perception among senior IT management, both centrally and in
the schools, is that the future of user storage is the cloud, with
services such as Google Drive, Box, Dropbox, and so forth.  The whole
concept of an enterprise file system is widely considered to be obsolete.
I'm quite confident that this position is shared by a lot of other people
in higher education as well.  While that's not the niche that AFS fills
everywhere, that is the most common niche that it fills in higher ed.

Back in the NFSv4 days, the discussion we had at a management level around
AFS and alternatives was in the form of "is this thing sufficiently better
than AFS that we should switch?"  That's not the conversation that we're
having any more.  The conversation we're having now is "can we use this to
get off of that dead-end AFS stuff?"

I happen to think this position is wrong, and that it would be easier to
add the UI components to AFS than to solve all the file system
edge cases in cloud storage.  But I will get absolutely nowhere with that
argument.  No one is going to believe it; the UI is perceived to be very
hard, and the advantages offered by something like Google Docs are
perceived to be overwhelming.  Management will believe it when they see
it; arguing about
what AFS *could* become is entirely pointless.

Where AFS *does* have a very compelling story is around security, because
security is the huge weakness of these cloud storage platforms.  If you
try to do any serious authorization management in Google Drive, for
example, you will immediately discover that their concept of security is
pure amateur hour.  It's a complete embarrassment.  And auditors and IT
professionals who are responsible for storing data that has to remain
secure, such as patient medical data or financial data, know that.  For
example, the Stanford Medical School has flatly forbidden any of their
staff to use any cloud storage to store anything related to patient data.
None of the cloud storage providers will accept the sort of liability that
would be required to make the university lawyers willing to permit such a
thing (and rightfully so, given how bad their security is).

So the AFS *protocol* has a very compelling story here, one that I think I
can sell.  However, the current OpenAFS *implementation* does not.

(For the record, NFSv4 isn't any better.  NFSv4 has nearly all of the
problems of AFS and adds a bunch of new problems of its own.)

To tell this story, I need, *at least*:

* Full rxgk including mandatory data privacy and AES encryption.

* A coherent mobile story for how mobile devices and applications are
  going to access data in AFS, including how they can authenticate without
  using user passwords (which are increasingly a bad authentication story
  anywhere but are particularly horrible on mobile devices).

* Client-side data encryption with untrusted servers, where I can store
  data with someone I don't trust and be assured that they can't access it
  in any meaningful way.

The last is how I get AFS into the cloud storage discussion.  This is
already how cloud storage is addressing the problem that no one trusts
cloud storage providers to maintain adequate security.  There are, for
example, multiple appliances that you can buy that front-end Amazon S3 or
other cloud storage services but hold the encryption keys locally and only
store encrypted data in the cloud, re-exporting that file system to your data
center via CIFS or NFS or iSCSI.  That's the sort of thing that AFS is
competing against.  I think the AFS protocol offers considerable
advantages over that model, but it needs to offer feature parity on the
security capabilities.
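
To make that last requirement concrete, here is a minimal sketch of the
client-side encryption model, written in Python against the
pyca/cryptography library.  This is not rxgk and not anything that exists
in OpenAFS today; the put/get interface and the dict standing in for the
untrusted store are invented purely for illustration.  The point is just
that the client generates and holds the key, and the storage service only
ever sees ciphertext that it can neither read nor undetectably modify.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # Stand-in for the untrusted side: S3, a cloud provider, or a file
    # server we don't trust.  It only ever holds ciphertext.
    untrusted_store = {}

    # The client generates and keeps the key; it never leaves the client.
    key = AESGCM.generate_key(bit_length=256)

    def put(name, plaintext):
        """Encrypt locally, then hand only ciphertext to the store."""
        nonce = os.urandom(12)                      # unique per object
        ciphertext = AESGCM(key).encrypt(nonce, plaintext, name.encode())
        untrusted_store[name] = nonce + ciphertext  # keep nonce with blob

    def get(name):
        """Fetch ciphertext and decrypt locally; tampering raises an error."""
        blob = untrusted_store[name]
        nonce, ciphertext = blob[:12], blob[12:]
        return AESGCM(key).decrypt(nonce, ciphertext, name.encode())

    put("notes.txt", b"nothing the storage provider can read")
    assert get("notes.txt") == b"nothing the storage provider can read"

A real design obviously also has to handle key distribution to other
authorized users, directory data, and ACLs, which is exactly where the
protocol and standardization work comes back in.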

I need all of this either deployable or pretty clearly in progress within
about a year.  Beyond that, I think the conversation is already going to
be over.  At the least, I'm not going to be particularly interested in
continuing to fight it.

Other people will have different lists, of course.  That's just mine.
There are things other people care about that I don't; for example, if I
made a list of the top 50 things I want to see in AFS, IPv6 support would
be #116.  Other people have different constraints.

Some sites, often due to lack of resources, are willing to stay with stuff
that isn't really working because it's already there and it's expensive to
replace it.  I know a lot of sites that are running AFS that way; some of
them made a decision to get rid of it years and years ago, but still have
it because it does do a bunch of cool stuff and no one has the heart to
really make the push to get rid of it.

I am not one of those people.  I have a deep dislike of staying on zombie
technologies.  I believe in the potential of AFS, and if AFS will go where
I need it to go to make a compelling pitch to management, then I want to
keep it.  But if AFS is not going to go there in a reasonable timeline,
then I want Stanford to start its migration off of AFS in 2013, free up
all those resources to invest in technologies that we can keep for the
long run, and wish everyone the best of luck and go our separate ways by
2014 or 2015.

The thing I want to really stress here is that we're right on the border
of it being too late for AFS as a protocol to find the necessary
investment.  It is already extremely difficult to get management here to
consider funding anything related to AFS; it's perceived as a
technological dead-end and something that is no longer strategic.  There
is no longer an assumed default in favor of keeping AFS; if anything, the
assumed default is that AFS will be retired in some foreseeable timeline in
favor of cloud-based collaboration environments, Amazon S3, and some
data-center-centric storage such as cluster file systems for HPC.  To get
a place like Stanford to invest in AFS requires a *proactive case*, a
justification for why it's better; the "you already have it so it's
cheaper to invest in it than replace it" argument is no longer working.

With a pace of OpenAFS development where rxgk maybe wanders into a stable
release in 2014 or so, and then possibly X.509 authentication and improved
ACLs in 2015, and a new directory format in 2016 or 2017, and so forth,
AFS is going to lose a lot of sites.  Almost definitely including
Stanford.

-- 
Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>