[AFS3-std] Hackathon Summary

Simon Wilkinson sxw@inf.ed.ac.uk
Fri, 9 Oct 2009 15:12:14 +0100


A group of us - Jeffrey Altman, Matt Benjamin, Derrick Brashear,  
Alistair Ferguson, Christof Hanke, Tom Keiser, Hartmut Reuter, Marcus  
Watts, Rod Widdowson and myself met for 3 days in Edinburgh at the end  
of last month. We discussed a wide variety of AFS issues. A jabber  
chat log of our discussions is available at http://conference.openafs.org/hackathon@conference.openafs.org/ 
  (see the 2009-09-22 and 2009-09-24 log files)

I'm not going to attempt to summarise our discussions in much detail.  
However, I have noted below the topics we discussed, and any  
conclusions that I believe we reached. Where we identified the next  
steps to be taken, I've also noted this in the hope that we can keep  
things moving forwards.

Extended Callbacks
------------------

Matt presented his extended callbacks draft. The discussion ranged  
between protocol, implementation and code management issues. On the  
protocol front, it was felt that the draft was approaching consensus,  
but that concerns remained around the changes to the behaviour of  
callback breaks (in particular, whether an RPC can return before all  
callbacks have been broken). Given the lack of consensus on this  
topic, we agreed that the draft would drop mention of callback  
coalescing entirely. Other issues included the behaviour of clients  
which receive extended callbacks over untrusted channels, and the risk
of deploying extended callbacks on servers which only have a small
number of callbacks configured. Matt
will produce a draft addressing these issues, and then we will attempt  
to move forwards with a consensus call.

We will attempt to address the asynchrony issue at a later date. Given  
that this change is arguably a modification to AFS3 semantics, we'll
attempt to engage a wider body of the community in this discussion.

In discussing the implementation, we considered the range of xcb  
dependencies, in particular those on mcas and libosi. It was felt that  
blocking extended callbacks on getting libosi into the tree was  
undesirable - in particular, a desire was expressed for the xcb code  
to use the existing pthread implementation, rather than pulling in  
osi's thread abstractions. We agreed that we would not enable or
expose the callback coalescing code in the OpenAFS implementation,  
pending further discussion of this issue. We didn't resolve the issue  
of whether we should use MCAS's native atomics everywhere, or whether  
we should prefer atomic operations provided by the operating system.

Code management concerns were expressed on a number of occasions  
throughout the meeting. I'll summarise these in a section of their own  
towards the end.

We agreed the following:

Matt will publish a new version of the draft which:
1) Removes any mention of asynchronous behaviour for callback breaks
2) Extends the security considerations section to state that if the  
client receives an XCB for metadata on an untrusted connection, it  
should treat it as a normal callback break.
3) Adds an implementation note on the risks for servers with a small  
number of callbacks

Simon (in the absence of a chair for afs3-stds) will issue a call for
consensus on the updated draft.

Matt will then:
1) Remove as many OSI dependencies from the xcb implementation as possible
2) Remove asynchronous callback breaks from the visible implementation  
(off by default, no switch to enable)
3) Push changes to gerrit (separate patches for Windows CM, Unix CM,  
and fileserver)

All will then take time to review these changes.

rx/osd
------

Hartmut and Christof presented their protocol documentation for
rx/osd, available online at http://pfanne.rzg.mpg.de/trac/openAFS-OSD/wiki/Specs

A desire was expressed to not use an IP address to identify OSD  
servers, but use a UUID instead, and register OSDs in an extended vldb  
which also knows about ports. We decided that this could be fixed in a  
later protocol version, but that for now servers should be expressed  
as a union of IP address and UUID so we don't have to rev the RPC  
later on.

Where 'expires' is used in the structures it will become an absolute  
time, and become a 64 bit value.

After a discussion of the consistency issues in the current mirroring  
implementation (if some mirrors go offline, then you can end up with  
multiple OSDs with different versions of the data), we decided that  
mirroring would be out of scope for the initial rxosd integration.

UUID will be removed from getOSDLocation, StartAsyncFetch and
StartAsyncStore, as the fileserver already knows the UUID (from
connection establishment).

Times in general will become 64 bits.

Hartmut and Christof will publish a revised set of protocol  
specifications addressing these, and will continue to split out the  
code into chunks and submit them to gerrit.

RPCRefresh
----------

This was roughly split into topic headings as follows. Simon agreed to  
edit a document proposing these changes.

UUIDs
We had a general discussion of how UUIDs might fit into the AFS  
protocol, beginning with an expressed desire to include client UUIDs  
in every call, to minimise the issues with using IP addresses to  
locate clients. After discussion, this approach was rejected, because  
client information needs to be available before an RPC has been  
decoded. Instead, we proposed making our new security classes exchange  
UUID information as part of the challenge/response connection  
establishment. Tom will specify and develop a new 'clear' class which  
will exchange UUIDs, and be a drop in replacement for the current null  
class.

We then discussed the issue where, through a race condition when  
servers change IP addresses, an RPC may arrive at a server other than  
the one it is destined for and, in some rare situations, mutate data  
incorrectly. It was felt that this was too rare an occurrence to  
justify adding a server UUID to every data mutating RPC. Tom's new  
clear class could also be used to address this case.

Jeffrey discussed ways in which we could use UUIDs (as SIDs) in the  
ptserver. We agreed that this was out of scope for this round, but  
that it was an interesting topic. Jeffrey will write up a proposal.

64bit time
We agreed to change all of the time occurrences in AFS RPCs (but not
any on disk occurrences) to be 64 bit, with a granularity of 100ns.
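
To give a feel for the 100ns granularity, here is a minimal sketch of
converting to and from 64 bit tick counts. The epoch (assumed here to
be the Unix epoch), the signed range check and the function names are
all assumptions for illustration; none of them were settled above.

```python
# Hypothetical helpers. The Unix epoch, the signed 64 bit range and the
# function names are assumptions for this sketch, not agreed protocol.
TICKS_PER_SECOND = 10_000_000  # number of 100ns units in one second


def to_ticks(seconds, nanoseconds=0):
    """Convert (seconds, nanoseconds) to 100ns ticks, truncating."""
    return seconds * TICKS_PER_SECOND + nanoseconds // 100


def from_ticks(ticks):
    """Split a tick count back into (seconds, nanoseconds)."""
    seconds, remainder = divmod(ticks, TICKS_PER_SECOND)
    return seconds, remainder * 100


def fits_signed_64(ticks):
    """True if the tick count fits in a signed 64 bit field."""
    return -(1 << 63) <= ticks < (1 << 63)
```

Note that anything finer than 100ns is lost on conversion, so a
round trip of (1s, 250ns) comes back as (1s, 200ns).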

RXOSD changes
We'd like to be able to specify a quota value for the number of files  
in a volume (rxosd already implements this, but currently does so by  
using a 'spare' field) - this is particularly relevant for sites which  
are using tape storage. This changes VolIntInfo and those RPCs which  
use it in the volint family, and Fetch/StoreVolumeStatus. Christof  
will provide a detailed list, and suitable language.

We will rename 'ResidencyMask' to 'DataAccessProtocol' in a revised  
version of FetchStatus.

In VolIntInfo we want to add an afs_uint32 field, 'osdPolicy'.

Future proofing FIDs
We agreed to change volumeID, vnode and uniquifier to all be 64 bit  
values, with 0xffffffff and 0 being reserved.
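
As a sketch only, a widened FID could be checked and encoded as
follows. Encoding each component as a big-endian 64 bit integer (as
an XDR hyper would be), and applying the reserved values to all three
components, are assumptions made for the example.

```python
import struct

# Values reserved per the discussion above. Applying them to every
# component is an assumption of this sketch.
RESERVED_VALUES = (0, 0xFFFFFFFF)


def check_component(value):
    """Validate one FID component as an unreserved 64 bit value."""
    if not 0 <= value < (1 << 64):
        raise ValueError("component does not fit in 64 bits")
    if value in RESERVED_VALUES:
        raise ValueError("component uses a reserved value")
    return value


def pack_fid(volume, vnode, unique):
    """Encode (volume, vnode, uniquifier) as three big-endian uint64s."""
    parts = [check_component(v) for v in (volume, vnode, unique)]
    return struct.pack(">QQQ", *parts)
```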

Quotas / Block size
Fields which report quotas and volume block sizes should become 64
bit, even if we can't use them all now. This affects
Store/FetchVolumeStatus, VolIntInfo and VolIntXinfo.

Last update time of volume
We'll add a field to FetchStatus to store the last update time of the  
volume, so it can be used to optimise handling of read only volumes in  
the cache manager. Alistair will arrange for this to be implemented.

Per file ACLs
We'll define semantics for the new FetchACL and StoreACL commands when
they are invoked on files.
The new FetchStatus will be defined as returning per file ACL  
information.

ACL Extensions
We'll extend ACLs to use 32 bits of access data on the wire, and  
reserve all of the new 16 bits for our own use.

FetchStatus cleanups
*) InterfaceVersion will be removed
*) Length and Length_hi will be combined into a 64bit length
*) DataVersion and dataVersionHigh will be a single 64bit value.
*) User and Group ID will become 64bit
*) ParentVnode and ParentUnique will become 64bit (in line with the FID  
changes being made elsewhere)
*) SyncCounter will be removed
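
The merging of split fields such as Length/Length_hi amounts to the
following arithmetic. This is a minimal sketch; the helper names are
made up for illustration.

```python
def combine_halves(high, low):
    """Merge two unsigned 32 bit halves into one 64 bit value."""
    for half in (high, low):
        if not 0 <= half <= 0xFFFFFFFF:
            raise ValueError("each half must fit in 32 bits")
    return (high << 32) | low


def split_value(value):
    """Split a 64 bit value into (high, low) 32 bit halves."""
    if not 0 <= value < (1 << 64):
        raise ValueError("value must fit in 64 bits")
    return value >> 32, value & 0xFFFFFFFF
```

A server talking to an old client would still need the split form,
which is why both directions are shown.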

Tom: Propose new clear rx security class
Jeffrey: Write proposal for using SIDs within pts
Jeffrey: Provide language for 64bit time
Christof: Provide language for file quotas
Simon: Edit this into a manageable whole
Ali: Arrange for code to make use of the 'volume last update time'  
field to be written.

SRV records
-----------

Use them to replace AFSDB records. We will standardise support for
vlserver and ptserver lookups (of the form
_afs3-vlserver._udp.<cellname>). SRV priority maps to server rank, and
weight should be used as input to the server selection randomisation
function.
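
One way of reading "priority as rank, weight as input to
randomisation" is the ordering described in RFC 2782: lowest priority
first, with a weighted random choice among records of equal priority.
The sketch below follows that general scheme. It is not the OpenAFS
server selection code, and the record type and function name are
hypothetical.

```python
import random
from collections import namedtuple

# Hypothetical record type holding the fields of one SRV answer.
SrvRecord = namedtuple("SrvRecord", "priority weight port target")


def order_servers(records, rng=random):
    """Order by priority, then weighted-random within each priority."""
    ordered = []
    for priority in sorted({r.priority for r in records}):
        pool = [r for r in records if r.priority == priority]
        while pool:
            total = sum(r.weight for r in pool)
            if total == 0:
                # All remaining weights are zero: pick uniformly.
                pick = rng.choice(pool)
            else:
                point = rng.uniform(0, total)
                running = 0
                pick = pool[-1]  # fallback against float rounding
                for record in pool:
                    running += record.weight
                    if running >= point:
                        pick = record
                        break
            pool.remove(pick)
            ordered.append(pick)
    return ordered
```

Passing a seeded random.Random instance as rng makes the ordering
reproducible, which is handy for testing.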

Jeffrey will write a short I-D describing how AFS uses SRV records.

rxk5
----

Marcus presented his rxk5 document -
/afs/umich.edu/group/itd/build/mdw/openafs/patches/rxk5-1.pdf

A lively discussion ensued. In particular, we discussed the initial  
packet problem at great length. This is where a client sends a packet  
containing valuable data to a server which only wants to accept  
encrypted connections. However, because the server only tells the
client this after it has received the first packet, the client may
already have sent that data in cleartext. Jeffrey proposed a solution to this  
problem, which Marcus was unconvinced by. Further discussion is  
required.

We also debated the merits of using our locally developed k5ssl,  
against an externally maintained crypto library. Given that Heimdal's  
hcrypto is likely to be imported into the OpenAFS tree to support  
other AFS uses of crypto, the opinion was expressed that rxk5 should  
probably be built upon that.

A discussion of the problems of ubik's hard coded assumption that  
there will only ever be 3 security classes took place. It was agreed  
that a new, dynamic, interface will be defined to handle this.

We discussed Marcus's new cache manager properties list, which  
provides a sysctl like mechanism for exchanging configuration strings  
between cache manager and user space. The meeting was unable to reach  
consensus on this design, and we agreed that it should be discussed  
further on list.

Other agreed changes were:
*) The authenticator will be extended to support more than 4 calls per  
connection
*) Space will be added for an application level binding (AFS wants  
this to assert the client UUID, but we want to make it generic)
*) rxk5 will be modified to use the Kerberos PRF+, rather than MD5
*) The cellname length in the new tokens pioctl will be extended to 256

Matt & Marcus: Update draft to reflect changes, break code into chunks  
and submit to gerrit
Marcus: Raise sysctl-style properties interface on openafs-devel
Simon: Import hcrypto into OpenAFS tree (as part of the rxgk work)

Generic Quotas
--------------

Christof had raised the issue of providing a more generic quota
mechanism, which allows more flexible definitions of what quota might
be (rxosd would like to be able to apply a separate quota to files
under a certain size, for example).

We discussed implementing this as a set of tag value pairs, with each  
pair having a globally defined meaning. Individual tags need not be  
implemented on every fileserver - there should be an RPC by which  
clients can determine which tags a fileserver supports. We want to  
implement this by revising existing RPCs which take quota values, and  
use it to replace the quota values that those RPCs already contain.
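
A rough model of the tag value idea, with a query for supported tags,
might look like this. The tag numbers, class and method names are all
hypothetical; real tags would need globally agreed meanings and the
query would be an RPC.

```python
# Hypothetical tag numbers, invented for this sketch.
TAG_BLOCKS = 1
TAG_FILES = 2
TAG_SMALL_FILE_BLOCKS = 3


class QuotaStore:
    """Stand-in for a fileserver that implements only some tags."""

    def __init__(self, supported_tags):
        self.supported = frozenset(supported_tags)
        self.quotas = {}

    def supported_tags(self):
        """What a 'which tags do you support' RPC might return."""
        return sorted(self.supported)

    def set_quotas(self, pairs):
        """Apply (tag, value) pairs, rejecting unsupported tags."""
        unknown = [tag for tag, _ in pairs if tag not in self.supported]
        if unknown:
            raise ValueError("unsupported quota tags: %r" % unknown)
        self.quotas.update(pairs)
```

A client would call supported_tags() first and only send the pairs
that the server understands.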

Christof will write a document describing this, but we won't block the  
RPC refresh on it.

Volume State
------------

Tom wants to be able to communicate to the client the type of  
fileserver it's talking to, and provide a 'raw' and a 'mapped'  
indicator of the volume status. (raw is implementation dependent,  
mapped uses globally defined error codes). We agreed that fileserver  
type could, for now, be expressed as a capability bit, and that the  
volume state fields should be new parameters within VolIntInfo.

Tom will write an I-D describing this. Again, we won't hold RPC  
refresh up for these changes.

rx/udp improvements
-------------------

Jeffrey discussed changes he is making to RTT calculations such that  
the algorithms better reflect Phil Karn's findings from 1987. This  
seemed uncontentious - Jeffrey will put a patch into gerrit.
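
For readers unfamiliar with Karn's result, the core rule is that
round trip samples from retransmitted packets are ambiguous and must
not feed the estimator, while the retransmit timeout backs off on
each timeout. The sketch below pairs that rule with the smoothing
constants of the well-known TCP estimator. It is illustrative only
and is not Jeffrey's patch; the class and parameter names are made up.

```python
class RttEstimator:
    """Karn's rule plus TCP-style smoothing (constants are assumed)."""

    def __init__(self, initial_rto=1.0, max_rto=60.0):
        self.srtt = None      # smoothed round trip time, in seconds
        self.rttvar = None    # round trip time variation
        self.rto = initial_rto
        self.max_rto = max_rto

    def on_ack(self, sample, retransmitted):
        """Feed one RTT sample; skip it if the packet was resent."""
        if retransmitted:
            return  # ambiguous sample: keep the backed-off timeout
        if self.srtt is None:
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            delta = abs(self.srtt - sample)
            self.rttvar = 0.75 * self.rttvar + 0.25 * delta
            self.srtt = 0.875 * self.srtt + 0.125 * sample
        self.rto = min(self.max_rto, self.srtt + 4 * self.rttvar)

    def on_timeout(self):
        """Double the timeout, up to the configured ceiling."""
        self.rto = min(self.max_rto, self.rto * 2)
```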

Derrick discussed larger window support, which he will test before
discussing his findings further.

Derrick discussed improvements he wants to make to RX negotiation, by  
adding elements to the existing negotiation packet. In theory this  
should be backwards compatible, because existing clients use the  
packet size to determine the version of the structure they are  
receiving. We discussed the mechanism for progressing RX  
modifications, as there isn't an obvious body to do it in. The  
conclusion was to use afs3-stds, but make a deliberate attempt to  
reach out to those people who we know are using RX in other  
applications.
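
The size-based versioning can be pictured as follows: a receiver
decodes as many fields as it knows about and as the packet length
allows, so appended fields are ignored by older code. The field names
and layout here are invented for illustration and do not describe the
real RX negotiation packet.

```python
import struct

# Hypothetical layouts: every field is a big-endian unsigned 32 bit
# integer, and the names are invented for this sketch.
V1_FIELDS = ("max_packet_size", "window")
V2_FIELDS = V1_FIELDS + ("new_option",)


def parse_negotiation(payload, known_fields=V2_FIELDS):
    """Decode the known fields that fit within the payload length."""
    count = min(len(payload) // 4, len(known_fields))
    if count < len(V1_FIELDS):
        raise ValueError("negotiation payload too short")
    values = struct.unpack(">%dI" % count, payload[:count * 4])
    return dict(zip(known_fields, values))
```

Calling parse_negotiation(payload, V1_FIELDS) mimics an existing
client that only knows the original structure.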

Derrick will write up an I-D describing this, and solicit feedback.

rx/gk
-----

Simon presented his write-up of the current rx/gk protocol.

Jeffrey expressed a desire for the first packet problem to be  
resolved, and for client uuids to be a part of the authenticator.

We had a long discussion of bytelife, and of key agility. The  
conclusion that was reached was that each packet should have a security
header containing the number of the key that was used to encrypt that
packet. Provided this key number is input to the PRF used to derive the
transport key, keys can be revised at either the client's or the
server's request. bytelife will remain advisory, however.
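
To illustrate why feeding the key number into the PRF gives key
agility, here is a sketch in which either side can move to a new
transport key simply by bumping the number carried in the security
header. HMAC-SHA256 is used purely as a stand-in PRF, and the label
and function name are invented; this is not the rxgk key derivation.

```python
import hashlib
import hmac
import struct


def derive_transport_key(master_key, key_number, length=32):
    """Derive a per-key-number transport key (stand-in PRF only)."""
    if not 0 < length <= hashlib.sha256().digest_size:
        raise ValueError("length must be between 1 and 32 bytes")
    # The key number from the packet's security header is PRF input,
    # so each number yields an independent transport key.
    message = b"transport" + struct.pack(">I", key_number)
    digest = hmac.new(master_key, message, hashlib.sha256).digest()
    return digest[:length]
```

A receiver that sees an unfamiliar key number in a header can derive
the matching key on demand, without a further exchange.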

We decided that the ivec should be determined from the pseudo header,  
to solve the packet ordering problem.

We decided that the same pseudo header as rxk5 should be used.

Marcus pointed out that rxgk uses the first version of the rxkad  
authenticator, and strongly suggested that CITI's recommendations that  
led to the rxkadv2 authenticator be studied and followed.

Marcus noted that using different numbers than rxkad for security  
levels will only cause implementation pain. We'll use the same as rxkad.

Simon will update the rxgk protocol document, and create a new one  
which describes its implementation within AFS.

miniosi
-------

Tom talked us through the current libosi, with a view to splitting it  
into chunks that we can start to pull into the tree.

We agreed that we will only pull in changes that are going to be used
in the code, and import libosi as follows:

Phase 0: A build framework
Phase 1: buildenv, compiler, types
Phase 2: platform/datamodel.h
Phase 3: Time

Each of these phases should update the rest of the tree to use the new
functionality, so that we don't, for example, end up with two
descriptions of each platform's data model in the tree.

This is on the critical path for extended callbacks, so Tom and Matt  
will seek to move this forwards, with Tom doing the work and Matt  
making sure that it gets done.

Tom: Submit these chunks of libosi, and tree integration patches
Matt: nag Tom

Directory RPCs
--------------
http://michigan-openafs-lists.central.org/archives/afs3-standardization/2009September/000423.html

We discussed Matt's proposal for an explicit directory listing RPC  
(posted to afs3-stds during the hackathon). We were unable to achieve  
consensus on any of the issues this presents, beyond determining that  
server side sorting was unachievable.

DAFS
----

Tom would like a way for a server to change its advertised capabilities,  
without having to do an InitCallbackState3. Jeffrey proposed a new  
TellMeAboutYourself RPC which will take the capabilities as an IN  
parameter.

Tom will go away and think about this, and write something up.

There was agreement that DAFS (including the changes to the vnode and  
volume packages) needed to be properly documented.  Tom and Ali will  
get this done.

PRDB extensions
---------------

Derrick presented the original Swedish Hackathon work:
http://web.archive.org/web/20060211111127/http://www.afsig.se/snipsnap/space/prdb+extensions
We agreed that we need a way of creating multiple names in a single
RPC, and to provide a RenameAuthName RPC which takes a vector of
triples of (type, old_opaque, new_opaque).

Derrick will produce an I-D documenting the new RPCs, and an  
implementation.

Code management
---------------

We spent a lot of time discussing issues of code management, and how  
large changes can get into the OpenAFS source code. It's hard to  
summarise what was a wide ranging and contradictory discussion, but  
the following points were made and broadly agreed with:
*) Clear protocol and implementation documentation can both hugely
help with code review
*) Design and implementation discussions, during the development  
process, are hugely valuable
*) For 'large' projects we will merge by creating a git branch on
which changes will be reviewed. We can then flip the switch by doing a  
final merge commit, safe in the knowledge that the individual commits  
have already been reviewed.