[AFS3-std] Hackathon Summary
Simon Wilkinson
sxw@inf.ed.ac.uk
Fri, 9 Oct 2009 15:12:14 +0100
A group of us - Jeffrey Altman, Matt Benjamin, Derrick Brashear,
Alistair Ferguson, Christof Hanke, Tom Keiser, Hartmut Reuter, Marcus
Watts, Rod Widdowson and myself met for 3 days in Edinburgh at the end
of last month. We discussed a wide variety of AFS issues. A jabber
chat log of our discussions is available at http://conference.openafs.org/hackathon@conference.openafs.org/
(see the 2009-09-22 and 2009-09-24 log files).
I'm not going to attempt to summarise our discussions in much detail.
However, I have noted below the topics we discussed, and any
conclusions that I believe we reached. Where we identified the next
steps to be taken, I've also noted this in the hope that we can keep
things moving forwards.
Extended Callbacks
------------------
Matt presented his extended callbacks draft. The discussion ranged
between protocol, implementation and code management issues. On the
protocol front, it was felt that the draft was approaching consensus,
but that concerns remained around the changes to the behaviour of
callback breaks (in particular, whether an RPC can return before all
callbacks have been broken). Given the lack of consensus on this
topic, we agreed that the draft would drop mention of callback
coalescing entirely. Other issues included the behaviour of clients
which receive extended callbacks over untrusted channels, and the risk
of deploying extended callbacks on servers which are configured with
only a small number of callbacks. Matt
will produce a draft addressing these issues, and then we will attempt
to move forwards with a consensus call.
We will attempt to address the asynchrony issue at a later date. Given
that this change is arguably a modification to afs3 semantics, we'll
attempt to engage a wider body of the community in this discussion.
In discussing the implementation, we considered the range of xcb
dependencies, in particular those on mcas and libosi. It was felt that
blocking extended callbacks on getting libosi into the tree was
undesirable - in particular, a desire was expressed for the xcb code
to use the existing pthread implementation, rather than pulling in
osi's thread abstractions. We agreed that we would not enable or
expose the callback coalescing code in the OpenAFS implementation,
pending further discussion of this issue. We didn't resolve the issue
of whether we should use MCAS's native atomics everywhere, or whether
we should prefer atomic operations provided by the operating system.
Code management concerns were expressed on a number of occasions
throughout the meeting. I'll summarise these in a section of their own
towards the end.
We agreed the following:
Matt will publish a new version of the draft which:
1) Removes any mention of asynchronous behaviour for callback breaks
2) Extends the security considerations section to state that if the
client receives an XCB for metadata on an untrusted connection, it
should treat it as a normal callback break.
3) Adds an implementation note on the risks for servers with a small
number of callbacks
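To make item 2 concrete, here is a minimal sketch of the client-side decision. It is an illustration only, not text from the draft: the names `ExtendedCallback`, `Action` and `handle_xcb` are hypothetical, and the handling of XCBs that carry no metadata is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    APPLY_EXTENDED = "apply extended callback"
    PLAIN_BREAK = "treat as normal callback break"


@dataclass
class ExtendedCallback:
    fid: tuple              # (volume, vnode, uniquifier)
    carries_metadata: bool  # True if the XCB delivers metadata


def handle_xcb(cb: ExtendedCallback, connection_trusted: bool) -> Action:
    # Metadata received over an untrusted connection cannot be believed,
    # so the client falls back to an ordinary callback break.
    if cb.carries_metadata and not connection_trusted:
        return Action.PLAIN_BREAK
    # Assumption of this sketch: everything else is processed as an XCB.
    return Action.APPLY_EXTENDED
```

Falling back to a plain break is the conservative choice: the client discards its cached state and refetches it through the normal path.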
Simon (in the absence of a chair for afs3-stds) will issue a call for
consensus on the updated draft.
Matt will then:
1) Remove as many OSI dependencies from the xcb implementation as possible
2) Remove asynchronous callback breaks from the visible implementation
(off by default, no switch to enable)
3) Push changes to gerrit (separate patches for Windows CM, Unix CM,
and fileserver)
All will then take time to review these changes.
rx/osd
------
Hartmut and Christof presented their protocol documentation for
rx/osd, available online at http://pfanne.rzg.mpg.de/trac/openAFS-OSD/wiki/Specs
A desire was expressed to not use an IP address to identify OSD
servers, but use a UUID instead, and register OSDs in an extended vldb
which also knows about ports. We decided that this could be fixed in a
later protocol version, but that for now servers should be expressed
as a union of IP address and UUID so we don't have to rev the RPC
later on.
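One way to picture the "union of IP address and UUID" is as a discriminated union on the wire. The sketch below is only an illustration: the discriminant values, the 4-byte big-endian discriminant and the function names are assumptions, not part of the rx/osd specification.

```python
import ipaddress
import struct
import uuid

# Hypothetical discriminant values, chosen for this sketch only.
KIND_IPV4 = 0
KIND_UUID = 1


def encode_server_id(server) -> bytes:
    """Encode an OSD server identifier as discriminant + payload."""
    if isinstance(server, uuid.UUID):
        return struct.pack(">I", KIND_UUID) + server.bytes
    if isinstance(server, ipaddress.IPv4Address):
        return struct.pack(">I", KIND_IPV4) + server.packed
    raise TypeError("server must be an IPv4Address or a UUID")


def decode_server_id(data: bytes):
    """Decode the identifier produced by encode_server_id."""
    (kind,) = struct.unpack_from(">I", data, 0)
    if kind == KIND_IPV4:
        return ipaddress.IPv4Address(data[4:8])
    if kind == KIND_UUID:
        return uuid.UUID(bytes=data[4:20])
    raise ValueError("unknown server identifier kind: %d" % kind)
```

Because the discriminant travels with the value, a later protocol version can start sending UUIDs without changing the RPC signature.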
Where 'expires' is used in the structures it will become an absolute
time, and become a 64 bit value.
After a discussion of the consistency issues in the current mirroring
implementation (if some mirrors go offline, then you can end up with
multiple OSDs with different versions of the data), we decided that
mirroring would be out of scope for the initial rxosd integration.
UUID will be removed from getOSDLocation, StartAsyncFetch and
StartASyncStore, as the fileserver already knows the UUID (from
connection establishment).
Times in general will become 64 bits.
Hartmut and Christof will publish a revised set of protocol
specifications addressing these, and will continue to split out the
code into chunks and submit them to gerrit.
RPCRefresh
----------
This was roughly split into topic headings as follows. Simon agreed to
edit a document proposing these changes.
UUIDs
We had a general discussion of how UUIDs might fit into the AFS
protocol, beginning with an expressed desire to include client UUIDs
in every call, to minimise the issues with using IP addresses to
locate clients. After discussion, this approach was rejected, because
client information needs to be available before an RPC has been
decoded. Instead, we proposed making our new security classes exchange
UUID information as part of the challenge/response connection
establishment. Tom will specify and develop a new 'clear' class which
will exchange UUIDs, and be a drop in replacement for the current null
class.
We then discussed the issue where, through a race condition when
servers change IP addresses, an RPC may arrive at a server other than
the one it is destined for and, in some rare situations, mutate data
incorrectly. It was felt that this was too rare an occurrence to
justify adding a server UUID to every data mutating RPC. Tom's new
clear class could also be used to address this case.
Jeffrey discussed ways in which we could use UUIDs (as SIDs) in the
ptserver. We agreed that this was out of scope for this round, but
that it was an interesting topic. Jeffrey will write up a proposal.
64bit time
We agreed to change all of the time occurrences in AFS RPCs (but not
any on disk occurrences) to be 64bit, with a granularity of 100ns.
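As a quick sanity check on the 100ns granularity, the conversion to and from a seconds/nanoseconds pair can be sketched as below. The epoch is not stated above, so this sketch assumes the Unix epoch; the function names are hypothetical.

```python
TICKS_PER_SECOND = 10_000_000  # one tick is 100 ns


def unix_to_ticks(seconds: int, nanoseconds: int = 0) -> int:
    """Convert a (seconds, nanoseconds) pair to 100 ns ticks."""
    # Nanoseconds below the 100 ns granularity are truncated.
    return seconds * TICKS_PER_SECOND + nanoseconds // 100


def ticks_to_unix(ticks: int):
    """Convert 100 ns ticks back to a (seconds, nanoseconds) pair."""
    seconds, remainder = divmod(ticks, TICKS_PER_SECOND)
    return seconds, remainder * 100
```

If the value is treated as a signed 64 bit integer, it covers roughly 29,000 years either side of the epoch, so the 2038 limit of 32 bit seconds is no longer a concern.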
RXOSD changes
We'd like to be able to specify a quota value for the number of files
in a volume (rxosd already implements this, but currently does so by
using a 'spare' field) - this is particularly relevant for sites which
are using tape storage. This changes VolIntInfo and those RPCs which
use it in the volint family, and Fetch/StoreVolumeStatus. Christof
will provide a detailed list, and suitable language.
We will rename 'ResidencyMask' to 'DataAccessProtocol' in a revised
version of FetchStatus.
In VolIntInfo we want to add an afs_uint32 field named 'osdPolicy'.
Future proofing FIDs
We agreed to change volumeID, vnode and uniquifier to all be 64 bit
values, with 0xffffffff and 0 being reserved.
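A small sketch of what a widened FID might look like. It assumes that the two reserved values apply to each of the three fields and that the fields are packed as big-endian 64 bit integers; neither assumption is settled above, and the `Fid64` name is hypothetical.

```python
import struct
from dataclasses import dataclass

U64_MAX = 2**64 - 1
# Assumption: these reserved values apply to every field of the FID.
RESERVED_VALUES = frozenset({0, 0xFFFFFFFF})


@dataclass(frozen=True)
class Fid64:
    volume: int
    vnode: int
    unique: int

    def __post_init__(self):
        for name in ("volume", "vnode", "unique"):
            value = getattr(self, name)
            if not 0 <= value <= U64_MAX:
                raise ValueError(f"{name} does not fit in 64 bits")
            if value in RESERVED_VALUES:
                raise ValueError(f"{name} uses a reserved value")

    def pack(self) -> bytes:
        # Three unsigned 64 bit fields in network byte order.
        return struct.pack(">QQQ", self.volume, self.vnode, self.unique)
```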
Quotas / Block size
Fields which report quotas and volume block sizes should become 64
bit, even if we can't use them all now. This affects
Store/FetchVolumeStatus, VolIntInfo and VolIntXinfo.
Last update time of volume
We'll add a field to FetchStatus to store the last update time of the
volume, so it can be used to optimise handling of read only volumes in
the cache manager. Alistair will arrange for this to be implemented.
Per file ACLs
We'll define semantics for the new FetchACL and StoreACL commands when
they are invoked on files.
The new FetchStatus will be defined as returning per file ACL
information.
ACL Extensions
We'll extend ACLs to use 32bits of access data on the wire, and
reserve all of the new 16 bits for our own use.
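To illustrate the reservation, the sketch below assumes that the existing rights occupy the low 16 bits and that the new, reserved bits are the high 16 bits of the 32 bit word. That layout, and the names used, are assumptions of this sketch rather than agreed protocol.

```python
# Assumed layout: low 16 bits are today's rights, high 16 bits are reserved.
LEGACY_RIGHTS_MASK = 0x0000FFFF
RESERVED_RIGHTS_MASK = 0xFFFF0000


def check_rights(rights: int) -> int:
    """Reject rights words that set any of the reserved bits."""
    if not 0 <= rights <= 0xFFFFFFFF:
        raise ValueError("rights must fit in 32 bits")
    if rights & RESERVED_RIGHTS_MASK:
        raise ValueError("reserved ACL bits must be zero")
    return rights


def widen_legacy_rights(rights16: int) -> int:
    """Carry a 16 bit rights value into the 32 bit wire form."""
    return rights16 & LEGACY_RIGHTS_MASK
```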
FetchStatus cleanups
*) InterfaceVersion will be removed
*) Length and Length_hi will be combined into a 64bit length
*) Dataversion and dataVersionHigh will be a single 64bit value.
*) User and Group ID will become 64bit
*) ParentVnode and ParentUnique will become 64bit (in line with the FID
changes being made elsewhere)
*) SyncCounter will be removed
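The Length/Length_hi and DataVersion merges amount to joining two 32 bit halves into one 64 bit value. A minimal sketch, with hypothetical function names:

```python
def combine_hi_lo(hi: int, lo: int) -> int:
    """Join two unsigned 32 bit halves into one 64 bit value."""
    if not (0 <= hi <= 0xFFFFFFFF and 0 <= lo <= 0xFFFFFFFF):
        raise ValueError("each half must fit in 32 bits")
    return (hi << 32) | lo


def split_u64(value: int):
    """Split a 64 bit value into (hi, lo), as older structures expect."""
    if not 0 <= value <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("value must fit in 64 bits")
    return value >> 32, value & 0xFFFFFFFF
```

A server that still has to answer the old FetchStatus could use the split form to fill in the legacy pair of fields.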
Tom: Propose new clear rx security class
Jeffrey: Write proposal for using SIDs within pts
Jeffrey: Provide language for 64bit time
Christof: Provide language for file quotas
Simon: Edit this into a manageable whole
Ali: Arrange for code to make use of the 'volume last update time'
field to be written.
SRV records
-----------
Use them to replace AFSDB records. We will standardise support for the
vlserver and ptserver, using names of the form
_afs3-vlserver._udp.<cellname>. SRV priority maps to server rank, and
weight should be used as input to the server selection randomisation
function.
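As an illustration of how priority and weight could drive server selection, here is a sketch modelled loosely on the weighted selection described in RFC 2782. It is not the cache manager's actual ranking code; the `Srv` tuple and `order_servers` are hypothetical names.

```python
import random
from collections import defaultdict
from typing import NamedTuple


class Srv(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


def order_servers(records, rng=None):
    """Order SRV records: lowest priority first, weighted random within."""
    if rng is None:
        rng = random.Random()
    by_priority = defaultdict(list)
    for rec in records:
        by_priority[rec.priority].append(rec)

    ordered = []
    for priority in sorted(by_priority):
        pool = list(by_priority[priority])
        while pool:
            total = sum(rec.weight for rec in pool)
            if total == 0:
                # Only zero-weight records remain: choose uniformly.
                pick = rng.randrange(len(pool))
            else:
                threshold = rng.uniform(0, total)
                running = 0
                pick = len(pool) - 1
                for i, rec in enumerate(pool):
                    running += rec.weight
                    if rec.weight > 0 and running >= threshold:
                        pick = i
                        break
            ordered.append(pool.pop(pick))
    return ordered
```

Records with a higher weight are more likely to appear early within their priority group, which spreads load across equally ranked servers.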
Jeffrey will write a short I-D describing how AFS uses SRV records.
rxk5
----
Marcus presented his rxk5 document -
/afs/umich.edu/group/itd/build/mdw/openafs/patches/rxk5-1.pdf
A lively discussion ensued. In particular, we discussed the initial
packet problem at great length. This is where a client sends a packet
containing valuable data to a server which only wants to accept
encrypted connections. However, because the server only tells the
client this after it has received the first packet, the client may
already have sent that data in cleartext. Jeffrey proposed a solution
to this problem, which Marcus was unconvinced by. Further discussion
is required.
We also debated the merits of using our locally developed k5ssl,
against an externally maintained crypto library. Given that Heimdal's
hcrypto is likely to be imported into the OpenAFS tree to support
other AFS uses of crypto, the opinion was expressed that rxk5 should
probably be built upon that.
A discussion of the problems of ubik's hard coded assumption that
there will only ever be 3 security classes took place. It was agreed
that a new, dynamic, interface will be defined to handle this.
We discussed Marcus's new cache manager properties list, which
provides a sysctl like mechanism for exchanging configuration strings
between cache manager and user space. The meeting was unable to reach
consensus on this design, and we agreed that it should be discussed
further on list.
Other agreed changes were:
*) The authenticator will be extended to support more than 4 calls per
connection
*) Space will be added for an application level binding (AFS wants
this to assert the client UUID, but we want to make it generic)
*) rxk5 will be modified to use the Kerberos PRF+, rather than MD5
*) The cellname length in the new tokens pioctl will be extended to 256
Matt & Marcus: Update draft to reflect changes, break code into chunks
and submit to gerrit
Marcus: Raise sysctl-style properties interface on openafs-devel
Simon: Import hcrypto into OpenAFS tree (as part of the rxgk work)
Generic Quotas
--------------
Christof had raised the issue of providing a more generic quota
mechanism, which allows more flexible definitions of what quota might
be (rxosd would like to be able to apply a separate quota to files
under a certain size, for example).
We discussed implementing this as a set of tag value pairs, with each
pair having a globally defined meaning. Individual tags need not be
implemented on every fileserver - there should be an RPC by which
clients can determine which tags a fileserver supports. We want to
implement this by revising existing RPCs which take quota values, and
use it to replace the quota values that those RPCs already contain.
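The tag/value idea might look something like the sketch below. The tag numbers and names are invented for illustration; the real tags would be globally assigned, and the set of supported tags would come from the discovery RPC mentioned above.

```python
from typing import Dict, Set, Tuple

# Hypothetical tag numbers, for illustration only.
TAG_VOLUME_BLOCKS = 1
TAG_FILE_COUNT = 2
TAG_SMALL_FILE_BLOCKS = 3


def split_by_support(
    requested: Dict[int, int], server_tags: Set[int]
) -> Tuple[Dict[int, int], Set[int]]:
    """Separate quota pairs the server understands from those it does not."""
    accepted = {tag: value for tag, value in requested.items()
                if tag in server_tags}
    unsupported = set(requested) - set(server_tags)
    return accepted, unsupported
```

A client would call the discovery RPC once, then use something like `split_by_support` to decide which quota pairs it can send and which it must report as unsupported.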
Christof will write a document describing this, but we won't block the
RPC refresh on it.
Volume State
------------
Tom wants to be able to communicate to the client the type of
fileserver it's talking to, and provide a 'raw' and a 'mapped'
indicator of the volume status. (raw is implementation dependent,
mapped uses globally defined error codes). We agreed that fileserver
type could, for now, be expressed as a capability bit, and that the
volume state fields should be new parameters within VolIntInfo.
Tom will write an I-D describing this. Again, we won't hold RPC
refresh up for these changes.
rx/udp improvements
-------------------
Jeffrey discussed changes he is making to RTT calculations such that
the algorithms better reflect Phil Karn's findings from 1987. This
seemed uncontentious - Jeffrey will put a patch into gerrit.
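For readers unfamiliar with Karn's work, the core rules are that RTT samples from retransmitted packets are ambiguous and must be ignored, and that the timeout is backed off on each retransmission. The sketch below illustrates this; it is not Jeffrey's patch, and the smoothing constants are borrowed from RFC 6298 (TCP) rather than from rx.

```python
class RttEstimator:
    """Toy RTT estimator applying Karn's rules (times in seconds)."""

    def __init__(self, initial_rto=1.0, max_rto=60.0):
        self.srtt = None      # smoothed round trip time
        self.rttvar = None    # round trip time variation
        self.rto = initial_rto
        self.max_rto = max_rto

    def on_ack(self, rtt_sample, was_retransmitted):
        if was_retransmitted:
            # Ambiguous sample: we cannot tell which transmission was
            # acknowledged, so keep the current (backed-off) timeout.
            return
        if self.srtt is None:
            self.srtt = rtt_sample
            self.rttvar = rtt_sample / 2
        else:
            self.rttvar = 0.75 * self.rttvar + 0.25 * abs(self.srtt - rtt_sample)
            self.srtt = 0.875 * self.srtt + 0.125 * rtt_sample
        self.rto = min(self.max_rto, self.srtt + 4 * self.rttvar)

    def on_timeout(self):
        # Exponential backoff on every retransmission.
        self.rto = min(self.max_rto, self.rto * 2)
```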
Derrick discussed larger window support, which he will test, and
discuss his findings further.
Derrick discussed improvements he wants to make to RX negotiation, by
adding elements to the existing negotiation packet. In theory this
should be backwards compatible, because existing clients use the
packet size to determine the version of the structure they are
receiving. We discussed the mechanism for progressing RX
modifications, as there isn't an obvious body to do it in. The
conclusion was to use afs3-stds, but make a deliberate attempt to
reach out to those people who we know are using RX in other
applications.
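The backwards compatibility argument rests on receivers reading only as many fields as the packet size tells them are present. A sketch of that idea follows; the field names and their number are hypothetical and do not describe the real RX negotiation packet.

```python
import struct

# Hypothetical field list: three "existing" fields plus two "new" ones.
FIELDS = ["max_mtu", "window", "ack_delay", "new_field_a", "new_field_b"]


def parse_negotiation(payload: bytes) -> dict:
    """Parse as many known 32 bit fields as the payload contains."""
    # The payload length tells us which version of the structure we
    # received; unknown trailing fields are ignored.
    count = min(len(payload) // 4, len(FIELDS))
    values = struct.unpack_from(">%dI" % count, payload, 0)
    return dict(zip(FIELDS, values))
```

An old peer that sends only the first three fields and a new peer that sends all five can both be handled by the same parser, and a receiver that knows fewer fields simply stops reading early.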
Derrick will write up an I-D describing this, and solicit feedback.
rx/gk
-----
Simon presented his write-up of the current rx/gk protocol.
Jeffrey expressed a desire for the first packet problem to be
resolved, and for client uuids to be a part of the authenticator.
We had a long discussion of bytelife, and of key agility. The
conclusion that was reached was that each packet should have a security
header containing the number of the key that was used to encrypt that
packet. Provided this key number is an input to the PRF used to derive
the transport key, keys can be revised at either the client's or the
server's request. bytelife will remain advisory, however.
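The key agility idea can be sketched as follows. HMAC-SHA256 is used here purely as a stand-in PRF, and the label and encoding are invented for the example; the actual rxgk derivation will be whatever the protocol document specifies.

```python
import hashlib
import hmac
import struct


def derive_transport_key(master_key: bytes, key_number: int) -> bytes:
    """Derive a per-key-number transport key (stand-in PRF, see above)."""
    # The key number is mixed into the PRF input, so bumping it yields
    # a fresh transport key without renegotiating the master key.
    message = b"transport" + struct.pack(">I", key_number)
    return hmac.new(master_key, message, hashlib.sha256).digest()
```

Because the key number travels in each packet's security header, the receiver can derive the matching key for any packet, even while both sides are moving from one key to the next.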
We decided that the ivec should be determined from the pseudo header,
to solve the packet ordering problem.
We decided that the same pseudo header should be used as in rxk5.
Marcus pointed out that rxgk uses the first version of the rxkad
authenticator, and strongly suggested that CITI's recommendations that
led to the rxkadv2 authenticator be studied and followed.
Marcus noted that using different numbers than rxkad for security
levels will only cause implementation pain. We'll use the same as rxkad.
Simon will update the rxgk protocol document, and create a new one
which describes its implementation within AFS.
miniosi
-------
Tom talked us through the current libosi, with a view to splitting it
into chunks that we can start to pull into the tree.
We agreed that we will only pull in changes that are going to be used
in the code, and import libosi as follows:
Phase 0: A build framework
Phase 1: buildenv, compiler, types
Phase 2: platform/datamodel.h
Phase 3: Time
Each of these phases should update the rest of the tree to use the new
functionality, so that we don't, for example, end up with two
descriptions of each platform's data model in the tree.
This is on the critical path for extended callbacks, so Tom and Matt
will seek to move this forwards, with Tom doing the work and Matt
making sure that it gets done.
Tom: Submit these chunks of libosi, and tree integration patches
Matt: nag Tom
Directory RPCs
--------------
http://michigan-openafs-lists.central.org/archives/afs3-standardization/2009September/000423.html
We discussed Matt's proposal for an explicit directory listing RPC
(posted to afs3-stds during the hackathon). We were unable to achieve
consensus on any of the issues this presents, beyond determining that
server side sorting was unachievable.
DAFS
----
Tom would like a way of a server changing its advertised capabilities,
without having to do an InitCallbackState3. Jeffrey proposed a new
TellMeAboutYourself RPC which will take the capabilities as an IN
parameter.
Tom will go away and think about this, and write something up.
There was agreement that DAFS (including the changes to the vnode and
volume packages) needed to be properly documented. Tom and Ali will
get this done.
PRDB extensions
---------------
Derrick presented the original Swedish Hackathon work:
http://web.archive.org/web/20060211111127/http://www.afsig.se/snipsnap/space/prdb+extensions
We agreed that we need a way of creating multiple names in a single
RPC, and provide a RenameAuthName RPC which takes a vector of triples
of (type, old_opaque, new_opaque)
Derrick will produce an I-D documenting the new RPCs, and an
implementation.
Code management
---------------
We spent a lot of time discussing issues of code management, and how
large changes can get into the OpenAFS source code. It's hard to
summarise what was a wide ranging and contradictory discussion, but
the following points were made and broadly agreed with:
*) Clear documentation of both protocol and implementation can hugely
help with code review
*) Design and implementation discussions, during the development
process, are hugely valuable
*) For 'large' projects we will merge by creating a git branch on
which changes will be reviewed. We can then flip the switch by doing a
final merge commit, safe in the knowledge that the individual commits
have already been reviewed.