[OpenAFS-devel] developer takeaways from EAKC 2014

Benjamin Kaduk kaduk@MIT.EDU
Mon, 31 Mar 2014 10:48:52 -0400 (EDT)


Hi all,

I was attempting to take notes about things that we could do as developers 
to improve the experience for users and administrators, based on the 
feedback we were getting from the various talks at the conference.  I'll 
post them here so they are better archived and reach a broader audience.

I think there were a couple of sites that indicated they had tuned the 
client cache size and chunk size away from the defaults, and got noticeably 
improved performance.  I seem to recall that our defaults are mostly 
unchanged from when they were created a long time ago, for hardware 
several generations removed from the current state of affairs.  Revisiting 
the defaults seems reasonable.
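
As a rough illustration of how the two tunables interact, here is a small sketch of the arithmetic.  It assumes the afsd conventions that -blocks counts 1 KiB blocks and -chunksize is an exponent on 2 giving the chunk size in bytes; the function names are made up for this example, and the numbers in the usage note are not the actual defaults or a recommendation.

```python
def chunk_bytes(chunksize_exponent: int) -> int:
    """Chunk size in bytes for a given -chunksize exponent (2**n)."""
    if chunksize_exponent < 0:
        raise ValueError("chunk size exponent must not be negative")
    return 1 << chunksize_exponent


def chunks_that_fit(cache_blocks_kib: int, chunksize_exponent: int) -> int:
    """Whole chunks that fit in a cache of cache_blocks_kib 1 KiB blocks."""
    if cache_blocks_kib < 0:
        raise ValueError("cache size must not be negative")
    return (cache_blocks_kib * 1024) // chunk_bytes(chunksize_exponent)
```

For instance, chunks_that_fit(100000, 16) asks how many 64 KiB chunks fit in a cache of 100000 blocks, which makes it easy to see how raising the chunk size without raising the cache size shrinks the number of chunks available.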

Arne noted that creating a user with id 32766 (ANONYMOUSID) results in 
great confusion.  I have pushed gerrit 10950, which requires a little 
tweaking, and I will follow up there.
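
The general shape of such a guard is simple.  The sketch below is not the content of gerrit 10950; it is a hypothetical check (the function name is invented here) that refuses the one id mentioned above before a user entry would be created.

```python
ANONYMOUSID = 32766  # the reserved id named in the message above


def check_new_user_id(user_id: int) -> None:
    """Raise ValueError if user_id collides with the anonymous id.

    Hypothetical helper for illustration only; the real check lives
    wherever the protection server validates new entries.
    """
    if user_id == ANONYMOUSID:
        raise ValueError(
            "id %d is reserved for the anonymous user" % ANONYMOUSID
        )
```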

CERN has a "big loop" that will have the client retry the vldb lookup if a 
volume has been "moved behind its back" (due to the storage and uuid being 
reconnected to a different file server on a different IP).  It looks like 
this is already in gerrit, as 10858 if I am reading things correctly.
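
To make the idea concrete, here is a minimal sketch of a retry loop that repeats the location lookup when the server it was given no longer holds the volume.  This is not the code from gerrit 10858 or from CERN; the VolumeMoved exception and the lookup/fetch callables are assumptions made for the example.

```python
class VolumeMoved(Exception):
    """Hypothetical error: the contacted server no longer has the volume."""


def fetch_with_relookup(volume, lookup, fetch, max_attempts=3):
    """Call fetch(server, volume), redoing lookup(volume) after a move.

    lookup and fetch are caller-supplied callables standing in for the
    vldb query and the fileserver request.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        server = lookup(volume)  # fresh lookup on every pass, never cached
        try:
            return fetch(server, volume)
        except VolumeMoved as exc:
            last_error = exc  # stale location; go around the loop again
    raise last_error
```

Bounding the number of attempts keeps a volume that is genuinely gone from turning into an endless loop.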

My notes also mention a CERN patch that does per-volume throttling.  I am 
less sure I accurately remember what this one was doing, but I think it 
was limiting the number of threads servicing requests for a given volume, 
so that other ("normal") volume accesses were not affected by a single 
user thrashing one volume.
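
If that recollection is right, the core of it is a per-volume cap on concurrent requests.  The sketch below shows one way to express that; it is not CERN's patch, and the class name and the choice to refuse (rather than queue) excess requests are assumptions made for the example.

```python
import threading


class PerVolumeThrottle:
    """Cap the number of requests in flight for any single volume."""

    def __init__(self, max_per_volume):
        if max_per_volume < 1:
            raise ValueError("max_per_volume must be at least 1")
        self._max = max_per_volume
        self._lock = threading.Lock()
        self._active = {}  # volume id -> requests currently being served

    def try_acquire(self, volume_id):
        """Return True if a thread may service volume_id right now."""
        with self._lock:
            current = self._active.get(volume_id, 0)
            if current >= self._max:
                return False  # this volume already has its share
            self._active[volume_id] = current + 1
            return True

    def release(self, volume_id):
        """Mark one request for volume_id as finished."""
        with self._lock:
            current = self._active.get(volume_id, 0)
            if current <= 0:
                raise RuntimeError("release without matching acquire")
            if current == 1:
                del self._active[volume_id]
            else:
                self._active[volume_id] = current - 1
```

A worker thread would call try_acquire() before servicing a request and release() in a finally block afterwards; requests for other volumes are never blocked by the busy one.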

Simon's talk mentioned that rx hot threads are actively harmful, and Mike 
pushed gerrit 10957 to disable them in the fileserver during the meeting, 
which got merged to master (yay!).

Simon also talked about how there were three "classes" of VL_ queries: 
'O', 'N', and 'U'; the vos(1) client has not been updated to use the 'U' 
family.  This is probably not terribly urgent, if I understand correctly, 
but could perhaps go on a "simple jobs" wiki page.

Speaking of that "simple jobs" wiki page 
(http://wiki.openafs.org/OpenAFSSimpleJobs/), it sounded like most (all?) 
of the items there have been completed, so we should update it with more 
things that could help get new contributors familiar with things.

A few of us started looking at some Linux panics due to stack overflow 
(e.g., RT 131831) during one of the sessions.  We have a number of 
routines that use more stack than they ought to.  Chas posted a script to 
look at a build tree and pull out stack usage for the various functions, 
to gerrit 10881.  I came away with the impression that Mike was going to 
do some more work and submit patches to remove some more stack usage; 
Mike, can you confirm this?
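
For anyone who wants a quick look at stack usage without that script, one possible approach is sketched below.  It is not the script from gerrit 10881 and may work quite differently; it assumes the tree was built with GCC's -fstack-usage option, which writes a .su file next to each object with tab-separated lines of the form "file:line:col:function<TAB>bytes<TAB>qualifier".

```python
import os


def parse_su_line(line):
    """Return (function, bytes) for one .su line, or None if malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        return None
    # The first field is "file:line:col:function"; keep the last part.
    function = fields[0].rsplit(":", 1)[-1]
    try:
        return function, int(fields[1])
    except ValueError:
        return None


def collect_stack_usage(tree):
    """Walk tree for .su files; return (function, bytes), largest first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(tree):
        for name in filenames:
            if not name.endswith(".su"):
                continue
            with open(os.path.join(dirpath, name)) as handle:
                for line in handle:
                    parsed = parse_su_line(line)
                    if parsed is not None:
                        results.append(parsed)
    results.sort(key=lambda item: item[1], reverse=True)
    return results
```

Printing the first few entries of collect_stack_usage(".") from the top of a build tree gives a list of the heaviest functions to look at first.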

During the gatekeepers open session, I pondered whether we could do more 
to make the process of setting up a new development environment more 
streamlined for new contributors (I gave an example of having the gerrit 
change-ID script be in the repo instead of something scp'd from gerrit), 
but a few people in the room who were in that position said that they 
didn't mind the process as-is.

Simon's performance talk gave some examples of things that can be done to 
improve performance in (e.g.) the rx stack.  We've known that there are 
issues here for a long time, but haven't really gotten around to doing 
much in the area.

We also heard some ideas about what steps to take towards (at least 
partial) IPv6 support.

At various points (at least some of which were "hallway conversations") we 
talked about the testing framework, and how it would be nice to get more 
things covered.  I pondered whether it would be worth having a script that 
could start up a test cell (servers), if run as root (and bail out early 
if unprivileged).
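
The "bail out early" part could look something like the sketch below.  Everything here is hypothetical: the function names, the message, and the exit status are invented, and the step that would actually start the servers is left as a placeholder rather than guessed at.

```python
import os
import sys


def is_root(euid=None):
    """True if the effective uid is 0; euid may be injected for testing."""
    if euid is None:
        euid = os.geteuid()
    return euid == 0


def main(euid=None):
    """Return a process exit status; refuse to continue unless root."""
    if not is_root(euid):
        print("not running as root; not starting a test cell",
              file=sys.stderr)
        return 1
    # Placeholder: the server startup for the test cell would go here.
    return 0
```

A wrapper script would finish with sys.exit(main()), so that a test harness can tell from the exit status that the cell was never started.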

There are probably more things that didn't make it into my notes.  If you 
remember any, please chime in.

Thanks,

Ben