[OpenAFS-devel] developer takeaways from EAKC 2014
Benjamin Kaduk
kaduk@MIT.EDU
Mon, 31 Mar 2014 10:48:52 -0400 (EDT)
Hi all,
I was attempting to take notes about things that we could do as developers
to improve the experience for users and administrators, based on the
feedback we were getting from the various talks at the conference. I'll
post them here so they are better archived and reach a broader audience.
I think there were a couple of sites that indicated they had tuned the
client cache size and chunk size away from the defaults, and got noticeably
improved performance. I seem to recall that our defaults are mostly
unchanged from when they were created a long time ago, for hardware
several generations removed from the current state of affairs. Revisiting
the defaults seems reasonable.
Arne noted that creating a user with id 32766 (ANONYMOUSID) results in
great confusion. I have pushed gerrit 10950 to address this; it still needs
a little tweaking, and I will follow up there.
CERN has a "big loop" that will have the client retry the vldb lookup if a
volume has been "moved behind its back" (due to the storage and uuid being
reconnected to a different file server on a different IP). It looks like
this is already in gerrit, as 10858 if I am reading things correctly.
My notes also mention a CERN patch that does per-volume throttling. I am
less sure I accurately remember what this one was doing, but I think it
limited the number of threads servicing requests for a given volume,
so that other ("normal") volume accesses were not affected by a single
user thrashing one volume.
Simon's talk mentioned that rx hot threads are actively harmful, and Mike
pushed gerrit 10957 to disable them in the fileserver during the meeting,
which got merged to master (yay!).
Simon also talked about how there were three "classes" of VL_ queries:
'O', 'N', and 'U'; the vos(1) client has not been updated to use the 'U'
family. This is probably not terribly urgent, if I understand correctly,
but could perhaps go on a "simple jobs" wiki page.
Speaking of that "simple jobs" wiki page
(http://wiki.openafs.org/OpenAFSSimpleJobs/), it sounded like most (all?)
of the items there have been completed, so we should update it with more
things that could help get new contributors familiar with things.
A few of us started looking at some linux panics due to stack overflow
(e.g., RT 131831) during one of the sessions. We have a number of
routines that use more stack than they ought to. Chas posted a script to
gerrit 10881 that examines a build tree and extracts the stack usage of
each function. I came away with the impression that Mike was going to
do some more work and submit patches to remove some more stack usage;
Mike, can you confirm this?
During the gatekeepers open session, I pondered whether we could do more
to make the process of setting up a new development environment more
streamlined for new contributors (I gave an example of having the gerrit
change-ID script be in the repo instead of something scp'd from gerrit),
but a few people in the room who were in that position said that they
didn't mind the process as-is.
Simon's performance talk gave some examples of things that can be done to
improve performance in (e.g.) the rx stack. We've known that there are
issues here for a long time, but haven't really gotten around to doing much
in this area.
We also heard some ideas about what steps to take towards (at least
partial) IPv6 support.
At various points (at least some of which were "hallway conversations") we
talked about the testing framework, and how it would be nice to get more
things covered. I pondered whether it would be worth having a script that
could start up a test cell (servers) when run as root, and bail out early
when run unprivileged.
There are probably more things that didn't make it into my notes. If you
remember any, please chime in.
Thanks,
Ben