[OpenAFS] cache performance

Todd M. Lewis utoddl@email.unc.edu
Fri, 25 Oct 2002 14:20:16 -0400

Phil.Moore@morganstanley.com wrote:
>>>>>>"Todd" == Todd M Lewis <utoddl@email.unc.edu> writes:
> Todd> Just curious: could you point out in what ways specifically this
> Todd> has been useful?  Perhaps adding appropriate logging on the
> Todd> server would be worth it for some of the rest of us.
> We have a HUGE environment here, and almost (>95%) all of our
> production software is run from readonly AFS volumes.  When we want to
> decommission old releases of software, and reclaim the space, we have
> a huge headache on our hands.
> We need to know *who* is using something, so we can get them to
> upgrade to newer releases of the given product. [...]
> First of all, we perform server-side analysis of AFS volume access [...]
> But that only tells me *when* software was accessed, not by *who*.
> This is where the cache audits have proven immensely useful.
> However, tactically, I am looking for a way to get better data out of
> the clients,[...]

You might be interested in what we've done in this area.  We're an 
academic shop (U. of North Carolina - Chapel Hill), so our needs are 
admittedly different from yours, but we build a bunch of packages from 
source, usually for as many of our supported architectures as we can get 
'em to build on.  We wanted to know who's using what, so we would know 
how to spend our limited people resources when deciding what to upgrade, 
what versions to abandon, etc.

We came up with a mechanism called runlogger.  Basically, we stick a 
call to the runlogger client function somewhere near the beginning of a 
program when we build it. If that's not practical (if it's a script 
based thing for example) we have it call the stand-alone runlogger 
client program, and if it comes to it and we really want it logged badly 
enough, we'll wrap the application in a script that runs the runlogger 
client before running the program in question.
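
To illustrate the last of those options, here is a minimal wrapper sketch in Python. It is not our actual wrapper; the command name "runlogger", the package name, and the path to the real program are all placeholders assumed for the example. The point is only that the logging call is best-effort and the real program runs regardless.

```python
import subprocess
import sys

# Hypothetical name for the stand-alone client; adjust to the local install.
RUNLOGGER_CMD = ["runlogger"]


def run_logged(package, real_argv, logger_cmd=RUNLOGGER_CMD):
    """Call the logging client with one parameter, then run the real program.

    Any failure of the logging step is ignored so that a missing or hung
    client never keeps the user from running the application.
    """
    try:
        subprocess.call(list(logger_cmd) + [package], timeout=5)
    except (OSError, subprocess.SubprocessError):
        pass
    # Return the real program's exit status to the caller.
    return subprocess.call(real_argv)


def main():
    # Placeholder package name and path, for illustration only.
    real_program = ["/usr/local/libexec/pine.real"] + sys.argv[1:]
    sys.exit(run_logged("pine-421", real_program))
```

A real wrapper would end by calling main(); it is left out here so the sketch can be imported and tried without running anything.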

The runlogger client takes one parameter -- the name of the package we 
want to log.  If we need finer grained logging (a pkg might contain 
several different programs for example), then it could pass the pkg 
name, a colon, and the program name as one parameter.  Runlogger takes 
this parameter and concatenates the uid of the user (which is usually 
who he/she's klogged as) and the AFS @sys name for this architecture 
(which was hard coded into the runlogger routine at build time) into a 
colon delimited string and passes it off via UDP to the runloggerd 
daemon indicated in the runlogger pkg's config file.
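
Roughly, the client side could be sketched like this in Python. This is an illustration under assumptions, not the real runlogger: the daemon address and port, the hard-coded @sys value, and the order of the fields on the wire are all made up for the example.

```python
import os
import socket

# Hypothetical values: the daemon address and port would really come from
# the runlogger config file, and SYSNAME stands in for the @sys name that
# is hard coded at build time.
RUNLOGGERD_ADDR = ("127.0.0.1", 5140)
SYSNAME = "rs_aix43"


def build_record(package, uid, sysname=SYSNAME):
    """Join the fields into one colon-delimited string.

    The field order is a guess. The package parameter may itself be
    "pkg:program" when finer-grained logging is wanted.
    """
    return ":".join([sysname, str(uid), package])


def runlogger(package, addr=RUNLOGGERD_ADDR):
    """Send one fire-and-forget UDP datagram and return the record sent."""
    record = build_record(package, os.getuid())
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(record.encode("ascii", "replace"), addr)
    except OSError:
        pass  # logging must never break the program being logged
    return record
```

Because it is a single UDP datagram with no reply expected, the call costs very little at startup, and a dead daemon only means a lost log entry.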

runloggerd takes this steady stream of UDP packets from all these 
different clients, adds to them a time stamp and the IP address of 
the client, and appends them onto its log file.  You get things that 
look like this (w/ numbers changed to protect the innocent):

> 2002. []:[rs_aix43]:[5678]:pine-421
> 2002. []:[sun4x_57]:[0]:lynx-284
> 2002. []:[rs_aix43]:[5847]:pine-421
> 2002. []:[sun4x_58]:[26678]:pine-421
> 2002. []:[rs_aix43]:[9491]:pine-421
> 2002. []:[rs_aix43]:[3190]:openssh-252p2
> 2002. []:[sgi_65]:[6309]:tcsh-611

That's a time stamp, the client IP, the @sys name, uid, and pkg name.
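
The daemon side is about as simple. Here is a hedged Python sketch of the idea, not the real runloggerd: it assumes each datagram carries "sysname:uid:package", and the port number and time stamp format are invented for the example.

```python
import socket
import time


def format_entry(payload, client_ip, now=None):
    """Prefix a client record with a time stamp and the sender's IP.

    Assumes the payload is "sysname:uid:package" (an assumed wire format)
    and brackets the fields the way the sample log lines do.
    """
    sysname, uid, package = payload.split(":", 2)
    stamp = time.strftime("%Y.%m.%d %H:%M:%S", time.localtime(now))
    return "%s [%s]:[%s]:[%s]:%s" % (stamp, client_ip, sysname, uid, package)


def serve(logfile, bind_addr=("0.0.0.0", 5140)):
    """Append one log line per datagram received, forever."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(bind_addr)
        with open(logfile, "a") as log:
            while True:
                data, (client_ip, _port) = sock.recvfrom(1024)
                try:
                    entry = format_entry(data.decode("ascii", "replace"), client_ip)
                except ValueError:
                    continue  # drop datagrams that do not have three fields
                log.write(entry + "\n")
                log.flush()
```

The client IP comes from the datagram's source address rather than from anything the client sends, so the client never has to know its own address.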

We routinely analyze the log file to see what's being run, when, by 
whom, and on what architecture(s).  You can try to log everything, or 
limit it to logging only those things you're interested in at the moment.
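
As an example of the kind of analysis involved, here is a small Python sketch (not our actual scripts) that parses lines in the bracketed format shown above and counts runs per package and architecture, plus the distinct uids seen per package. The function names and the regular expression are assumptions made for the illustration.

```python
import re
from collections import Counter

# stamp [ip]:[sysname]:[uid]:package
LINE_RE = re.compile(
    r"^(?P<stamp>.*?)\s*"
    r"\[(?P<ip>[^\]]*)\]:\[(?P<sysname>[^\]]*)\]:\[(?P<uid>[^\]]*)\]:"
    r"(?P<package>.+)$"
)


def parse_line(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def summarize(lines):
    """Count runs per (package, sysname) and collect uids per package."""
    runs = Counter()
    users = {}
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue  # skip lines that are not log entries
        runs[(rec["package"], rec["sysname"])] += 1
        users.setdefault(rec["package"], set()).add(rec["uid"])
    return runs, users
```

Feeding it the log file (for example, summarize(open(path)) with your own path) gives one Counter keyed by package and @sys name and one dict of uid sets, which is enough to see which releases nobody is running any more.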

We've made a variation of it called pmlogger which lets us see which 
Perl modules are actually being used.  (Perl module life cycling can be 
a real pain, and it's a lot easier to drop support for an old module 
when you know it isn't being used by anybody.)

I'm sure the file servers could give us other interesting information, 
but the runlogger/runloggerd approach has given us good results without 
having to change the production servers.  It adds a little overhead to 
each logged program's startup, but not much.  If you're interested, I could 
package it up and make it presentable...
   / Todd_Lewis@unc.edu                  http://www.unc.edu/~utoddl /
  /(919) 962-5273  Linux - It's now safe to turn on your computer. /