[OpenAFS-devel] fileserver profiling
Tom Keiser
Tom Keiser <tkeiser@gmail.com>
Wed, 2 Mar 2005 12:25:22 -0500
On Wed, 02 Mar 2005 11:11:36 -0500, Jeffrey Hutzelman <jhutz@cmu.edu> wrote:
>
>
> On Wednesday, March 02, 2005 10:55:09 AM -0500 Tom Keiser
> <tkeiser@gmail.com> wrote:
>
> > I've been profiling the pthreaded fileserver for a while now, and one
> > thing continues to bother me. We call the time() syscall so
> > frequently that it's often 20% of our total execution time.
> > There are plenty of hacks to reduce this overhead, but every one I
> > know of will introduce some amount of non-determinism. Is that an
> > acceptable risk?
> >
> > Having a thread in a nanosleep; gettimeofday loop might be acceptable,
> > but during periods of high load, you couldn't guarantee that your
> > global epoch time variable would be _monotonically_ increasing. I
> > suppose a combination of a SIGALRM handler and gettimeofday; setitimer
> > might mitigate that problem to some extent. Anyone have any better
> > suggestions on how to reduce this overhead?
> >
> > Certainly there are parts of the code that should continue to call
> > time() directly, but how much do we care if we accept a kerberos
> > ticket that expired a few milliseconds ago, when acceptable clock
> > drift is many orders of magnitude larger?
>
> I'm not sure where all the places are that we currently call
> time(). I can think of four likely situations in which we care about time:
>
> (1) validating authentication
> (2) ubik voting
> (3) file timestamps
> (4) various timing-sensitive bits in rx
>
> For (1) and (2) we don't care about sub-second accuracy. This is pretty
> much true of any place we currently call time(), which returns an integer
> number of seconds.
>
> For (2), we _do_ care about being accurate within a few seconds. There is
> a certain small amount of skew allowed in the voting protocol, but it is
> intended to deal mostly with potential network latency, and throwing it
> away on clock skew can hurt.
>
Yeah, I figured ubik would not be a good candidate for this kind of
optimization.
> For (2) and (3) we _do_ care that the clock we actually use is
> monotonically increasing. As it stands right now, some fairly unhappy
> things can happen if the system clock is stepped backwards for some reason.
>
When you say stepped backwards, do you mean with respect to previous
values of a userspace "clock", or with respect to the real system
clock? I can certainly guarantee that a userspace clock is
non-decreasing, but I cannot guarantee a non-negative difference with
respect to the system clock at any given time.
> As for (4), there are any number of things in rx that are timing sensitive,
> but for the most part they don't work by calling time().
>
According to the DTrace pid provider, these are the top ten callers of
time() over a period of a few seconds on a 1.3.78 fileserver:

   11  fileserver`rxi_ReadProc+0x2b0
   13  fileserver`TryAttach+0x80
   15  fileserver`rxi_SendXmitList+0x2a4
   16  fileserver`common_StoreData64+0x83c
   21  fileserver`VGetVnode_r+0x130
   29  fileserver`VGetVnode_r+0xad8
  151  fileserver`rx_ReadvProc+0x40
 5966  fileserver`rxi_ListenerProc+0x118
 7969  fileserver`rxi_SendAck+0x5c0
23760  fileserver`rxi_FillReadVec+0x10c
For rxi_FillReadVec at least, the time() call is not in rx itself but in
the rxkad security object's op_CheckPacket member, which DTrace
misattributes because the routine's address is loaded from a function
pointer. I have full stack traces for ~38000 calls to time(). If
anyone wants to see more than just the immediate caller, I can post
them somewhere.
--
Tom Keiser
tkeiser@gmail.com