[OpenAFS] Re: Investigating 'calls waiting' from rxdebug
drosih@rpi.edu
Fri, 16 Aug 2013 15:36:17 -0400
On Fri, 16 Aug 2013 12:15:25 EDT Andrew Deason wrote:
> If those show very little, though, you probably have a thread actually
> hanging on something else, so you won't see a lot of activity. (If
> you show very little disk, net, and cpu usage at the same time, that
> seems pretty likely.) In that case you'd need to look at a stack trace
> for the fileserver process, or ideally capture a core to be examined
> later.
Dan's message looks very useful, and also makes me feel good because
it implies that I was making some good guesses as I tried to pin down
this problem. I did try to turn up logging at one point, and here
are all the log entries which came up in FileLog:
Thu Aug 15 02:34:58 2013 Set Debug On level = 1
Thu Aug 15 02:35:08 2013 [0] Set Debug On level = 5
Thu Aug 15 02:35:18 2013 [0] Reset Debug levels to 0
That's it. 3 entries.
Now by the time I tried that it was very late (2am, obviously), so
it's vaguely *possible* that all the workstations which were doing
I/O were already in a call-waiting state. But my guess is that we
had some thread which really was hanging on something else.
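For anyone who wants to repeat that FileLog check, here is a rough
sketch of how it could be scripted. It assumes the line layout seen in
the three entries quoted above (a ctime-style timestamp, an optional
"[N]" tag, then the message); the function names are made up for
illustration and are not part of OpenAFS.

```python
import re
from datetime import datetime

# Assumed line shape, based only on the FileLog entries quoted above:
#   "Thu Aug 15 02:35:08 2013 [0] Set Debug On level = 5"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}) "
    r"(?:\[\d+\] )?(?P<msg>.*)$"
)
# Messages that only record a change of debug level.
TOGGLE_RE = re.compile(r"^(Set Debug On level = \d+|Reset Debug levels to 0)$")


def parse_filelog(text):
    """Return a list of (timestamp, message) for lines matching the assumed shape."""
    entries = []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like a log entry
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        entries.append((ts, m.group("msg")))
    return entries


def activity_other_than_toggles(entries):
    """Return messages that are not debug-level changes themselves."""
    return [msg for _, msg in entries if not TOGGLE_RE.match(msg)]


sample = """\
Thu Aug 15 02:34:58 2013 Set Debug On level = 1
Thu Aug 15 02:35:08 2013 [0] Set Debug On level = 5
Thu Aug 15 02:35:18 2013 [0] Reset Debug levels to 0
"""
entries = parse_filelog(sample)
window = (entries[-1][0] - entries[0][0]).total_seconds()
print(len(entries), "entries over", window, "seconds;",
      len(activity_other_than_toggles(entries)), "of them real activity")
```

An empty result from activity_other_than_toggles() over the debug
window is the "nothing was being logged" situation described above.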
I did also try doing some tcpdumps and summarizing that traffic,
but nothing remarkable showed up. However, earlier today I learned
that the way I did that might have generated misleading results
(for reasons I won't bore you with right now). But based on those
tcpdumps I doubt we were getting hammered with AFS traffic,
especially not for such a long stretch of time in the middle of
the summer.
I'll also say that at one point I thought the problem might have
been that we had too many AFS volumes on one of the partitions
on the "calls-waiting" server, so I started doing 'vos move's to
move AFS volumes to a different server. None of those vos moves
ran into any lags at all, even while the calls-waiting counter
was very high.
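If it helps anyone watching the same counter, here is a minimal sketch
for pulling it out of saved rxdebug output. It assumes the output
contains lines of the form "N calls waiting for a thread" and
"N threads are idle"; if your rxdebug prints something different, the
patterns need adjusting. The sample text and its numbers are invented
for illustration, not captured from our server.

```python
import re

# Assumed rxdebug line formats; adjust if your version prints them differently.
WAITING_RE = re.compile(r"^\s*(\d+) calls waiting for a thread", re.MULTILINE)
IDLE_RE = re.compile(r"^\s*(\d+) threads are idle", re.MULTILINE)


def rx_thread_summary(text):
    """Extract the calls-waiting and idle-thread counts, or None if absent."""
    waiting = WAITING_RE.search(text)
    idle = IDLE_RE.search(text)
    return {
        "calls_waiting": int(waiting.group(1)) if waiting else None,
        "idle_threads": int(idle.group(1)) if idle else None,
    }


# Invented sample text, shaped like the assumed format above.
sample_output = """\
Trying 192.0.2.10 (port 7000):
137 calls waiting for a thread
0 threads are idle
"""
summary = rx_thread_summary(sample_output)
print(summary)
```

Running this against output saved every minute or so would show whether
the counter is climbing steadily or just spiking.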
Thanks for the answers. These will be helpful if the problem
shows up again, and I suspect it will. And that will probably be
the next time I try to take a vacation day!
[okay, let's see how badly webmail mangles THIS message...]
--
Garance Alistair Drosehn     =     drosih@rpi.edu
Senior Systems Programmer    or    gad@FreeBSD.org
Rensselaer Polytechnic Institute; Troy, NY; USA