[OpenAFS] Re: Strange caching failures

Andrew Deason adeason@sinenomine.net
Tue, 5 Mar 2013 14:01:17 -0600


On Fri, 1 Mar 2013 10:19:03 +0100 (CET)
Stephan Wonczak <a0033@rrz.uni-koeln.de> wrote:

> Very difficult to say since the behavior is so non-deterministic.
> What my colleague did was to write a cronjob on one machine (client
> version 1.4.11) to write to a short status file every five minutes,
> and subsequently do a 'ls -l' on several other clients (both 1.4 and
> 1.6).  So far it *looks* like it is only the 1.6.x-clients that stop
> updating, but for this specific file it takes between 4 and 12 hours
> for the effect to show up.

If you're still looking at this, one thing you can do for further
investigation is get some debugging info from the client. On the clients
that you think might miss an update, run:

fstrace clear cm
fstrace setlog cmfx -buffers 1024
fstrace sets cm -active

And then as soon as you notice they've missed an update, run:

fstrace dump cm > /some/log

That captures the most recent part of the debug log.
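
If you don't want to sit and watch for the missed update, the "dump as
soon as you notice" step can be scripted. What follows is only a rough
sketch and not part of your existing test: the function names, the
staleness threshold, and the FSTRACE override variable are all made up
for illustration. It assumes GNU stat, and that the clocks on the
writing machine and this client are reasonably in sync.

```shell
#!/bin/sh
# Hypothetical helper: dump the cm trace once a watched file looks stale.
# FSTRACE can be overridden (e.g. for a dry run); it defaults to 'fstrace'.
FSTRACE=${FSTRACE:-fstrace}

# is_stale FILE MAX_AGE
# Succeeds if FILE's mtime, as seen by this client, is more than
# MAX_AGE seconds in the past. Returns 2 if FILE cannot be stat'ed.
is_stale() {
    now=$(date +%s)
    mtime=$(stat -c %Y "$1") || return 2   # GNU stat; BSD would use 'stat -f %m'
    [ $((now - mtime)) -gt "$2" ]
}

# watch_and_dump FILE MAX_AGE OUTPUT [INTERVAL]
# Polls FILE every INTERVAL seconds (default 60) until it looks stale,
# then writes the current trace buffer to OUTPUT.
watch_and_dump() {
    status_file=$1
    max_age=$2
    out=$3
    interval=${4:-60}
    until is_stale "$status_file" "$max_age"; do
        sleep "$interval"
    done
    "$FSTRACE" dump cm > "$out"
}
```

With a status file that is rewritten every five minutes, calling
something like 'watch_and_dump /path/to/status 900 /some/log' (the path
here is a placeholder) after turning tracing on would write the dump the
first time the file appears more than 15 minutes old. Since this looks
at the same cached mtime that 'ls -l' shows, it should trip in the same
situations where you'd notice the problem by hand.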

Alternatively, you can capture the debug information continuously as
your "test" is running by running:

fstrace dump -follow cmfx -sleep 1 > /some/log &

But that may generate a lot of output if the client is busy.

To stop tracing, kill the 'fstrace dump' process if it's still running,
and run 'fstrace sets cm -inactive'. You'll need to provide the debug
information to a developer, and that debug log will contain information
about all client activity on that machine. So of course, if you're not
comfortable sharing that, this isn't really an option.
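
If you go the continuous-capture route, the start and stop steps can be
wrapped up so the background dump doesn't get forgotten. Again, this is
just a sketch under assumptions: start_follow, stop_trace, FOLLOW_PID,
and the FSTRACE override are hypothetical names, and both functions have
to be called from the same shell so that the saved PID is still valid.

```shell
#!/bin/sh
# Hypothetical wrappers around the continuous-capture commands.
FSTRACE=${FSTRACE:-fstrace}

# start_follow OUTPUT
# Starts a continuous dump of the cmfx log in the background and
# remembers its PID in FOLLOW_PID.
start_follow() {
    "$FSTRACE" dump -follow cmfx -sleep 1 > "$1" &
    FOLLOW_PID=$!
}

# stop_trace
# Kills the background dump if it is still running, then deactivates
# the cm event set.
stop_trace() {
    if [ -n "${FOLLOW_PID:-}" ] && kill -0 "$FOLLOW_PID" 2>/dev/null; then
        kill "$FOLLOW_PID"
        wait "$FOLLOW_PID" 2>/dev/null || true
    fi
    FOLLOW_PID=
    "$FSTRACE" sets cm -inactive
}
```

You would still run the 'fstrace clear', 'fstrace setlog', and
'fstrace sets cm -active' commands before calling start_follow; this
only covers the dump process and the final deactivation.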

-- 
Andrew Deason
adeason@sinenomine.net