[OpenAFS] Re: interpreting cmdebug output of locked entries

Jonathan Nilsson jnilsson@uci.edu
Wed, 2 Nov 2011 17:45:12 -0700


Thanks for your reply. This is my first real foray into debugging a crash
that may be AFS-related, so I'm taking my time reading a lot of docs; hence
my delayed reply.


> pid 3700 is actually probably just an afsd daemon. How long does it stay
> like this?


The system was wedged for a little over 12 hours until I noticed and
rebooted it. But you are probably asking how long this particular cache
entry was listed in the cmdebug output. I only ran cmdebug once, so I can't
say...


> at:617 looks like it's just part of the process when a
> background daemon is writing out dirty cache entries to disk. It should
> not take very long, and we only do that about once an hour.
>

Knowing that this could just be regular background maintenance is helpful.
I'll try running cmdebug multiple times to see if the output changes.

> If you could alt-sysrq-t on the console, you may get a listing of
> process kernel backtraces logged, which would be helpful.


I will try this too next time. Hopefully that gives some more clues.


> > Now, trying to determine the file that this cache entry refers to,
>
> It's fid 536870959.1.1, which is the root directory for volume
> 536870959. However, nobody is waiting for the lock on that cache entry,
> so it's not causing the hang. (It may be what is hang_ing_, however, and
> I would assume pid 22600 may be an httpd process)
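
For anyone reading this in the archive: the fid quoted above is a dotted volume.vnode.uniquifier triple, and as explained there, vnode 1 with uniquifier 1 is a volume's root directory. Below is a minimal Python sketch of decoding such a string. It assumes exactly that three-part dotted format; `AfsFid` and `parse_fid` are hypothetical helpers for illustration, not part of cmdebug or any OpenAFS tool.

```python
from typing import NamedTuple


class AfsFid(NamedTuple):
    """Hypothetical container for an AFS fid: volume ID, vnode number, uniquifier."""
    volume: int
    vnode: int
    unique: int

    def is_volume_root(self) -> bool:
        # Per the explanation in this thread, vnode 1 / uniquifier 1 is the
        # root directory of the volume.
        return self.vnode == 1 and self.unique == 1


def parse_fid(text: str) -> AfsFid:
    """Parse a 'volume.vnode.unique' string such as '536870959.1.1' (assumed format)."""
    parts = text.strip().split(".")
    if len(parts) != 3:
        raise ValueError(f"expected volume.vnode.unique, got {text!r}")
    volume, vnode, unique = (int(p) for p in parts)
    return AfsFid(volume, vnode, unique)


fid = parse_fid("536870959.1.1")
print(fid.volume, fid.is_volume_root())  # 536870959 True
```

The volume ID can then be mapped to a volume name with `vos examine 536870959`.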


Hmm, okay, so I take it the cmdebug output here isn't necessarily
indicative of a problem; something else might be causing AFS to hang. Or,
if the cmdebug output does change between runs, then the dirty cache entry
is being written to disk successfully, and the hang might not be
AFS-related at all.

Thanks again for helping me understand a little bit more!
--
Jonathan
