[OpenAFS-devel] trying to track down a cm hang/lockup...
Neulinger, Nathan
nneul@umr.edu
Fri, 12 Jul 2002 15:18:43 -0500
Well, one of them just crashed again... It looks to me like whatever is
crashing is enough to lock up the whole machine, not just AFS. There
was no oops. I've yet to get a useful trace out of it...
Still looking it over, though. Based on the symbol offsets, it looks to
me like it is somewhere in d_lookup.
Interestingly, repeatedly hitting Alt-SysRq-P has it bouncing around
between different addresses, but all within d_lookup. Could the cache
manager have corrupted something that would cause the kernel to spin in
d_lookup?
I swear, even if it forces me to look at assembly, kdb is going in my
next kernel build.
It's this section of the disassembled d_lookup:
ad3: 8b 1c 24 mov (%esp,1),%ebx
ad6: 83 eb 10 sub $0x10,%ebx
ad9: 39 2c 24 cmp %ebp,(%esp,1)
adc: 0f 84 ae 00 00 00 je b90 <d_lookup+0x120>
ae2: 8b 04 24 mov (%esp,1),%eax
ae5: 8b 54 24 08 mov 0x8(%esp,1),%edx
ae9: 8b 00 mov (%eax),%eax
aeb: 89 04 24 mov %eax,(%esp,1)
aee: 39 53 44 cmp %edx,0x44(%ebx)
af1: 75 e0 jne ad3 <d_lookup+0x63>
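For anyone reading the listing: my guess (not verified against the actual build) is that this is the hash-chain walk in the 2.4 d_lookup, with `sub $0x10,%ebx` being the list_entry() step from the d_hash link back to the dentry, `cmp %ebp,(%esp,1)` the `tmp == head` exit test, and `cmp %edx,0x44(%ebx)` the d_name.hash comparison. If that reading is right, the only way out of the loop without a match is getting back to the chain head, so a chain corrupted into a cycle that skips the head would spin forever with exactly this symptom. Below is a minimal userspace sketch of that failure mode; `struct fake_dentry`, `chain_lookup()` and its `max_steps` bound are hypothetical names for illustration, and the kernel loop has no such bound.

```c
#include <assert.h>
#include <stddef.h>

/* Circular doubly linked list node, modeled on the 2.4 kernel's
 * struct list_head (assumption for illustration). */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for struct dentry: only the fields the loop
 * above appears to touch. Real field offsets differ. */
struct fake_dentry {
    unsigned int hash;         /* plays the role of d_name.hash */
    struct list_head d_hash;   /* links the entry into a hash chain */
};

/* Recover the containing struct from a pointer to its list member. */
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *head) {
    head->next = head->prev = head;
}

static void list_add_tail(struct list_head *node, struct list_head *head) {
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk a hash chain looking for `hash`, the way the loop above appears to:
 * the only exit without a match is arriving back at `head`.
 * Returns 1 (found), 0 (not found), or -1 if more than max_steps entries
 * were visited, i.e. the walk never came back to head (corrupted chain). */
static int chain_lookup(struct list_head *head, unsigned int hash,
                        struct fake_dentry **found, size_t max_steps) {
    assert(head != NULL && found != NULL);
    struct list_head *tmp = head->next;
    size_t steps = 0;
    *found = NULL;
    while (tmp != head) {
        struct fake_dentry *d = list_entry(tmp, struct fake_dentry, d_hash);
        if (++steps > max_steps)
            return -1;          /* the kernel loop would spin here forever */
        tmp = tmp->next;
        if (d->hash == hash) {
            *found = d;
            return 1;
        }
    }
    return 0;
}
```

With max_steps set well above any sane chain length, a -1 return plays the role of the hang: for example, pointing one entry's `d_hash.next` back at an earlier entry makes every lookup for a missing hash go around that cycle without ever reaching head.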
-- Nathan
------------------------------------------------------------
Nathan Neulinger EMail: nneul@umr.edu
University of Missouri - Rolla Phone: (573) 341-4841
Computing Services Fax: (573) 341-4216
> -----Original Message-----
> From: Neulinger, Nathan
> Sent: Thursday, July 11, 2002 10:32 AM
> To: 'Derrick J Brashear'
> Subject: RE: [OpenAFS-devel] trying to track down a cm hang/lockup...
>
>
> Have not tried the head yet.
>
> If I don't get anything useful out of the next failure,
> trying head will likely be the next step.
>
> -- Nathan
>
> ------------------------------------------------------------
> Nathan Neulinger EMail: nneul@umr.edu
> University of Missouri - Rolla Phone: (573) 341-4841
> Computing Services Fax: (573) 341-4216
>
>
> > -----Original Message-----
> > From: Derrick J Brashear [mailto:shadow@dementia.org]
> > Sent: Thursday, July 11, 2002 10:28 AM
> > To: Neulinger, Nathan
> > Subject: RE: [OpenAFS-devel] trying to track down a cm hang/lockup...
> >
> >
> > On Thu, 11 Jul 2002, Neulinger, Nathan wrote:
> >
> > > > > At the moment, I've got the watchdog turned off on the machines, and am
> > > > > waiting for the next failure to see what I can determine...
> > > >
> > > > ok. you're not running with the lock tracing patches to fstrace, are you?
> > > > i never got those to work without problems
> > >
> > > Hmm... Would they be in the protos branch/head and enabled by default?
> > > If so, yes. Otherwise no.
> >
> > If they are, they aren't enabled. Have you determined this is in the head
> > and the protos branch?