[OpenAFS-devel] trying to track down a cm hang/lockup...

Neulinger, Nathan nneul@umr.edu
Fri, 12 Jul 2002 15:18:43 -0500


Well, one of them just crashed again... Looks to me like whatever is
crashing is enough to completely lock the machine, not just AFS. There
was no oops. I've yet to be able to get a useful trace out of it...
Still looking over it though... Based on the symbol offsets, it looks to
me like it is somewhere in d_lookup.

Interestingly, repeatedly hitting Alt-SysRq-P has it bouncing around to
different addresses, but all within d_lookup. Could there be something
that cache manager corrupted that would be causing the kernel to spin in
d_lookup?

I swear, even if it forces me to look at assembly, kdb is going in my
next kernel build.

It's this section of the disassembled d_lookup:

     ad3:       8b 1c 24                mov    (%esp,1),%ebx
     ad6:       83 eb 10                sub    $0x10,%ebx
     ad9:       39 2c 24                cmp    %ebp,(%esp,1)
     adc:       0f 84 ae 00 00 00       je     b90 <d_lookup+0x120>
     ae2:       8b 04 24                mov    (%esp,1),%eax
     ae5:       8b 54 24 08             mov    0x8(%esp,1),%edx
     ae9:       8b 00                   mov    (%eax),%eax
     aeb:       89 04 24                mov    %eax,(%esp,1)
     aee:       39 53 44                cmp    %edx,0x44(%ebx)
     af1:       75 e0                   jne    ad3 <d_lookup+0x63>
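
For anyone following along, that loop looks like a walk down a circular hash chain: compare the current link against the chain head, step to the next link, compare a hash field, repeat. If that reading is right, a chain whose next pointers cycle back on themselves without passing through the head would spin forever, which would fit the SysRq-P addresses all landing in d_lookup. Below is a minimal user-space sketch of that failure mode, not the kernel's code: struct node, node_of, bounded_lookup and the max_steps bound are all hypothetical names made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular list link, in the style of a kernel list head. */
struct list_head {
    struct list_head *next, *prev;
};

/* Hypothetical stand-in for a dentry: only the fields the loop touches. */
struct node {
    unsigned int hash;
    struct list_head link;
};

/* Recover the containing node from a pointer to its embedded link. */
#define node_of(ptr) \
    ((struct node *)((char *)(ptr) - offsetof(struct node, link)))

/*
 * Walk the chain the same way the disassembled loop appears to:
 * stop when the walk returns to the head, otherwise advance and
 * compare hashes. Unlike that loop, give up after max_steps links
 * and set *looped, so a chain that never returns to the head is
 * reported instead of spinning.
 */
static struct node *bounded_lookup(struct list_head *head, unsigned int hash,
                                   size_t max_steps, int *looped)
{
    struct list_head *tmp;
    size_t steps;

    assert(head != NULL && looped != NULL);
    *looped = 0;
    tmp = head->next;

    for (steps = 0; steps < max_steps; steps++) {
        struct node *n;

        if (tmp == head)
            return NULL;        /* walked the whole chain, no match */
        n = node_of(tmp);
        tmp = tmp->next;
        if (n->hash == hash)
            return n;           /* candidate found */
    }

    *looped = 1;                /* never got back to the head */
    return NULL;
}
```

With an intact chain, a lookup for a missing hash ends when the walk returns to the head. If the last entry's next pointer is overwritten to point back at an earlier entry, the same lookup only stops because of the step bound, and *looped comes back set.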

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216


> -----Original Message-----
> From: Neulinger, Nathan
> Sent: Thursday, July 11, 2002 10:32 AM
> To: 'Derrick J Brashear'
> Subject: RE: [OpenAFS-devel] trying to track down a cm hang/lockup...
>
>
> Have not tried the head yet.
>
> If I don't get anything useful out of the next failure,
> trying head will likely be the next step.
>
> -- Nathan
>
> ------------------------------------------------------------
> Nathan Neulinger                       EMail:  nneul@umr.edu
> University of Missouri - Rolla         Phone: (573) 341-4841
> Computing Services                       Fax: (573) 341-4216
>
>
> > -----Original Message-----
> > From: Derrick J Brashear [mailto:shadow@dementia.org]
> > Sent: Thursday, July 11, 2002 10:28 AM
> > To: Neulinger, Nathan
> > Subject: RE: [OpenAFS-devel] trying to track down a cm hang/lockup...
> >
> >
> > On Thu, 11 Jul 2002, Neulinger, Nathan wrote:
> >
> > > > > At the moment, I've got the watchdog turned off on the machines, and am
> > > > > waiting for the next failure to see what I can determine...
> > > >
> > > > ok. you're not running with the lock tracing patches to fstrace, are you?
> > > > i never got those to work without problems
> > >
> > > Hmm... Would they be in the protos branch/head and enabled by default?
> > > If so, yes. Otherwise no.
> >
> > If they are, they aren't enabled. Have you determined this is in the head
> > and the protos branch?
> >
> >
> >
>