[OpenAFS-devel] trying to track down a cm hang/lockup...
Neulinger, Nathan
nneul@umr.edu
Fri, 12 Jul 2002 15:36:25 -0500
Tracing it out by hand with the symbol list gets me:
__read_lock_failed
cdput
__user_walk
getname
dput
vcache2inode (libafs)
sock_recvmsg
follow_down
path_release
d_lookup
I don't have much more info though unfortunately.
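Resolving raw addresses by hand against a symbol list (e.g. System.map) comes down to finding, for each address, the symbol with the greatest start address that is not above it. The C sketch below shows that lookup. It assumes the symbols have already been parsed into an array sorted by address. The `struct sym` type and `resolve` function are hypothetical illustrations, not part of any kernel or OpenAFS tool.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical symbol table entry, as parsed from a System.map-style
 * listing. The table must be sorted by ascending addr. */
struct sym {
    unsigned long addr;
    const char *name;
};

/* Binary search for the last symbol whose start address is <= pc.
 * Returns NULL if pc lies below the first symbol; otherwise stores
 * the offset of pc from that symbol's start in *off. */
static const struct sym *resolve(const struct sym *tab, size_t n,
                                 unsigned long pc, unsigned long *off)
{
    size_t lo = 0, hi = n;          /* search window is [lo, hi) */

    assert(off != NULL);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].addr <= pc)
            lo = mid + 1;           /* answer is mid or later */
        else
            hi = mid;               /* answer is before mid */
    }
    /* lo is now the first entry with addr > pc */
    if (lo == 0)
        return NULL;
    *off = pc - tab[lo - 1].addr;
    return &tab[lo - 1];
}
```

Printing `name+0xoff` for each resolved frame gives the same kind of list as the one above. Module symbols such as libafs's `vcache2inode` would only resolve if the module's load addresses are included in the table.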
(If one of the core developers is handy with kdb and would be willing to
look around at some point, I've got these machines on serial
consoles... I just have to rebuild the kernel with kdb support first. I don't
know how much of an impact that has, though. We've got three in a
checked rotation, so I can leave one in the hung state for a while if
need be.)
The other two are still running; I will perform the same checks on them to see if
they trace to the same problem.
-- Nathan
------------------------------------------------------------
Nathan Neulinger EMail: nneul@umr.edu
University of Missouri - Rolla Phone: (573) 341-4841
Computing Services Fax: (573) 341-4216
> -----Original Message-----
> From: Neulinger, Nathan
> Sent: Friday, July 12, 2002 3:19 PM
> To: OpenAFS-Devel Mailing List (E-mail)
> Subject: RE: [OpenAFS-devel] trying to track down a cm hang/lockup...
>
> Well, one of them just crashed again... It looks to me like whatever is
> crashing is enough to completely lock the machine, not just AFS. There
> was no oops. I've yet to be able to get a useful trace out of it...
> Still looking over it though... Based on the symbol offsets, it looks to
> me like it is somewhere in d_lookup.
>
> Interestingly, repeatedly hitting Alt-SysRq-P has it bouncing around to
> different addresses, but all within d_lookup. Could the cache manager
> have corrupted something that would cause the kernel to spin in
> d_lookup?
>
> I swear, even if it forces me to look at assembly, kdb is going in my
> next kernel build.
>
> It's this section of the disassembled d_lookup:
>
> ad3: 8b 1c 24 mov (%esp,1),%ebx
> ad6: 83 eb 10 sub $0x10,%ebx
> ad9: 39 2c 24 cmp %ebp,(%esp,1)
> adc: 0f 84 ae 00 00 00 je b90 <d_lookup+0x120>
> ae2: 8b 04 24 mov (%esp,1),%eax
> ae5: 8b 54 24 08 mov 0x8(%esp,1),%edx
> ae9: 8b 00 mov (%eax),%eax
> aeb: 89 04 24 mov %eax,(%esp,1)
> aee: 39 53 44 cmp %edx,0x44(%ebx)
> af1: 75 e0 jne ad3 <d_lookup+0x63>
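The loop above appears to be a hash-chain walk. It loads the current list pointer from (%esp), subtracts 0x10 (presumably to get from the embedded list node to the containing entry), and exits only when that pointer equals %ebp (presumably the chain head). Otherwise it advances to the next pointer and compares a field at offset 0x44 against a value saved on the stack. If a chain were corrupted so that it cycles without ever returning to the head, a walk like this would never terminate. That would be consistent with every SysRq-P sample landing inside d_lookup. The C sketch below is an illustrative model only, not kernel or OpenAFS code. The `struct node` type and `check_chain` function are hypothetical, and the model follows only `next` pointers, although the real kernel list is doubly linked. It uses Floyd's cycle detection to tell a chain that returns to its head from one that would spin forever.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one hash chain: only forward links are kept.
 * A healthy chain, followed from head->next, comes back to head. */
struct node {
    struct node *next;
};

enum chain_state { CHAIN_OK, CHAIN_LOOPS, CHAIN_BROKEN };

/* Classify the chain without risking an endless walk. A naive loop
 * "advance until p == head" never exits in the CHAIN_LOOPS case. */
static enum chain_state check_chain(const struct node *head)
{
    const struct node *slow, *fast;

    assert(head != NULL);
    slow = fast = head->next;
    for (;;) {
        /* Fast pointer moves two steps, checking every node it visits. */
        for (int i = 0; i < 2; i++) {
            if (fast == head)
                return CHAIN_OK;      /* walk would terminate normally */
            if (fast == NULL)
                return CHAIN_BROKEN;  /* dangling link */
            fast = fast->next;
        }
        /* Slow pointer only visits nodes the fast pointer already checked. */
        slow = slow->next;
        if (slow == fast)
            return CHAIN_LOOPS;       /* cycle that never reaches head */
    }
}
```

Because the head is checked at every step of the fast pointer, a healthy chain is always reported as CHAIN_OK before the two pointers could meet. The two pointers can only meet on a cycle that excludes the head.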
>
> -- Nathan
>
> > -----Original Message-----
> > From: Neulinger, Nathan
> > Sent: Thursday, July 11, 2002 10:32 AM
> > To: 'Derrick J Brashear'
> > Subject: RE: [OpenAFS-devel] trying to track down a cm hang/lockup...
> >
> > I have not tried the head yet.
> >
> > If I don't get anything useful out of the next failure,
> > trying the head will likely be the next step.
> >
> > -- Nathan
> >
> > > -----Original Message-----
> > > From: Derrick J Brashear [mailto:shadow@dementia.org]
> > > Sent: Thursday, July 11, 2002 10:28 AM
> > > To: Neulinger, Nathan
> > > Subject: RE: [OpenAFS-devel] trying to track down a cm hang/lockup...
> > >
> > > On Thu, 11 Jul 2002, Neulinger, Nathan wrote:
> > >
> > > > > > At the moment, I've got the watchdog turned off on the machines, and am
> > > > > > waiting for the next failure to see what I can determine...
> > > > >
> > > > > OK. You're not running with the lock-tracing patches to fstrace, are you?
> > > > > I never got those to work without problems.
> > > >
> > > > Hmm... Would they be in the protos branch/head and enabled by default?
> > > > If so, yes. Otherwise, no.
> > >
> > > If they are, they aren't enabled. Have you determined whether this
> > > happens in both the head and the protos branch?
> >
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel