[OpenAFS-devel] problems with current cvs on linux - oops's, possibly related to 1.32-1.34 changes in osi_vnodeops.c
Derek Atkins
warlord@MIT.EDU
14 Mar 2002 14:19:56 -0500
In fact, that would exactly explain the single-step that just
happened! I set two breakpoints, one in
afs_linux_dentry_revalidate() and one at line 821, just after the
call to Check_AtSys. Your assessment would explain why it jumped
from the head of the function to the free():
Breakpoint 1, afs_linux_dentry_revalidate (dp=0xc20cc1e0, flags=0)
at ../afs/osi_vnodeops.c:800
800 cred_t *credp = crref();
(gdb)
Continuing.
Breakpoint 2, osi_FreeLargeSpace (adata=0xc1230000)
at ../afs/afs_osi_alloc.c:71
71 AFS_STATCNT(osi_FreeLargeSpace);
(gdb)
This patch (suggested by you) does fix the problem.
-derek
Index: src/afs/LINUX/osi_vnodeops.c
===================================================================
RCS file: /cvs/openafs/src/afs/LINUX/osi_vnodeops.c,v
retrieving revision 1.34
diff -u -r1.34 osi_vnodeops.c
--- src/afs/LINUX/osi_vnodeops.c 2002/03/10 19:23:38 1.34
+++ src/afs/LINUX/osi_vnodeops.c 2002/03/14 19:15:53
@@ -807,6 +807,8 @@
AFS_GLOCK();
+ sysState.allocked = 0;
+
/* If it's a negative dentry, then there's nothing to do. */
if (!vcp || !parentvcp)
goto done;
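
For anyone following along, the failure mode this patch addresses can be reproduced in miniature. The sketch below is not the OpenAFS code: sysname_sketch, stack_slot, sketch_free, revalidate_sketch and frees_after_two_calls are hypothetical names. The reused stack frame is modeled as a static struct so the behavior is deterministic instead of depending on reading uninitialized memory, and the cleanup path is assumed to free whenever the flag is nonzero.

```c
#include <stddef.h>

/* Hypothetical stand-in for sysname_info; only the fields used here. */
struct sysname_sketch {
    char *name;
    int allocked;            /* nonzero: name must be freed at exit */
};

/* Models the stack memory that successive calls happen to reuse. */
static struct sysname_sketch stack_slot;

static int free_count;       /* how many times sketch_free() ran */
static char *last_freed;     /* pointer passed to the latest call */

static void sketch_free(char *p)
{
    last_freed = p;
    free_count++;
}

/*
 * Same shape as the routine discussed in the thread: an early exit jumps
 * to the cleanup label before the state has been filled in.
 * apply_fix selects whether allocked is cleared on entry.
 */
static void revalidate_sketch(int negative_dentry, int apply_fix)
{
    static char buffer[16];
    struct sysname_sketch *st = &stack_slot;  /* the "uninitialized" local */

    if (apply_fix)
        st->allocked = 0;

    if (negative_dentry)
        goto done;

    /* Stand-in for Check_AtSys() allocating the name. */
    st->name = buffer;
    st->allocked = 1;

done:
    if (st->allocked)
        sketch_free(st->name);
}

/* Runs one normal call, then one early-exit call; returns the free count. */
static int frees_after_two_calls(int apply_fix)
{
    stack_slot.name = NULL;
    stack_slot.allocked = 0;
    free_count = 0;
    last_freed = NULL;

    revalidate_sketch(0, apply_fix);  /* allocates, then frees once */
    revalidate_sketch(1, apply_fix);  /* early exit; stale flag is read */
    return free_count;
}
```

Without the initialization, frees_after_two_calls(0) counts two frees of the same buffer, because the early-exit call sees the flag left behind by the previous call. With it, frees_after_two_calls(1) counts one.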
"Neulinger, Nathan" <nneul@umr.edu> writes:
> Well, adding a
>
> sysState.allocked = 0;
>
> at the top of the revalidate routine makes the oopses stop for me.
>
> -- Nathan
>
> ------------------------------------------------------------
> Nathan Neulinger EMail: nneul@umr.edu
> University of Missouri - Rolla Phone: (573) 341-4841
> Computing Services Fax: (573) 341-4216
>
>
> > -----Original Message-----
> > From: Neulinger, Nathan
> > Sent: Thursday, March 14, 2002 12:59 PM
> > To: openafs-devel@openafs.org
> > Subject: RE: [OpenAFS-devel] problems with current cvs on
> > linux - oops's, possibly related to 1.32-1.34 changes in
> > osi_vnodeops.c
> >
> >
> > Are local vars guaranteed to be initialized to null?
> >
> > That revalidate routine has:
> >
> > sysname_info sysState;
> >
> > if (...)
> > goto done;
> >
> > ... set sysState in Check_AtSys
> >
> > done:
> > free if sysState.allocked is 1.
> >
> > -- Nathan
> >
> >
> >
> > > -----Original Message-----
> > > From: Neulinger, Nathan
> > > Sent: Thursday, March 14, 2002 12:53 PM
> > > To: 'Derek Atkins'
> > > Cc: openafs-devel@openafs.org; chas@cmf.nrl.navy.mil; Ted Anderson
> > > Subject: RE: [OpenAFS-devel] problems with current cvs on
> > > linux - oops's, possibly related to 1.32-1.34 changes in
> > > osi_vnodeops.c
> > >
> > >
> > > I wonder if the allocked entry of the sysname state isn't
> > > getting set back to zero somewhere.
> > >
> > > -- Nathan
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Derek Atkins [mailto:warlord@MIT.EDU]
> > > > Sent: Thursday, March 14, 2002 12:44 PM
> > > > To: Neulinger, Nathan
> > > > Cc: openafs-devel@openafs.org; chas@cmf.nrl.navy.mil; Ted Anderson
> > > > Subject: Re: [OpenAFS-devel] problems with current cvs on
> > > > linux - oops's, possibly related to 1.32-1.34 changes in
> > > > osi_vnodeops.c
> > > >
> > > >
> > > > On a closer look there is DEFINITELY a double-free going on. I just
> > > > caught the following while breakpoints were set in both
> > > > osi_AllocLargeSpace() and osi_FreeLargeSpace():
> > > >
> > > > Breakpoint 2, osi_FreeLargeSpace (adata=0xc77a9000)
> > > > at ../afs/afs_osi_alloc.c:71
> > > > 71 AFS_STATCNT(osi_FreeLargeSpace);
> > > > (gdb)
> > > > Continuing.
> > > >
> > > > Breakpoint 2, osi_FreeLargeSpace (adata=0xc4170000)
> > > > at ../afs/afs_osi_alloc.c:71
> > > > 71 AFS_STATCNT(osi_FreeLargeSpace);
> > > > (gdb)
> > > > Continuing.
> > > >
> > > > Breakpoint 2, osi_FreeLargeSpace (adata=0xc4170000)
> > > > at ../afs/afs_osi_alloc.c:71
> > > > 71 AFS_STATCNT(osi_FreeLargeSpace);
> > > >
> > > >
> > > > Notice that it's trying to free 0xc4170000 twice? This last free()
> > > > is coming from:
> > > >
> > > > (gdb) where
> > > > #0 osi_FreeLargeSpace (adata=0xc4170000) at ../afs/afs_osi_alloc.c:71
> > > > #1 0xc8896a5d in afs_linux_dentry_revalidate (dp=0xc3823260, flags=0)
> > > >     at ../afs/osi_vnodeops.c:847
> > > > #2 0xc01420fd in cached_lookup (parent=0xc662d7c0, name=0xc5349f98, flags=0)
> > > >     at namei.c:249
> > > >
> > > > Unfortunately I didn't get a backtrace on the penultimate free().
> > > > I'll keep working on it. But this is definitely the cause of the
> > > > problem: the same packet is getting onto the freelist twice.
> > > >
> > > > -derek
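
To illustrate why a block landing on a free list twice is so damaging, here is a minimal sketch of a singly linked free list. It is not the osi_AllocLargeSpace()/osi_FreeLargeSpace() implementation: blk_sketch, sketch_release, sketch_acquire and double_release_aliases are hypothetical names, and the sketch assumes the link pointer shares storage with the caller's data.

```c
#include <stddef.h>

/*
 * Hypothetical free-list block. The link pointer is assumed to share
 * storage with the caller's data, so it is only meaningful while the
 * block sits on the free list.
 */
union blk_sketch {
    union blk_sketch *next;
    char data[64];
};

static union blk_sketch *free_list;

static void sketch_release(union blk_sketch *b)
{
    b->next = free_list;     /* push onto the head of the list */
    free_list = b;
}

static union blk_sketch *sketch_acquire(void)
{
    union blk_sketch *b = free_list;
    if (b)
        free_list = b->next; /* pop the head */
    return b;
}

/*
 * Releases one block twice, then acquires twice.
 * Returns 1 if both acquisitions hand out the same block while the
 * free list still points at it, 0 otherwise.
 */
static int double_release_aliases(void)
{
    static union blk_sketch block;
    union blk_sketch *first, *second;

    free_list = NULL;
    sketch_release(&block);  /* list: block -> NULL */
    sketch_release(&block);  /* list: block -> block (self-loop) */

    first = sketch_acquire();
    second = sketch_acquire();

    return first == &block && second == &block && free_list == &block;
}
```

In this model double_release_aliases() returns 1: two owners hold the same block and the list head still points into it. Once either owner writes data into the block, the stored link is overwritten and a later acquire loads whatever was written there as the new head. That would be consistent with a bogus freePacketList value, though this thread does not establish that it is what produced the 0x1.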
> > > >
> > > > "Neulinger, Nathan" <nneul@umr.edu> writes:
> > > >
> > > > > Derek was able to trace down the reason for the fault with kgdb. I'd
> > > > > guess it likely has something to do with these recent changes to
> > > > > osi_vnodeops.c. I'll take a closer look, but I'm not really familiar
> > > > > with what's going on in the code here.
> > > > >
> > > > > -- Nathan
> > > > >
> > > > >
> > > > >
> > > > > -----Original Message-----
> > > > > From: Neulinger, Nathan
> > > > > Sent: Thursday, March 14, 2002 12:29 PM
> > > > > To: 'Derek Atkins'
> > > > > Subject: RE: Have you had a successful build+use with current openafs?
> > > > >
> > > > >
> > > > > FYI Looks like it was probably introduced in 1.33 of osi_vnodeops.c.
> > > > > There were a bunch of changes for revalidate_dnode.
> > > > >
> > > > > -- Nathan
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Derek Atkins [mailto:warlord@MIT.EDU]
> > > > > > Sent: Thursday, March 14, 2002 12:23 PM
> > > > > > To: Neulinger, Nathan
> > > > > > Subject: Re: Have you had a successful build+use with current openafs?
> > > > > >
> > > > > >
> > > > > > Yep. kgdb.sf.net. Requires two systems with a "serial" line
> > > > > > between them. In my case I'm using vmware :)
> > > > > >
> > > > > > We should definitely report this to openafs-devel! I'll look a
> > > > > > bit more but I've only got about 30 minutes more I can sink into
> > > > > > this today.
> > > > > >
> > > > > > -derek
> > > > > >
> > > > > > "Neulinger, Nathan" <nneul@umr.edu> writes:
> > > > > >
> > > > > > > Goody! :)
> > > > > > >
> > > > > > > Thanks. That's reassuring; nothing like having a bug like this screw
> > > > > > > with your head when you think that you've made a whole bunch of changes
> > > > > > > that should not have any effect on code behavior.
> > > > > > >
> > > > > > > Are you using the kernel debugger patches? I definitely should dig into
> > > > > > > that some time.
> > > > > > >
> > > > > > > -- Nathan
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Derek Atkins [mailto:warlord@MIT.EDU]
> > > > > > > > Sent: Thursday, March 14, 2002 12:17 PM
> > > > > > > > To: Neulinger, Nathan
> > > > > > > > Subject: Re: Have you had a successful build+use with current openafs?
> > > > > > > >
> > > > > > > >
> > > > > > > > Here is the stack trace of what's going on. I have no idea why
> > > > > > > > freePacketList is being set to '1'. I'll keep looking, but this
> > > > > > > > is clearly not "just you" :)
> > > > > > > >
> > > > > > > > -derek
> > > > > > > >
> > > > > > > > Program received signal SIGSEGV, Segmentation fault.
> > > > > > > > 0xc8863031 in osi_AllocLargeSpace (size=720) at ../afs/afs_osi_alloc.c:201
> > > > > > > > 201 if ( tp ) freePacketList = tp->next;
> > > > > > > > (gdb) where
> > > > > > > > #0 0xc8863031 in osi_AllocLargeSpace (size=720)
> > > > > > > >     at ../afs/afs_osi_alloc.c:201
> > > > > > > > #1 0xc8872a91 in afs_DoBulkStat (adp=0xc8936658, dirCookie=480,
> > > > > > > >     areqp=0xc5347e68) at ../afs/afs_vnop_lookup.c:426
> > > > > > > > #2 0xc8874c25 in afs_lookup (adp=0xc8936658, aname=0xc13ded80 "CVS",
> > > > > > > >     avcp=0xc5347ec4, acred=0xc412a000) at ../afs/afs_vnop_lookup.c:1190
> > > > > > > > #3 0xc8896c14 in afs_linux_lookup (dip=0xc8936658, dp=0xc13ded20)
> > > > > > > >     at ../afs/osi_vnodeops.c:993
> > > > > > > > #4 0xc0142175 in real_lookup (parent=0xc79210c0, name=0xc5347f4c, flags=0)
> > > > > > > >     at namei.c:284
> > > > > > > > #5 0xc0142936 in path_walk (name=0xc76bf011 "", nd=0xc5347f98)
> > > > > > > >     at namei.c:564
> > > > > > > > #6 0xc014313a in __user_walk (name=0xbfffb820 "../../src/doc/CVS", flags=9,
> > > > > > > >     nd=0xc5347f98) at namei.c:805
> > > > > > > > #7 0xc013fcf6 in sys_stat64 (filename=0xbfffb820 "../../src/doc/CVS",
> > > > > > > >     statbuf=0xbfff96b0, flags=1075171668) at stat.c:337
> > > > > > > > #8 0xc0106fcb in system_call () at af_packet.c:1879
> > > > > > > > (gdb) p tp
> > > > > > > > $1 = (struct osi_packet *) 0x1
> > > > > > > > (gdb) p freePacketList
> > > > > > > > $2 = (struct osi_packet *) 0x1
> > > > > > > >
> > > > > > > > --
> > > > > > > > Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
> > > > > > > > Member, MIT Student Information Processing Board (SIPB)
> > > > > > > > URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
> > > > > > > > warlord@MIT.EDU PGP key available
> > > > > > > >
> > > > > >
> > > > > >
> > > > > _______________________________________________
> > > > > OpenAFS-devel mailing list
> > > > > OpenAFS-devel@openafs.org
> > > > > https://lists.openafs.org/mailman/listinfo/openafs-devel
> > > >
> > > >
> > >
> >
--
Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
Member, MIT Student Information Processing Board (SIPB)
URL: http://web.mit.edu/warlord/ PP-ASEL-IA N1NWH
warlord@MIT.EDU PGP key available