[OpenAFS-devel] possible solution for iput issue... (also transarc #60276)

Neulinger, Nathan nneul@umr.edu
Mon, 16 Apr 2001 12:02:56 -0500


By adding a lot more debug output, I was able to track this back to
check_bad_parent. It calls afs_lookup, which eventually gets around to a
GetVCache.

I was able to make my problem go away by adding 

    VN_RELE(avc);

at the end of check_bad_parent in LINUX/osi_misc.c inside the bad-parent-if,
after the !avc||avc!=vcp if. 

However, I don't know if that is really the appropriate way to correct the
problem, or if that holds all the correct locks, etc. Also, should it be
VN_RELE(avc) or VN_RELE(vcp)? I think avc is correct, since afs_lookup
sets *avcp = tvc after the GetVCache calls.
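
To make the suspected leak concrete, here is a minimal user-space sketch of
the pattern described above. It is not the OpenAFS code: mock_vcache,
mock_lookup, mock_rele, check_parent_leaky and check_parent_fixed are
hypothetical stand-ins, and locking is not modeled at all. It only assumes
that the lookup hands back its result with one extra reference held, which is
what the GetVCache path appears to do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a vcache; only the reference count is modeled. */
struct mock_vcache {
    int refcount;
};

/* Stand-in for VN_RELE: drop one reference. */
static void mock_rele(struct mock_vcache *vc)
{
    assert(vc->refcount > 0);
    vc->refcount--;
}

/* Stand-in for the lookup path: like GetVCache, it returns the target
 * with one extra reference held on behalf of the caller. */
static int mock_lookup(struct mock_vcache *target, struct mock_vcache **avcp)
{
    target->refcount++;
    *avcp = target;
    return 0;
}

/* Leaky variant: the looked-up vcache is inspected but never released,
 * so every call leaves the refcount one higher than before. */
static void check_parent_leaky(struct mock_vcache *root)
{
    struct mock_vcache *avc = NULL;

    mock_lookup(root, &avc);
    (void)avc; /* ... compare avc against the expected parent ... */
}

/* Fixed variant: release the reference the lookup handed back. */
static void check_parent_fixed(struct mock_vcache *root)
{
    struct mock_vcache *avc = NULL;

    mock_lookup(root, &avc);
    /* ... compare avc against the expected parent ... */
    if (avc != NULL)
        mock_rele(avc);
}
```

Calling check_parent_leaky repeatedly on the same mock_vcache grows its
refcount by one per call, while check_parent_fixed leaves it unchanged.
Whether the real fix also needs a lock held around the release is exactly the
open question above.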

Anyway, I hope this is enough information for one of y'all to come up with a
real fix.

I just did a cmdebug check on a bunch of the machines I've got, and several
of them have large refcounts on a number of volume roots, but only a few of
them are really high. A lot are in the 200-800 range. I'd be willing to bet
those would all be caused by this same leak.

I understand the basic purpose of check_bad_parent, but could someone give
me an idea of a sequence of steps that I can use to reproduce the symptom
that would NEED check_bad_parent, so this stuff can be tested on something
other than my main server, and so Transarc can reproduce it as well?

-- Nathan

> -----Original Message-----
> From: Neulinger, Nathan [mailto:nneul@umr.edu]
> Sent: Monday, April 16, 2001 9:49 AM
> To: 'openafs-devel@openafs.org'
> Subject: [OpenAFS-devel] kernel debug info for iput issue...
> 
> 
> I added some debug tracing to afs/LINUX/osi_misc.c and afs/LINUX/osi_vfs.h.
> 
> I've determined that it occurs when loading the root page for that server.
> It increases by approximately 9-12 each time that page is loaded. Simply
> killing and restarting the web server increases it by 1 or 2.
> 
> i_count for 0xd0340a28/1831075970 at ../afs/afs_vcache.c:2374 is 117
> i_count++ for 0xd0340a28/1831075970 at ../afs/afs_vcache.c:2374 now 118
> 
> that's osi_vnhold in FindVCache
> 
> i_count for 0xd0340a28/1831010434 at ../afs/afs_osidnlc.c:246 is 118
> i_count++ for 0xd0340a28/1831010434 at ../afs/afs_osidnlc.c:246 now 119
> tvc->mvstat = 2 in afs_vnop_lookup
> osi_iput: i_count for 0xd0340a28/1831010434 at ../afs/osi_misc.c:352, is 119
> i_count for 0xd0340a28/1831010434 at ../afs/osi_misc.c:377, is 119
> i_count-- for 0xd0340a28/1831010434 at ../afs/osi_misc.c:380, now 118
> 
> That's osi_iput called from osi_dnlc_lookup called from afs_vnop_lookup
> 
> i_count for 0xd0340a28/1831010434 at ../afs/afs_vcache.c:2374 is 118
> i_count++ for 0xd0340a28/1831010434 at ../afs/afs_vcache.c:2374 now 119
> i_count for 0xd0340a28/1831075970 at ../afs/afs_osidnlc.c:246 is 119
> i_count++ for 0xd0340a28/1831075970 at ../afs/afs_osidnlc.c:246 now 120
> 
> 
> It sure looks to me like the reference obtained by FindVCache is being lost.
> Both FindVCache and osi_dnlc_lookup are incrementing the reference count,
> yet osi_iput is only being called after the afs_vnop_lookup and never
> cancelling out the reference from FindVCache.
> 
> For other inodes (stripped from above output) it looks fine. I typically see
> a FindVCache for those, shortly followed by an osi_iput that decrements the
> refcount.
> 
> Look at this from VNOPS/afs_vnop_lookup.c
> 
>     *avcp = tvc;  /* maybe wasn't initialized, but it is now */
> #ifdef AFS_LINUX22_ENV
>     if (tvc) {
>       if (tvc->mvstat == 2) { /* we don't trust the dnlc for root vcaches */
>         AFS_RELE(tvc);
>         *avcp = 0;
>       }
>       else {
>         code = 0;
>         hit = 1;
>         goto done;
>       }
>     }
> #else /* non - LINUX */
> 
> That release is definitely taking place...
> 
> -- Nathan
> 
> ------------------------------------------------------------
> Nathan Neulinger                       EMail:  nneul@umr.edu
> University of Missouri - Rolla         Phone: (573) 341-4841
> Computing Services                       Fax: (573) 341-4216
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo.cgi/openafs-devel
>
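
The hold/release accounting in the quoted trace can be modeled in user space
as follows. This is only a sketch under assumptions: mock_inode, TRACE_HOLD,
TRACE_PUT and lookup_pass are hypothetical names, the message format merely
imitates the trace lines, and the two-holds-one-release order is taken from
the trace rather than from the actual call graph.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an inode; only the fields seen in the trace. */
struct mock_inode {
    unsigned long ino;
    int i_count;
};

/* Record the call site the same way the trace lines do (file:line). */
#define TRACE_HOLD(ip) trace_hold((ip), __FILE__, __LINE__)
#define TRACE_PUT(ip)  trace_put((ip), __FILE__, __LINE__)

static void trace_hold(struct mock_inode *ip, const char *file, int line)
{
    fprintf(stderr, "i_count for %p/%lu at %s:%d is %d\n",
            (void *)ip, ip->ino, file, line, ip->i_count);
    ip->i_count++;
    fprintf(stderr, "i_count++ for %p/%lu at %s:%d now %d\n",
            (void *)ip, ip->ino, file, line, ip->i_count);
}

static void trace_put(struct mock_inode *ip, const char *file, int line)
{
    assert(ip->i_count > 0);
    fprintf(stderr, "i_count for %p/%lu at %s:%d, is %d\n",
            (void *)ip, ip->ino, file, line, ip->i_count);
    ip->i_count--;
    fprintf(stderr, "i_count-- for %p/%lu at %s:%d, now %d\n",
            (void *)ip, ip->ino, file, line, ip->i_count);
}

/* One pass as the trace shows it: two holds but only one release,
 * so the count ends one higher than it started. */
static void lookup_pass(struct mock_inode *ip)
{
    TRACE_HOLD(ip); /* models osi_vnhold in FindVCache */
    TRACE_HOLD(ip); /* models the hold taken in osi_dnlc_lookup */
    TRACE_PUT(ip);  /* models the single osi_iput */
}
```

Starting from an i_count of 117, one call to lookup_pass leaves it at 118,
matching the first three events in the trace. Ten such passes add ten
references, which is in the same range as the 9-12 increase seen per page
load.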