[OpenAFS-devel] vmalloc memory leak in 1.3.84/85?

Peter Somogyi psomogyi@gamax.hu
Tue, 25 Oct 2005 18:10:51 +0200


Hi Jeff,

Thank you for the detailed explanation; now I understand the basic intention.
We do use unlog, and _after_ unlog (waiting more than 1 hour, with GCPAGs=2) VmallocUsed (in /proc/meminfo) doesn't decrease.
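
For anyone who wants to track the same counter over time, here is a minimal sketch of reading it programmatically. The function names vmalloc_used_kb and read_vmalloc_used_kb are made up for illustration, and the sketch assumes the usual "VmallocUsed:   <n> kB" line format of /proc/meminfo.

```c
#include <stdio.h>
#include <string.h>

/* Return the VmallocUsed value (in kB) found in /proc/meminfo-style text,
 * or -1 if the field is missing or malformed. */
long vmalloc_used_kb(const char *meminfo)
{
    const char *key = "VmallocUsed:";
    const char *p = strstr(meminfo, key);
    long kb;

    if (p == NULL)
        return -1;
    /* %ld skips the padding blanks that follow the colon. */
    if (sscanf(p + strlen(key), "%ld", &kb) != 1)
        return -1;
    return kb;
}

/* Read /proc/meminfo and parse it; returns -1 on any failure.
 * Assumes the whole file fits in the 8 kB buffer. */
long read_vmalloc_used_kb(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return vmalloc_used_kb(buf);
}
```

Calling read_vmalloc_used_kb() once before the klog/unlog cycle and once after the wait gives two numbers that can be compared directly.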

I've looked into the libafs source, and found the root cause of our problem:
afs_pioctl.c, DECL_PIOCTL(PUnlog):
...
#ifdef UKERNEL
	    /* set the expire times to 0, causes
	     * afs_GCUserData to remove this entry
	     */
	    tu->ct.EndTimestamp = 0;
	    tu->tokenTime = 0;
#endif /* UKERNEL */
...

I've found that our libafs was compiled from the rpm "km_afs-1.4-rc4".
Somehow, in this compilation (make -f Makefile.module in /usr/src/kernel-modules/openafs) UKERNEL is not defined(!!!).
(I put a bogus line "xxx" right after #ifdef UKERNEL, and it still compiled fine. Putting another bogus line "yyy" after the #endif gives an error, so UKERNEL is not defined.)
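
A less intrusive way to run the same kind of check, as a minimal sketch (the function name ukernel_defined is hypothetical and is not part of libafs):

```c
/* Report whether this translation unit was compiled with UKERNEL defined.
 * The preprocessor keeps exactly one of the two return statements. */
int ukernel_defined(void)
{
#ifdef UKERNEL
    return 1;   /* kept only when the build passes -DUKERNEL or equivalent */
#else
    return 0;   /* kept in every other build */
#endif
}
```

Built with the same flags as the module, this returns 0 in our case. Placing an #error directive inside the #ifdef branch instead would make the build itself stop when UKERNEL is defined.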

So my questions:
- Is our "km_afs" rpm wrong (km_afs-1.4-rc4, generated from our special openafs.spec) in that it doesn't define UKERNEL, or is our openafs.spec wrong in that it doesn't define UKERNEL?
- Why does the above code depend on the UKERNEL flag?
- What does UKERNEL really mean? What effect would it have if I turned it on somehow?

Note: the function afs_GCPAGs does the same thing as the UKERNEL-guarded code above, but afs_GCPAGs itself is not conditional on UKERNEL.
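
To make the mechanism concrete, here is a toy model of what I understand is going on. It is only a sketch: struct toy_user and all the toy_* functions are invented for illustration and are not the real libafs structures or code. Zeroing the expiry time (on unlog, or from a PAG collector) does not free anything by itself; it only makes the next periodic sweep free the entry.

```c
#include <stdlib.h>

/* Toy stand-in for a per-PAG token record; NOT the real libafs structure. */
struct toy_user {
    int pag;                /* PAG number that labels this entry */
    long end_timestamp;     /* token expiry time; 0 means already expired */
    int in_use;             /* set when some process still belongs to the PAG */
    struct toy_user *next;
};

/* Prepend a new entry; returns the new list head (old head on failure). */
struct toy_user *toy_add(struct toy_user *head, int pag, long expires)
{
    struct toy_user *tu = malloc(sizeof(*tu));
    if (tu == NULL)
        return head;
    tu->pag = pag;
    tu->end_timestamp = expires;
    tu->in_use = 0;
    tu->next = head;
    return tu;
}

/* Models the UKERNEL-only lines in PUnlog: force the expiry time to 0. */
void toy_unlog(struct toy_user *head, int pag)
{
    struct toy_user *tu;
    for (tu = head; tu != NULL; tu = tu->next)
        if (tu->pag == pag)
            tu->end_timestamp = 0;
}

/* Models a PAG collector: expire every entry whose PAG has no members. */
void toy_gc_pags(struct toy_user *head)
{
    struct toy_user *tu;
    for (tu = head; tu != NULL; tu = tu->next)
        if (!tu->in_use)
            tu->end_timestamp = 0;
}

/* Models the periodic sweep: unlink and free entries that expired before
 * `now`. Returns the number of entries freed. */
int toy_gc_sweep(struct toy_user **headp, long now)
{
    int freed = 0;
    while (*headp != NULL) {
        struct toy_user *tu = *headp;
        if (tu->end_timestamp < now) {
            *headp = tu->next;
            free(tu);
            freed++;
        } else {
            headp = &tu->next;
        }
    }
    return freed;
}

/* Count the entries still on the list. */
int toy_count(const struct toy_user *head)
{
    int n = 0;
    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

In this model, if toy_unlog is compiled out (as the #ifdef does in our build), an unlogged entry stays on the list until its original expiry time passes or toy_gc_pags expires it.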

On Monday 24 October 2005 18.03, Jeffrey Hutzelman wrote:
> On Monday, October 24, 2005 05:05:08 PM +0200 Peter Somogyi
> <psomogyi@gamax.hu> wrote:
> > We've run into the same bug (vmalloc leak in the afs client when many
> > users klog, openafs-1.4.0-rc4). Is the need to call "sysctl -w
> > afs.GCPAGs=1" - when you don't want a memleak - a bug, or is it by design?
>
> It's not actually a leak, and turning on GCPAGs is a performance tradeoff.
> Let me try to describe what's going on here...
>
>
> A PAG itself does not occupy any storage -- it's just a number, used to
> label processes which are members of that PAG, and also tokens,
> connections, and cached access rights belonging to that PAG.  It is these
> things which take up space.  These objects are cleaned up by a background
> daemon which performs several checks at different intervals.
>
> Every three minutes, the background thread sweeps the token cache, looking
> for tokens which are expired or have been discarded.  Any it finds in this
> state are deleted (freeing the storage they occupy), as are cached access
> rights for the PAG containing them.
>
>
> Every ten minutes, the background thread does a sweep of all PAGs for which
> we have tokens or active Rx connections.  For any which either have no
> tokens or whose tokens have expired (within a short grace period), all
> connections are destroyed, and the structure used to track them is freed.
>
> So, after a short delay, no resources are used by a PAG whose tokens have
> expired or been deleted.  This is done reasonably efficiently, by
> traversing a list of data structures which exist only for active PAGs.
>
>
>
> The problem that comes up is in a situation where you frequently create a
> PAG, put some tokens in it, use it briefly, and then forget about it
> without bothering to delete its tokens.  The ideal thing to do here is to
> fix either the "frequently" or the "without bothering to delete its tokens"
> parts.  Lacking that, you can turn on the PAG garbage collector.
>
> When enabled, the garbage collector runs every hour.  What this does is
> scan the process table, setting an in-use flag on each PAG which has at
> least one process in it (*).  Then, it marks as expired the tokens of any
> PAG which has no members, which causes the sweeps described above to throw
> away those tokens after a short time.  This is not quite as racy as it
> sounds, but it does have the potential to miss a process and thus nuke
> tokens which are actually in use.  It's also a performance hit, and is
> completely unnecessary in the majority of cases.  Thus, it is not enabled
> by default.
>
>
> (*) Note that as mentioned above, PAGs don't actually occupy any storage.
> The flag described is actually set on the structure used to manage tokens
> and connections for a PAG, which exists only if there is anything to
> manage.
>
>
> -- Jeff

-- 
Peter Somogyi
Software Developer, Gamax Ltd.
1114 Budapest, Bartok B. u 15/d
Tel.: +36-1-381-0544
e-mail: psomogyi@gamax.hu