kern/39145: kernel malloc() leak when combining nullfs and Linux emulation

To: kern-bug-people%netbsd.org@localhost,gnats-admin%netbsd.org@localhost,netbsd-bugs%netbsd.org@localhost
Subject: kern/39145: kernel malloc() leak when combining nullfs and Linux emulation
From: he%nordu.net@localhost
Date: Mon, 14 Jul 2008 10:15:00 +0000 (UTC)

>Number:         39145
>Category:       kern
>Synopsis:       kernel malloc() leak when combining nullfs and Linux emulation
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jul 14 10:15:00 +0000 2008
>Originator:     Havard Eidnes
>Release:        NetBSD 4.0
>Organization:
        NORDUnet AS
>Environment:
System: NetBSD server.nordu.net 4.0 NetBSD 4.0 (SERVER) #2: Wed Jun 25 18:57:42 CEST 2008  he%server.nordu.net@localhost:/usr/obj/sys/arch/i386/compile/SERVER i386
Architecture: i386
Machine: i386
>Description:

        There appears to be a leak of kernel malloc()ed memory when
        nullfs mounts are used in combination with certain Linux
        binaries running under emulation.

        Case in point: we use netbackup to take backups of this
        machine, which has a largish number of files:

server% df -ih
Filesystem       Size      Used     Avail Capacity  iused    ifree  %iused  Mounted on
...
/dev/ld0g        172G      115G       48G    70%  2977222 41830968     6%   /a
...
localhost:/a     172G      115G       48G    70%  2977247 41830943     6%   /netbackup/a
...

        For now we have worked around the problem by using loopback
        NFS mounts from 127.0.0.1 instead (as shown above), so
        operationally this is no longer a major issue for us.
        However, I wish to record my findings before I forget the
        details.

        We use alternate mount points for the data to be backed up so
        that the Linux binary, when it is asked to back up e.g.
        /usr/lib, does not back up the emulated Linux tree instead
        (path lookups by emulated binaries try the emulation tree
        first).

        A simple test involving "find . -type f | xargs cat >/dev/null"
        using NetBSD binaries on a null-mounted tree does not appear
        to trigger the problem.

        Investigation indicates that the leaked allocations are all
        8192 bytes in size.  I modified the kernel malloc code to
        record, for each power-of-2 block size, how many allocations
        are currently in use, and modified vmstat to print this
        information under "vmstat -m" (when KMEMSTATS is on).  I then
        tracked how the kernel mallocs in the "temp" category (where
        the leak appears to be) developed over time.  An example:

Memory statistics by type                                Type  Kern
           Type InUse  MemUse HighUse   Limit   Requests Limit Limit Size(s)
...
           temp 13139 102221K 103220K 157287K  865753294     0     0 16:9,32:7,64:3,128:0,256:1,512:39,1024:146,2048:171,4096:109,8192:12653,65536:1
...
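        As an illustration of the kind of accounting described above,
        here is a minimal user-space sketch of per-bucket in-use
        counters.  It assumes the 4.4BSD-style bucket scheme, where a
        request is rounded up to the next power of two and 16 bytes is
        the smallest bucket.  The names (struct type_stats,
        stats_malloc(), stats_free(), stats_format()) are hypothetical;
        this is not the actual patch to kern_malloc.c or vmstat.  In the
        kernel the bucket for a free() is deduced from the page usage
        table; here the caller simply passes the size.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of per-bucket in-use counters for one malloc type.
 * Requests are rounded up to a power of two; 2^4 = 16 bytes is the
 * smallest bucket, as in the 4.4BSD-derived kernel malloc. */
#define MINBUCKET 4
#define NBUCKETS  20            /* buckets 2^4 .. 2^19 in this sketch */

struct type_stats {
    long inuse[NBUCKETS];       /* blocks currently outstanding per bucket */
    unsigned long sizes_used;   /* bit i set once bucket i has been used */
};

static void stats_init(struct type_stats *ts)
{
    memset(ts, 0, sizeof(*ts));
}

/* Smallest bucket index i >= MINBUCKET such that (1 << i) >= size. */
static int bucket_index(unsigned long size)
{
    int i = MINBUCKET;

    assert(size > 0);
    while ((1UL << i) < size)
        i++;
    assert(i < NBUCKETS);
    return i;
}

static void stats_malloc(struct type_stats *ts, unsigned long size)
{
    int i = bucket_index(size);

    ts->inuse[i]++;
    ts->sizes_used |= 1UL << i;
}

static void stats_free(struct type_stats *ts, unsigned long size)
{
    int i = bucket_index(size);

    assert(ts->inuse[i] > 0);   /* freeing more than was allocated */
    ts->inuse[i]--;
}

/* Format "size:count,..." for every bucket that has ever been used,
 * in the style of the extended Size(s) column above. */
static void stats_format(const struct type_stats *ts, char *buf, size_t len)
{
    size_t off = 0;

    buf[0] = '\0';
    for (int i = MINBUCKET; i < NBUCKETS; i++) {
        if (!(ts->sizes_used & (1UL << i)))
            continue;
        int n = snprintf(buf + off, len - off, "%s%lu:%ld",
                         off ? "," : "", 1UL << i, ts->inuse[i]);
        if (n < 0 || (size_t)n >= len - off)
            break;              /* output truncated; stop here */
        off += (size_t)n;
    }
}
```

        Because the counts track blocks in use, they add up to the
        InUse column (9+7+3+0+1+39+146+171+109+12653+1 = 13139 in the
        example), and a bucket that has been used but currently holds
        nothing shows up as e.g. "128:0".  The 8192-byte blocks alone
        account for 12653 x 8192 = 103,653,376 bytes (about 101224K),
        roughly 99% of the 102221K MemUse for "temp".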

        The leaked memory is allocated at line 198 in
        sys/miscfs/genfs/layer_subr.c:


        if ((error = getnewvnode(lmp->layerm_tag, mp, lmp->layerm_vnodeop_p,
                        &vp)) != 0)
                return (error);
        vp->v_type = lowervp->v_type;
        vp->v_flag |= VLAYER;

>>>     xp = malloc(lmp->layerm_size, M_TEMP, M_WAITOK);
        if (vp->v_type == VBLK || vp->v_type == VCHR) {
                MALLOC(vp->v_specinfo, struct specinfo *,
                    sizeof(struct specinfo), M_VNODE, M_WAITOK);
                vp->v_hashchain = NULL;
                vp->v_rdev = lowervp->v_rdev;
        }

        vp->v_data = xp;
        xp->layer_vnode = vp;
        xp->layer_lowervp = lowervp;
        xp->layer_flags = 0;


        This was found by modifying the MALLOCLOG code to record only
        events in the "temp" category, and only for malloc block sizes
        of 8192 bytes (luckily, the block size can be deduced for the
        "free" action as well), by writing a small program to dump the
        MALLOCLOG ring buffer, and by post-processing the result (the
        refinement to narrow in on 8192-byte blocks came from the
        info below).
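        The post-processing step can be sketched as follows: walk the
        logged events oldest-first, remember each malloc of the
        selected type and block size until a free of the same address
        is seen, and attribute whatever remains to the file and line
        that made the allocation.  This is a minimal sketch, not the
        program used for this report; struct log_rec is a simplified,
        hypothetical stand-in for the kernel's MALLOCLOG entries, and
        unmatched_by_site() is an illustrative name.  Since the ring
        buffer overwrites old entries, allocations made before it
        wrapped are invisible, so the output points at candidate call
        sites rather than proving a leak.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified log record; the real MALLOCLOG layout may differ. */
enum log_action { LOG_MALLOC, LOG_FREE };

struct log_rec {
    const void *addr;
    long size;                  /* block (bucket) size in bytes */
    const char *type;           /* malloc type name, e.g. "temp" */
    enum log_action action;
    const char *file;           /* call site of the malloc()/free() */
    long line;
};

/* A call site with allocations that were never freed within the log. */
struct site_count {
    const char *file;
    long line;
    long live;
};

#define MAXLIVE 1024            /* extra allocations are dropped beyond this */

/* Returns the number of distinct call sites written to out (<= maxout).
 * A FREE without a matching MALLOC (allocated before the buffer wrapped)
 * is ignored. */
static size_t unmatched_by_site(const struct log_rec *recs, size_t n,
                                const char *type, long size,
                                struct site_count *out, size_t maxout)
{
    const struct log_rec *live[MAXLIVE];
    size_t nlive = 0, nout = 0;

    assert(out != NULL || maxout == 0);
    for (size_t i = 0; i < n; i++) {
        const struct log_rec *r = &recs[i];

        if (strcmp(r->type, type) != 0 || r->size != size)
            continue;
        if (r->action == LOG_MALLOC) {
            if (nlive < MAXLIVE)
                live[nlive++] = r;
            continue;
        }
        for (size_t j = 0; j < nlive; j++) {
            if (live[j]->addr == r->addr) {
                live[j] = live[--nlive];    /* unordered removal */
                break;
            }
        }
    }

    /* Group the surviving allocations by the file:line that made them. */
    for (size_t j = 0; j < nlive; j++) {
        size_t k;

        for (k = 0; k < nout; k++)
            if (out[k].line == live[j]->line &&
                strcmp(out[k].file, live[j]->file) == 0)
                break;
        if (k == nout) {
            if (nout == maxout)
                continue;               /* no room for another site */
            out[nout].file = live[j]->file;
            out[nout].line = live[j]->line;
            out[nout].live = 0;
            nout++;
        }
        out[k].live++;
    }
    return nout;
}
```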

        If left unattended, this leak will eventually cause the kernel
        to run out of memory in kmem_map, which will either cause a
        panic or result in a hang (if KMEMSTATS is turned on).  (This
        hang could itself be construed as a bug.)

        A plot of the number of 8192-byte blocks over time clearly
        shows that when we changed from null mounts to loopback NFS
        mounts, the number of 8192-byte malloc()ed blocks in this
        category stopped growing.

        A summary of a partial ktrace of the Linux binary that
        performs the backups, taken while these leaks were
        accumulating, shows the following system call counts:

stat64          870
llseek          1425
time            9115
close           8843
read            10089
lstat64         439337
#271            7953
write           8527
brk             193
access          235
mmap2           9
getdents64      2962
fstat64         879
munmap          9
open            8842
chdir           1742
utime           7953
fcntl64         888
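        A tally like the one above can be produced from kdump(1)
        output.  The minimal sketch below assumes that each system call
        appears on a line containing the token CALL, followed by the
        call name up to the opening parenthesis (undecoded calls
        appearing as e.g. #271); the exact kdump format may differ.
        struct tally, tally_line() and tally_get() are hypothetical
        names, not part of any existing tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Count system-call names from kdump-style text lines.
 * ASSUMPTION: call records contain " CALL ", then optional spaces, then
 * the call name terminated by '(' or whitespace. Other records (RET,
 * NAMI, ...) are ignored. */
#define MAXNAMES 64
#define NAMELEN  32

struct tally {
    char name[MAXNAMES][NAMELEN];
    long count[MAXNAMES];
    size_t n;
};

static void tally_init(struct tally *t)
{
    memset(t, 0, sizeof(*t));
}

/* Return the count recorded for name, or 0 if it was never seen. */
static long tally_get(const struct tally *t, const char *name)
{
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->name[i], name) == 0)
            return t->count[i];
    return 0;
}

static void tally_line(struct tally *t, const char *line)
{
    const char *p;
    char name[NAMELEN];
    size_t len = 0;

    assert(t != NULL && line != NULL);
    p = strstr(line, " CALL ");
    if (p == NULL)
        return;                 /* not a system call record */
    p += strlen(" CALL ");
    while (isspace((unsigned char)*p))
        p++;
    /* Copy the call name; overly long names are truncated. */
    while (*p != '\0' && *p != '(' && !isspace((unsigned char)*p) &&
           len < NAMELEN - 1)
        name[len++] = *p++;
    name[len] = '\0';
    if (len == 0)
        return;

    for (size_t i = 0; i < t->n; i++) {
        if (strcmp(t->name[i], name) == 0) {
            t->count[i]++;
            return;
        }
    }
    if (t->n < MAXNAMES) {      /* new name; silently dropped if full */
        strcpy(t->name[t->n], name);
        t->count[t->n] = 1;
        t->n++;
    }
}
```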

        I have inspected the Linux compat code looking for anything
        obviously amiss, but sadly came up empty.  It is hard to find
        something that is missing (a missing free() somewhere, a
        pointer being overwritten, or...).


>How-To-Repeat:
        Use a Linux binary (in our case netbackup) to back up largish
        amounts of data on a null mount.


>Fix:
        Sorry, don't know.
