Subject: Re: kern/34959
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: Julio M. Merino Vidal <jmmv@NetBSD.org>
List: netbsd-bugs
Date: 11/01/2006 18:10:03
The following reply was made to PR kern/34959; it has been noted by GNATS.
From: "Julio M. Merino Vidal" <jmmv@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/34959
Date: Wed, 1 Nov 2006 18:06:33 +0000
Upon further investigation, I have found that the panic occurs whenever
the NFS server calls uvm_loanuobjpages, which in turn calls
tmpfs_getpages. An easier way to trigger the error is therefore to
copy an executable file onto the exported file system and then try to
execute it from within the NFS mount point. Following my previous
example:

    $ cp /bin/cp /mnt/tmpfs
    $ cd /mnt/remote
    $ ./cp
    << machine crashes >>

I've looked at several parts of the code and I suspect the problem is
somewhere in the tmpfs_getpages function, perhaps because it fails to
handle some corner case. I may well be wrong, though, and the problem
may lie elsewhere.
Anyway. Let's assume for a moment that tmpfs_getpages is correct and
that it needn't handle that corner case. Then: uvm_loanuobjpages is
called from nfsrv_read. That routine does the following:

    m = m_get(M_WAIT, MT_DATA);
    MCLAIM(m, &nfs_mowner);
    pgpp = m->m_ext.ext_pgs;
    error = uvm_loanuobjpages(&vp->v_uobj, pgoff, npages, pgpp);

I'm not sure at all, but this looks incorrect. The code passes
m_ext.ext_pgs to uvm_loanuobjpages, yet the array seems to be
uninitialized; at least, M_EXT_PAGES is not set in m->m_flags and
ext_pgs[0] is always 0x30000000. This apparently bogus value is later
handed to uao_get, which, seeing that the page pointer is not NULL,
assumes it is valid and does not fetch the page (see uvm_aobj.c 1.81
around line 1022). The rest of uvm_loanuobjpages then operates on
this invalid pointer and ends up crashing. Is this a bug? Note that
it will be exposed whenever that NFS code path ends up calling
uao_get without PGO_LOCKED (because it will then enter step 2).
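
The suspected failure mode can be shown in miniature in user space. This is only a sketch under stated assumptions, not the kernel code: struct page, backing, fake_get, count_bogus and STALE are hypothetical names, and fake_get merely mimics the "a non-NULL slot is already valid" behaviour described above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a VM page; not the kernel's struct vm_page. */
struct page {
	int id;
};

/* The stale value observed in ext_pgs[0], reused here for illustration. */
#define STALE ((struct page *)(uintptr_t)0x30000000)

/* Toy "object" holding four pages. */
static struct page backing[4] = { {0}, {1}, {2}, {3} };

/*
 * Toy getter: a slot that is already non-NULL is assumed to hold a
 * valid page and is left untouched.  Returns the number of slots it
 * actually filled.  npages must not exceed 4.
 */
static int
fake_get(struct page **pgs, int npages)
{
	int i, filled = 0;

	for (i = 0; i < npages; i++) {
		if (pgs[i] != NULL)
			continue;	/* trusted as-is, even if garbage */
		pgs[i] = &backing[i];
		filled++;
	}
	return filled;
}

/* Count the slots that do not point at the expected backing page. */
static int
count_bogus(struct page **pgs, int npages)
{
	int i, bogus = 0;

	for (i = 0; i < npages; i++) {
		if (pgs[i] != &backing[i])
			bogus++;
	}
	return bogus;
}
```

If slot 0 of a four-entry array holds STALE and the remaining slots are NULL, fake_get fills only three slots and count_bogus still reports one bad entry, which the caller would go on to use.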
For now I've been able to work around the problem by making
nfsrv_read initialize the mbuf's ext_pgs entries to NULL and by
passing PGO_ALLPAGES to the pgo_get call in uvm_loanuobjpages. The
former may be fine, but the latter is most likely not.
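
The first half of that workaround amounts to clearing the page array before it is handed to the loan routine. A minimal user-space sketch follows; struct fake_mbuf, NPAGES, clear_ext_pgs and all_null are hypothetical names chosen for illustration and are not the kernel's mbuf layout or API.

```c
#include <stddef.h>

#define NPAGES 4	/* assumed array size, for illustration only */

/* Hypothetical stand-ins; not the kernel's struct vm_page or struct mbuf. */
struct page {
	int id;
};

struct fake_mbuf {
	struct page *ext_pgs[NPAGES];
};

/*
 * Set every slot to NULL so that a getter which skips non-NULL slots
 * cannot mistake leftover garbage for a valid page pointer.
 */
static void
clear_ext_pgs(struct fake_mbuf *m)
{
	size_t i;

	for (i = 0; i < NPAGES; i++)
		m->ext_pgs[i] = NULL;
}

/* Return 1 if every slot is NULL, 0 otherwise. */
static int
all_null(const struct fake_mbuf *m)
{
	size_t i;

	for (i = 0; i < NPAGES; i++) {
		if (m->ext_pgs[i] != NULL)
			return 0;
	}
	return 1;
}
```

An explicit loop is used instead of memset because an all-zero bit pattern is not formally guaranteed to be a null pointer in C, although it is on the platforms in question.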
Oh, and I've just tried implementing the PGO_LOCKED case in
tmpfs_getpages (based on genfs_getpages), but it hasn't solved the
problem. I thought it would, because normal operation does not reach
uao_get's step 2 but rather terminates quickly in step 1.
--
Julio M. Merino Vidal <jmmv@NetBSD.org>