NetBSD-Bugs archive


Re: kern/41058: General software failures with kernels built for arm/NSLU2 kernels



I built NetBSD-5-0-1 with the patch Steve provided, and the build completed with no issues. I booted the NSLU2 with the patched 5-0-1 kernel and, using the same world I used previously, ran the same MD5 test that was used earlier. For those who don't remember, processes like MD5 on large files would return incorrect results roughly one time in 50.

After the patch, I got 540 correct answers out of 540. It appears that the problem has been corrected.
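For anyone who wants to repeat this kind of check, here is a minimal sketch of a repeated-hash test. It is not the test I actually ran: it uses FNV-1a instead of MD5 so that it needs no library, it compares each run against the first run rather than against a known-good digest, and the names `hash_file`, `count_mismatches` and `chance_all_pass` are made up for illustration. The last helper shows why 540 clean runs is persuasive: at a one-in-50 failure rate, the chance of that happening by luck is about 1.8e-5.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * FNV-1a (64-bit) over a file's contents. This stands in for MD5 purely
 * to keep the sketch dependency-free. Returns 0 and stores the hash in
 * *out on success, -1 on I/O error.
 */
int hash_file(const char *path, uint64_t *out)
{
    unsigned char buf[65536];
    uint64_t h = 0xcbf29ce484222325ULL;      /* FNV-1a offset basis */
    size_t n;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return -1;
    while ((n = fread(buf, 1, sizeof(buf), fp)) > 0) {
        for (size_t i = 0; i < n; i++) {
            h ^= buf[i];
            h *= 0x100000001b3ULL;           /* FNV-1a prime */
        }
    }
    if (ferror(fp)) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    *out = h;
    return 0;
}

/*
 * Hash the same file `runs` times and count the runs whose result
 * differs from the first one. Returns -1 on I/O error or bad argument.
 */
int count_mismatches(const char *path, int runs)
{
    uint64_t ref, h;
    int bad = 0;

    if (runs < 1 || hash_file(path, &ref) != 0)
        return -1;
    for (int i = 1; i < runs; i++) {
        if (hash_file(path, &h) != 0)
            return -1;
        if (h != ref)
            bad++;
    }
    return bad;
}

/*
 * Probability that `runs` independent runs all pass, given a per-run
 * failure rate (0.02 models "one time in 50").
 */
double chance_all_pass(double fail_rate, int runs)
{
    double p = 1.0;

    for (int i = 0; i < runs; i++)
        p *= 1.0 - fail_rate;
    return p;
}
```

A file larger than RAM makes each pass go back to disk. Since the first run could itself be the corrupted one, a digest computed on a different machine is a better reference where one is available.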

Thanks, Steve. Great work. And I, for one, believe you are the ideal person to invest more time tracking this down. ;-).

Regards,
Don

Steve Woodford wrote:
The following reply was made to PR kern/41058; it has been noted by GNATS.

From: Steve Woodford <scw%NetBSD.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: kern-bug-people%netbsd.org@localhost,
 gnats-admin%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost
Subject: Re: kern/41058: General software failures with kernels built for arm/NSLU2 kernels
Date: Tue, 27 Oct 2009 20:22:02 +0000

 --Boundary-00=_qZ15KW6mTKXovGn
 Content-Type: text/plain;
   charset="iso-8859-1"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: inline

I took a look at this bug last weekend after finally falling foul of it when updating one of my NSLU2s to netbsd-5.

It looks like the ARMv6 integration exposed a cache sync corner case in pmap_activate(). The latter is supposed to flush the cache only when switching between different userland processes, but I believe there's a corner case related to process exit which also bypasses the flush when it ought not to.

I've attached a patch relative to the netbsd-5 branch, but which will also apply to netbsd-current. The patch is intended to be temporary until someone (maybe me) invests more time in tracking down the corner case.

Let me know if you're in a position to test the patch before I commit it.

Steve

 --Boundary-00=_qZ15KW6mTKXovGn
 Content-Type: text/x-diff;
   charset="iso 8859-15";
   name="pmap.c.diff"
 Content-Transfer-Encoding: 7bit
 Content-Disposition: attachment;
        filename="pmap.c.diff"
Index: pmap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/arm/arm32/pmap.c,v
 retrieving revision 1.187
 diff -u -r1.187 pmap.c
 --- pmap.c     28 Sep 2008 21:27:11 -0000      1.187
 +++ pmap.c     27 Oct 2009 20:09:51 -0000
 @@ -3645,7 +3645,6 @@
            pg, VM_PAGE_TO_PHYS(pg), prot));
        switch(prot) {
 -              return;
        case VM_PROT_READ|VM_PROT_WRITE:
  #if defined(PMAP_CHECK_VIPT) && defined(PMAP_APX)
                pmap_clearbit(pg, PVF_EXEC);
 @@ -4076,6 +4075,15 @@
         * entire cache.
         */
        rpm = pmap_recent_user;
 +
 +/*
 + * XXXSCW: There's a corner case here which can leave turds in the cache as
 + * reported in kern/41058. They're probably left over during tear-down and
 + * switching away from an exiting process. Until the root cause is identified
 + * and fixed, zap the cache when switching pmaps. This will result in a few
 + * unnecessary cache flushes, but that's better than silently corrupting data.
 + */
 +#if 0
        if (npm != pmap_kernel() && rpm && npm != rpm &&
            rpm->pm_cstate.cs_cache) {
                rpm->pm_cstate.cs_cache = 0;
 @@ -4083,6 +4091,16 @@
                cpu_idcache_wbinv_all();
  #endif
        }
 +#else
 +      if (rpm) {
 +              rpm->pm_cstate.cs_cache = 0;
 +              if (npm == pmap_kernel())
 +                      pmap_recent_user = NULL;
 +#ifdef PMAP_CACHE_VIVT
 +              cpu_idcache_wbinv_all();
 +#endif
 +      }
 +#endif
        /* No interrupts while we frob the TTB/DACR */
        oldirqstate = disable_interrupts(IF32_bits);
--Boundary-00=_qZ15KW6mTKXovGn--
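
The effect of the pmap_activate() hunks in the patch above can be illustrated with a small stand-alone model. This is a sketch only, not kernel code: `struct pmap_model` and both function names are hypothetical, the `cs_cache` field stands in for `pm_cstate.cs_cache`, and the model assumes a VIVT cache (PMAP_CACHE_VIVT), the case in which the patch calls cpu_idcache_wbinv_all().

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel pmap: only the field the patch tests. */
struct pmap_model {
    bool cs_cache;  /* models pm_cstate.cs_cache */
};

/*
 * Pre-patch condition (the branch the patch places under "#if 0"):
 * flush only when switching to a different, non-kernel pmap while the
 * most recent user pmap is still marked as having cache state.
 */
bool flush_before_patch(const struct pmap_model *npm,
                        const struct pmap_model *rpm,
                        const struct pmap_model *kernel)
{
    return npm != kernel && rpm != NULL && npm != rpm && rpm->cs_cache;
}

/*
 * Post-patch condition (the "#else" branch): flush whenever there is a
 * recent user pmap at all, regardless of the pmap being switched to.
 */
bool flush_after_patch(const struct pmap_model *rpm)
{
    return rpm != NULL;
}
```

In this model, a switch from a user pmap to the kernel pmap, or away from a pmap whose cs_cache flag is already clear, skips the flush before the patch and performs it afterwards. That is the cost the XXXSCW comment describes as a few unnecessary cache flushes.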

