NetBSD-Bugs archive


Subject: Re: port-xen/45975: panic: HYPERVISOR_mmu_update failed, ret: -22 on -current MP domU (amd64)



Hi Manuel,


> I've made progress on this; I think I understand what's going on and
> I have a fix.
>
>
> The page is indeed still used as a page table; it's still in another CPU's
> ci_kpm_pdir. The reason is that xen_kpm_sync() is not working as expected,
> leading to races between CPUs.
> 1 the check (xpq_cpu != &x86_curcpu) is always false because we
>   have different x86_curcpu symbols with different addresses in the kernel.
>   Fortunately, all addresses disassemble to the same code.

Ok, this is my mess-up - I was looking for a way to avoid using %gs/%fs
before they are set up via cpu_init_msrs(). The reason it doesn't work,
as dh pointed out elsewhere, is that x86_curcpu() is defined in a
header as static inline.




>
> 3 pmap->pm_cpus is not safe for the purpose of xen_kpm_sync(), which
>   needs to know on which CPU a pmap is loaded *now*:
>   pmap->pm_cpus is cleared before cpu_load_pmap() is called to switch
>   to a new pmap, leaving a window where a pmap is still in a CPU's
>   ci_kpm_pdir but not in pm_cpus. As a virtual CPU may be preempted
>   by the hypervisor at any time, this window can be large enough to
>   let another CPU free the PTP and reuse it as a normal page.
>

Makes sense.

>
> To fix 2) I chose to avoid cross-calls and IPIs completely, and instead
> use a mutex to update all CPUs' ci_kpm_pdir from the local CPU.
> It's safe because we just need to update the table page; a tlbflush IPI
> will happen later. As a side effect, we don't need different code for
> bootstrap.
>
>
> To fix 3), I introduced a pm_xen_ptp_cpus which is updated from
> cpu_load_pmap(), with the ci_kpm_mtx mutex held. Checking it with
> ci_kpm_mtx held will avoid overwriting the wrong pmap's ci_kpm_pdir.
>
>
> While there, I removed the unused pmap_is_active() function
> and added some more details to DIAGNOSTIC panics.
>

Could these be broken down into separate patch sets?

>
> The attached patch implements this; it has been tested on a 4-CPU domU
> (on a physical dual-core box, so the race described in 2) is more likely
> to happen) and I couldn't trigger the panic any more (a build.sh -j8
> release would never complete for me before).
>

...

> @@ -387,68 +360,30 @@ pmap_kpm_sync_xcall(void *arg1, void *ar
>  void
>  xen_kpm_sync(struct pmap *pmap, int index)
>  {
> -      uint64_t where;
> +      CPU_INFO_ITERATOR cii;
> +      struct cpu_info *ci;
>
>
>        KASSERT(pmap != NULL);
>
>
>        pmap_pte_flush();
>
>
> -      if (__predict_false(xpq_cpu != &x86_curcpu)) { /* Too early to xcall */
> -              CPU_INFO_ITERATOR cii;
> -              struct cpu_info *ci;
> -              int s = splvm();
> -              for (CPU_INFO_FOREACH(cii, ci)) {
> -                      if (ci == NULL) {
> -                              continue;
> -                      }
> -                      if (pmap == pmap_kernel() ||
> -                          ci->ci_cpumask & pmap->pm_cpus) {
> -                              pmap_kpm_setpte(ci, pmap, index);
> -                      }
> +      for (CPU_INFO_FOREACH(cii, ci)) {
> +              if (ci == NULL) {
> +                      continue;
>                }
> -              pmap_pte_flush();
> -              splx(s);
> -              return;
> -      }
> -
> -      if (pmap == pmap_kernel()) {
> -              where = xc_broadcast(XC_HIGHPRI,
> -                  pmap_kpm_sync_xcall, pmap, &index);
> -              xc_wait(where);
> -      } else {
> -              KASSERT(mutex_owned(pmap->pm_lock));
> -              KASSERT(kpreempt_disabled());
> -
> -              CPU_INFO_ITERATOR cii;
> -              struct cpu_info *ci;
> -              for (CPU_INFO_FOREACH(cii, ci)) {
> -                      if (ci == NULL) {
> -                              continue;
> -                      }
> -                      while (ci->ci_cpumask & pmap->pm_cpus) {
> -#ifdef MULTIPROCESSOR
> -#define CPU_IS_CURCPU(ci) __predict_false((ci) == curcpu())
> -#else /* MULTIPROCESSOR */
> -#define CPU_IS_CURCPU(ci) __predict_true((ci) == curcpu())
> -#endif /* MULTIPROCESSOR */
> -#if 0 /* XXX: Race with remote pmap_load() */
> -                              if (ci->ci_want_pmapload &&
> -                                  !CPU_IS_CURCPU(ci)) {
> -                                      /*
> -                                       * XXX: make this more cpu
> -                                       *  cycle friendly/co-operate
> -                                       *  with pmap_load()
> -                                       */
> -                                      continue;
> -                                  }
> -#endif /* 0 */
> -                              where = xc_unicast(XC_HIGHPRI, pmap_kpm_sync_xcall,
> -                                  pmap, &index, ci);
> -                              xc_wait(where);
> -                              break;
> -                      }
> +              if (pmap != pmap_kernel() &&
> +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) == 0)
> +                      continue;
> +
> +              /* take the lock and check again */
> +              mutex_enter(&ci->ci_kpm_mtx);
> +              if (pmap == pmap_kernel() ||
> +                  (ci->ci_cpumask & pmap->pm_xen_ptp_cpus) != 0) {
> +                      pmap_kpm_setpte(ci, pmap, index);

Isn't a TLB shootdown needed after this, to make sure the old PTE is
not still referenced in the TLB?

Separately, please note that the TLB shootdown code ignores shootdown
requests when the pmap is being destroyed.

http://nxr.netbsd.org/xref/src/sys/arch/x86/x86/pmap_tlb.c#222

I have a hypothesis that stale TLB entries involving PTPs in
ci_kpm_pdir[] may be responsible for the "other" bug that riz@ is
seeing.

Cheers,

Cherry.

