tech-kern archive

Re: Anomalies while handling p_nstopchild count



One more question:

In kern/kern_exit.c at (original) line 231, there's a call to
membar_producer(), presumably to ensure that p->p_waited is globally
visible before updating p->p_stat (which might cause someone to look
at p->p_waited).

Now that I'm planning to protect this whole area using proc_lock, is
the potentially expensive call to membar_producer() still needed?
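
To make the question concrete, here is a minimal userland sketch of the
pattern being asked about. It is not the kern_exit.c code: struct
model_proc, the MODEL_* constants, and the model_* functions are
hypothetical names, a pthread mutex stands in for proc_lock, and the
meaning given to p_waited (cleared when the process stops) is an
assumption. When the writer and every reader take the same lock, the
unlock/lock pair orders the two stores, so no explicit barrier appears
between them. Whether that holds in the kernel depends on every reader
of p_stat and p_waited actually holding proc_lock, which this sketch
does not establish.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins; the real struct proc has many more fields. */
enum { MODEL_SACTIVE = 1, MODEL_SSTOP = 2 };

struct model_proc {
    pthread_mutex_t lock;   /* plays the role of proc_lock */
    bool p_waited;          /* assumed: true once the parent has waited */
    int p_stat;
};

static void
model_proc_init(struct model_proc *p)
{
    int error = pthread_mutex_init(&p->lock, NULL);

    assert(error == 0);
    (void)error;
    p->p_waited = true;
    p->p_stat = MODEL_SACTIVE;
}

/*
 * Writer: both stores happen inside the critical section, so a reader
 * that takes the same lock sees either neither store or both of them.
 * No explicit barrier is placed between the stores.
 */
static void
model_stop(struct model_proc *p)
{
    pthread_mutex_lock(&p->lock);
    p->p_waited = false;
    p->p_stat = MODEL_SSTOP;
    pthread_mutex_unlock(&p->lock);
}

/* Reader: true if the process is stopped and has not been waited for. */
static bool
model_stopped_unwaited(struct model_proc *p)
{
    bool result;

    pthread_mutex_lock(&p->lock);
    result = (p->p_stat == MODEL_SSTOP && !p->p_waited);
    pthread_mutex_unlock(&p->lock);
    return result;
}
```

If any reader inspected p_stat without taking the lock, the ordering
argument above would no longer apply to that reader, and an explicit
barrier (paired with one on the reading side) would still matter.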



On Sun, 11 Oct 2015, Paul Goyette wrote:

On Sat, 10 Oct 2015, Taylor R Campbell wrote:

  Date: Sat, 10 Oct 2015 16:50:42 +0800 (PHT)
  From: Paul Goyette <paul%vps1.whooppee.com@localhost>

  While continuing to track down the zombie-that-would-not-die I managed
  to find two more places where a process's p_stat and its parent's count
  of children to wait for (p_nstopchild) get out of sync.  The additional
  issues are documented in PR kern/50308 and kern/50318.

  With fixes for all four of these PRs in my local kernel, the zombie
  problem seems to have disappeared, and no other ill effects have been
  seen.  I have confirmed that at least kern/50300 was being seen in my
  local system, and correlated with the appearance of the long-lived
  zombie;  kern/50298 and kern/50308 have not been specifically observed.

Based on the analysis I just sent to PR 50318 (not noticing
until I was done that it applied to all four of them), the four
patches look good to me.  Please commit them separately, with a brief
analysis and PR reference in each one, so we have a chance of
bisection if anything goes wrong.

Thanks for looking, and for providing the formal analysis.  kre and I
had done pretty much the same investigation, albeit less formally.

I'll let the patches run for a while in my local code before I commit
(and request pull-ups to NetBSD-7).

We also ought to add automatic tests for proc.12.stop{exec,exit,fork},
since the code for them looks fishy and is likely seldom exercised.

Yeah.  I'll try to figure out how to test this stuff.  You're right,
these code paths appear to be rarely exercised.


+------------------+--------------------------+-------------------------+
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| (Retired)        | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Kernel Developer | 0786 F758 55DE 53BA 7731 | pgoyette at netbsd.org  |
+------------------+--------------------------+-------------------------+

