I don't have an immediate answer for the original questions, however...

On Sat, 3 Nov 2012 04:04:58 +0200
Jukka Ruohonen <jruohonen%iki.fi@localhost> wrote:

> Seriously -- while the old saying goes that all tests should be as
> high quality as the production code -- I am not sure we can follow
> this fine principle with tests(7). I can only assume that e.g.
> inducing root to run the tests would reveal numerous potential
> vulnerabilities. Yet, several of the reproducable bugs we've found
> are only available to the root, even with rump and all.

Actually, tests need to be even better than production code. The
reason is that ANY test which ever produces the wrong result is not
really a test at all. False positives (spurious passes) mean
regressions being missed; false negatives (spurious failures) mean
wasted time and energy looking for a bug in the wrong place, or else
slipping into the habit of ignoring them.

On Fri, 2 Nov 2012 18:45:43 -0700 (PDT)
Paul Goyette <paul%whooppee.com@localhost> wrote:

> Is there some clean way to force preemption, even on SMP systems with
> lots of cores? (On my 24-core machine, the tests succeed less than
> half the time under normal system load.)

And this example is a perfect one. The man page for tests(7) states
“If there is _any failure_ during the execution of the test suite,
please considering reporting it to the NetBSD developers so that the
failure can be analyzed and fixed.” (emphasis in original)

It's one thing if someone who knows the code/tests can spot a
spurious false negative and ignore it (although, how do you tell an
expected false negative from a genuine heisenbug? - more on this in
my next paragraph), but end users certainly can't be expected to,
especially when they are instructed to report any failure.

An example from my professional experience: a certain test would
sometimes fail in automated testing but would much more often pass.
For a while the developers wrote it off as a race or some other bug
in the test (release engineers often get blamed!). However, as the
problem persisted, a pattern was spotted: the test failed only when
executed on certain machines in the farm. Eventually it was tracked
down to a difference in filesystem semantics; a few nodes used a
different FS, and the code made an assumption that wasn't true in all
cases. There really was a bug triggering the intermittent failure.
What's more, if the test farm had been homogeneous, the bug probably
wouldn't have been spotted at all.

So it's not about whether faulty tests constitute a vulnerability
(though I'm sure that is a concern for some), but rather whether we
can deduce from a test failure that there is a bug to find. Any
assumption in the test which might not hold in all cases could mask
the same assumption in the code being tested, and there may be no way
to distinguish between the two. Likewise, any race condition in the
test could mask a race in the code (et cetera). If one person's
particular setup causes a test to fail, then anything in the test
that could be causing the failure needs to be eliminated, so that we
can see whether that setup also breaks the code itself.

That's my 2p worth, anyway.


Julian

-- 
3072D/F3A66B3A Julian Yon (2012 General Use) <pgp.2012%jry.me@localhost>