On Thu, Oct 27, 2011 at 07:24:03 +-300, Jukka Ruohonen wrote:
>
> This is a well-known bug that is over 15 year old. The much simpler tests in
> atf(7) replicate it well. The used tracker PR is kern/30115. Michael van
> Elst suggested therein couple of reasonable (IMO) solutions.

Part of the point of this new discussion is that I am attempting,
perhaps poorly, to show that I think PR#30115 and its historical
counterpart, and similar reports in the PR databases of other *BSDs,
represent a separate, unique, problem.

It is possible that the problem I'm trying to show here shares, or is at
least related to, the same cause as the problem shown in PR#30115.
That's part of what I'm trying to discover here.

However FreeBSD's solution to PR#30115 is not in any way a valid
solution to the problem I'm trying to show here, regardless of whether
the problem I'm trying to show has the same cause or not. That solution
will prevent the little wobbles that the simplistic tests demonstrate,
but it won't make overall getrusage() timing results any more meaningful
and consistent. Indeed it may even make them a wee bit more wrong,
though I'm not sure this last part matters so much.

From what I understand currently, especially if the root cause of these
problems is related, then David's proposed solution would be on the
right track:

On Fri, 28 Oct 2011 08:48:19 +0100, David Laight wrote:
>
> If you are willing to take the cost of getting the timestamp (in
> some units) on every kernel entry/exit (as well as the process switch)
> then the time in usr/sys can be added to the clock tick counts and
> used when the actual execution time is split.
> (Doing it that way means the units don't have to be THAT accurate)

Hmmm.... if we could save the current time on every kernel entry, and
then increment a new "l_systime" variable with the elapsed time on every
return to user mode, and of course use the same clock as is used for
l_rtime (i.e.
binuptime()), then the only wild-card variable left is interrupt time.
Just how expensive is updatertime() and the associated bookkeeping it
needs? Hmmm....

So, then user time would be the difference between the sum of thread
runtimes and the sum of thread systimes, less some value for interrupt
time.

Ideally interrupt time would also be measured similarly (using the same
clock again) by the interrupt dispatcher and accumulated against
whatever thread (kernel or user) was interrupted (e.g. in l_intrtime).
However I don't quite see how this could be possible to do safely,
especially in conjunction with SMP, though I'm not familiar enough with
the details of the locking that might be required to know for sure. If
I'm wrong and it is possible to do, then directly measuring and
accounting for interrupt time would also be a very good thing (assuming
it wouldn't be so costly as to radically change overall system
performance).

In any case, with the current state of affairs I'm beginning to think the
interrupt ticks are the real wild-cards here and I'm wanting to modify
getrusage() to return a new ru_itime value as well (or add a new system
call to return the raw p_rtime and p_*ticks values along with stathz).
After all, how likely is it that the average of time accounted to
p_iticks will actually match the true time used by interrupts? I'm
guessing average interrupt service times are far less than stathz
intervals.

I'm also wondering if I can force "stathz=0" at runtime, perhaps with a
sysctl, so that I can also avoid the perturbations caused by having a
different (and possibly changing) statclock rate. It's all well and
good to try to reduce the cost of statclock handling by giving it a much
lower rate than hardclock, but in the end that just makes the division
of p_rtime as returned by getrusage() effectively meaningless, and thus
some of the work done by statclock may as well be simply not done at all
in the first place when stathz is non-zero.
It would be much less misleading, to say the least.

-- 
Greg A. Woods
Planix, Inc.
<woods%planix.com@localhost>  +1 250 762-7675  http://www.planix.com/
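
For illustration only, the two ways of dividing up run time discussed
above can be sketched in a few lines of C. This is a user-space toy,
not kernel code: "struct acct", "struct split", split_by_ticks() and
split_measured() are hypothetical names made up for the example, and
the systime_us and intrtime_us fields merely stand in for the proposed
(not existing) l_systime and l_intrtime accumulators. The first
function assumes run time is divided in proportion to the sampled
statclock ticks; the second computes user time as run time less
measured system time less measured interrupt time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-process totals; these are NOT real kernel fields. */
struct acct {
	uint64_t rtime_us;	/* total run time (cf. p_rtime), microseconds */
	uint64_t uticks;	/* statclock ticks sampled in user mode */
	uint64_t sticks;	/* statclock ticks sampled in system mode */
	uint64_t iticks;	/* statclock ticks sampled in interrupt context */
	uint64_t systime_us;	/* proposed: measured kernel time (l_systime) */
	uint64_t intrtime_us;	/* proposed: measured interrupt time (l_intrtime) */
};

struct split {
	uint64_t user_us;
	uint64_t sys_us;
	uint64_t intr_us;
};

/*
 * Tick-proportional split: run time is divided in the ratio of the
 * sampled ticks.  The products can overflow for very large inputs;
 * that is ignored here to keep the sketch short.
 */
struct split
split_by_ticks(const struct acct *a)
{
	struct split s = { 0, 0, 0 };
	uint64_t tot;

	assert(a != NULL);
	tot = a->uticks + a->sticks + a->iticks;
	if (tot == 0) {
		/* No samples, so no ratio; charging it all to user is arbitrary. */
		s.user_us = a->rtime_us;
		return s;
	}
	s.sys_us = a->rtime_us * a->sticks / tot;
	s.intr_us = a->rtime_us * a->iticks / tot;
	s.user_us = a->rtime_us - s.sys_us - s.intr_us;
	return s;
}

/*
 * Measured split: user time is whatever run time is left after the
 * measured system and interrupt times are subtracted, clamped at zero.
 */
struct split
split_measured(const struct acct *a)
{
	struct split s;
	uint64_t kern;

	assert(a != NULL);
	kern = a->systime_us + a->intrtime_us;
	s.sys_us = a->systime_us;
	s.intr_us = a->intrtime_us;
	s.user_us = (a->rtime_us > kern) ? a->rtime_us - kern : 0;
	return s;
}
```

With made-up numbers (one second of run time, 90/9/1 ticks for
user/system/interrupt, but only 50000us of measured system time and
200us of measured interrupt time) the tick split reports 900000us user,
90000us system and 10000us interrupt, while the measured split reports
949800us user. The interrupt share is where the two differ most, which
is the point about p_iticks made above.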