Subject: Re: keeping the process start time in core at all times...
To: David Laight <david@l8s.co.uk>
From: Greg A. Woods <woods@weird.com>
List: tech-kern
Date: 11/29/2002 14:08:10
[ On Friday, November 29, 2002 at 18:08:22 (+0000), David Laight wrote: ]
> Subject: Re: keeping the process start time in core at all times...
>
> Why is the start time so important to you?
When you manage lots of long-running servers, the process start time is
a critical bit of information for diagnosing all kinds of problems.
I was completely flabbergasted when I first ran "ps -u" on NetBSD-1.5W
and discovered hyphens in the "STARTED" column. I had been relying on
this information for diagnosing problems ever since I first began making
use of it on something like UNIX System III many years ago.
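
As a minimal sketch of the kind of arithmetic the "STARTED" column feeds,
the C fragment below works out how long a process has been running from
its start time, and falls back to a hyphen when the start time could not
be read. The names proc_snapshot, running_seconds and format_running_for
are hypothetical, invented for illustration; this is not NetBSD kernel or
ps(1) code.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical record for illustration; not a real kernel structure. */
struct proc_snapshot {
	int	start_known;	/* 0 when the start time could not be read */
	time_t	start;		/* start time, in seconds since the epoch */
};

/* Seconds the process has been running at 'now', or -1 if unknown. */
static long
running_seconds(const struct proc_snapshot *p, time_t now)
{
	assert(p != NULL);
	if (!p->start_known || p->start > now)
		return -1;
	return (long)difftime(now, p->start);
}

/* Write "Nd HH:MM:SS" into buf, or "-" if unknown.  Returns buf. */
static char *
format_running_for(const struct proc_snapshot *p, time_t now,
    char *buf, size_t len)
{
	long secs = running_seconds(p, now);

	assert(buf != NULL && len > 0);
	if (secs < 0) {
		snprintf(buf, len, "-");
		return buf;
	}
	snprintf(buf, len, "%ldd %02ld:%02ld:%02ld", secs / 86400,
	    (secs % 86400) / 3600, (secs % 3600) / 60, secs % 60);
	return buf;
}
```

The hyphen case is the one that hurts: once the start time is unknown
there is nothing left to compute from.
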
In System III the start time was kept in struct user, but there the 'ps'
command did grovel around on the swap device to find the 'u' area when it
had to (i.e. when the invoker asked for the "full" (-f) display). The
field is still in struct user in SysVr4, but there /proc/*/psinfo returns
a struct psinfo which contains a pr_start field, so one way or another
the kernel makes sure this information is available to 'ps'.
I damn near fixed this in my own source tree but I've been too lazy to
maintain the changes (and I'm also still not exactly sure how or why the
information went away, or whether it can be restored without actually
moving p_start out of struct pstats).
> You don't often need values from the non-resident process.
> Certainly not from ones that haven't run for a while.
> (I've looked at enough kernel dumps in my time...)
When I'm examining a kernel dump I'm not necessarily debugging just a
dead kernel, but rather a whole production system. I need to know what
processes were running and when they started, who they were running as,
sometimes how much CPU time they used, and perhaps even what they were
waiting on at the time of the crash, and so on. I need to know this
information not to fix the kernel bug (that's secondary in these
scenarios), but rather to find out what else needs fixing in the
production environment that the bug has damaged.
--
Greg A. Woods
+1 416 218-0098; <g.a.woods@ieee.org>; <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>