( NetBSD 8.99.38 or newer required ) ( these tools are experimental and can kill a traced process ) I wrote an experimental version of NetBSD truss(1), modeled after FreeBSD truss(1). https://github.com/krytarowski/picotrace It was started as a fork of the FreeBSD one, but quickly became a full-rewrite as the decoding part was specific to the FreeBSD kernel and the ptrace(2) usage was FreeBSD-only as well. There was not much for code sharing. I've picked picotrace, forked and extended to mimic FreeBSD truss(1). NetBSD truss(1) indents to implement all the features of FreeBSD truss(1) and extend it whenever useful with additional features. At this point of time all the features should be implemented and committed to the tree, but not all of them are reliable. I've extended the tool to handle SIGINFO for the counting mode (-c) and printing AUXV for tracing & printing events mode (-x). Example output: 1. ./truss -o /tmp/build.sh-2019-05-24.txt -axe -f -H -d -D /usr/src/build.sh http://netbsd.org/~kamil/truss/build.sh-2019-05-24.txt.bz2 $ bzcat build.sh-2019-05-24.txt.bz2 |head 9267 1 0.000715337 0.000715337 mmap(0x0, 32768, 0|PROT_READ|PROT_WRITE, 0|MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)= 0xf7ef7000 9267 1 0.000897229 0.000120463 open("/etc/ld.so.conf", O_RDONLY) = 0 Err#2 ENOENT 9267 1 0.000980901 0.000081141 open("/usr/local/lib/libc.so.12", O_RDONLY) = 0 Err#2 ENOENT 9267 1 0.001057840 0.000074579 open("/usr/lib/libc.so.12", O_RDONLY) = 0x3 9267 1 0.001135284 0.000075166 __fstat50(3, { dev=0, mode="-r--r--r-- ", ino=12401082, nlink=1, uid=0, gid=0, rdev=-1, atime=1558660527.138286790, mtime=1558420316.000000000, ctime=1558425703.025156636, blksize=32768, blocks=4544})= 0 9267 1 0.001161701 0.000024142 mmap(0x0, 4096, 0|PROT_READ, 0|MAP_SHARED, 3, 0) = 0xf7ef6000 9267 1 0.001192443 0.000028834 munmap(0x7f7ff7ef6000, 4096) = 0 9267 1 0.001215649 0.000021266 mmap(0x0, 6373376, 0|PROT_EXEC|PROT_READ, 0|MAP_PRIVATE|MAP_ALIGNED(0x15), 3, 0)= 0xf7400000 9267 1 0.001244366 0.000026852 mmap(0x7f7ff77df000, 69632, 0|PROT_READ|PROT_WRITE, 0|MAP_PRIVATE|MAP_FIXED, 3, 1961984)= 0xf77df000 9267 1 0.001271844 0.000025604 mmap(0x7f7ff77f0000, 2244608, 0|PROT_READ|PROT_WRITE, 0|MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0)= 0xf77f0000 2. ./truss -c `pgrep firefox` http://netbsd.org/~kamil/truss/firefox-count-2019-05-24.txt truss[9968] running for 61.849759335 seconds truss[9968] attached to child=10769 ('firefox') for 61.848690315 seconds syscall seconds calls errors missed-sce missed-scx read 0.254737948 1947 0 790 716 write 0.811738104 1418 0 643 934 open 0.137601862 134 26 77 54 close 0.005132063 89 0 42 43 unlink 0.047976024 29 0 30 5 getpid 3.165382267 7359 0 514 8200 getuid 0.000451950 7 0 5 41 Observations: 1. We can observe in both examples that there is a performance issue in build.sh, as something attempts to call 5000 times read(2) for 1 byte only. I plan to extend NetBSD truss(1) to print program name next to PID number. 2. Firefox and Thunderbird and certainly other similar software calls excessively gettimeofday() and clock_gettime(). At least around 100k times per 1 minute, and the program spends around 30sec (cumulative time from all LWPs in a process) in the kernel space prompting for the current time. __gettimeofday50 2.158735930 17989 0 963 4141 __clock_gettime50 21.415106767 73068 0 7484 22300 There is also a bug in the NetBSD kernel that whenever we catch more than 1 event in the same time, one of them is registered and others lost. This affects multi-threading tracing. This affects all debuggers with support debugging of programs with multiple threads (and likely multiple processes). I've discussed with Jason Thorpe how to optimize this and it shall be put into a dedicated thread. I have noted few other ptrace(2) and kernel behavior issues and I put them on my TODO list and I will work on fixing them. The bottlenek NetBSD coredumper(1) Some applications mask crash signals and implement homegrown fallback mechanisms for breakage (vim, firefox, Xorg and whatever tries to be smart). In the past, I used to use this core(5) dumper for firefox, removing signal handlers for interesting signals: http://netbsd.org/~kamil/firefox-dumper/custom_sigaction.c Now, with coredumper(1), this workaround is no longer needed. With recent kernel changes crash signals are correctly reported to debuggers always. regardless of signal masking in processes. I learnt during BSDCan that people in some other OSs keep removing signal handlers manually in order to debug bugs.. NAME coredumper – crash detector and core dumper SYNOPSIS coredumper [-c CORENAME] [-i] [-p | command [arg ...]] Whenever program triggers crash, coredumper(5) catches this and creates a core file. I plan to extend this program with functionality to print backtraces instead of dumping core files that can be too large in some circumstances. My recent motivation to get this program done was investigating X crashes, but they are extra tricky as X uses privileged processes that are harder to debug. Full example: http://netbsd.org/~kamil/coredumper/example-2019-05-24.txt In practice a similar behavior to truss(1) and coredumper(1) could be achieved with an existing debugger (especially with Python/similar bindings), however the purpose of these new programs is to perform one single task in an easier way to setup. ktruss(1) is similar to truss(1), but uses kernel-based tracing. truss(1) uses ptrace(2) and is more flexible in terms of prompting the traced process for whatever information we are interested in a child process. NetBSD truss(1) and NetBSD coredumper(1) are intended to be completely MI (except LP64 check). For now, I'm switching back to fixing kernel ptrace(2) behavior in order to make such tools (and consequently debuggers like GDB/LLDB) more reliable. truss(1) and picotrace(1) exposed some new issues and helped me to understand why tracing programs with multiple LWPs is still not reliable.
Attachment:
signature.asc
Description: OpenPGP digital signature