Subject: kern/36161: ptrace(2) & PT_SYSCALL does not stop before executing syscall
To: None <kern-bug-people@netbsd.org, gnats-admin@netbsd.org,>
From: None <ad@netbsd.org>
List: netbsd-bugs
Date: 04/17/2007 11:30:01
>Number: 36161
>Category: kern
>Synopsis: ptrace(2) & PT_SYSCALL does not stop before executing syscall
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Apr 17 11:30:01 +0000 2007
>Originator: Andrew Doran
>Release: NetBSD 4.99.17
>Organization:
The NetBSD Project
>Environment:
N/A
>Description:
From a report by njoly@:
I just noticed that tracing syscalls with ptrace(2) & PT_SYSCALL does
not seem to work as expected ... The debugged process seems to be
stopped only after executing a syscall, but not before.
I made the attached code to illustrate the problem (seen on -current
i386 and amd64). The same program, on FreeBSD/i386 6.1, shows 2 ptrace
calls for each syscall, as I expected.
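For illustration only, the expected behaviour (one stop before and one stop after each syscall) is what lets a tracer label its stops by simple alternation. The sketch below is not part of the original report; the names stop_kind, stop_tracker, stop_tracker_next and stop_tracker_completed are hypothetical. With the behaviour reported here, where only the post-syscall stop is delivered, such a tracker would mislabel every other stop as an entry.

```c
#include <stddef.h>

/* Hypothetical helper types; not taken from any NetBSD header. */
enum stop_kind { STOP_ENTRY, STOP_EXIT };

struct stop_tracker {
	size_t nstops;		/* PT_SYSCALL stops seen so far */
};

/*
 * Classify the next PT_SYSCALL stop, assuming stops strictly alternate
 * entry, exit, entry, exit, ... and advance the counter.
 */
static enum stop_kind
stop_tracker_next(struct stop_tracker *t)
{
	enum stop_kind kind = (t->nstops % 2 == 0) ? STOP_ENTRY : STOP_EXIT;

	t->nstops++;
	return kind;
}

/* Number of syscalls for which both an entry and an exit stop were seen. */
static size_t
stop_tracker_completed(const struct stop_tracker *t)
{
	return t->nstops / 2;
}
```

A tracer loop like the one in the test case would call stop_tracker_next() once per successful wait() and read the registers accordingly.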
njoly@hal [~]> uname -a
NetBSD hal.sis.pasteur.fr 4.99.17 NetBSD 4.99.17 (HAL) #2: Wed Apr 11 15:01:57 CEST 2007
njoly@hal.sis.pasteur.fr:/local/src/NetBSD/obj/i386/sys/arch/i386/compile/HAL i386
njoly@hal [~]> ./ptrace >/dev/null
syscall 0x0 (0).
syscall 0xbbbea000 (-1145135104).
syscall 0x3 (3).
syscall 0x0 (0).
syscall 0xbbbe9000 (-1145139200).
syscall 0x0 (0).
syscall 0x0 (0).
syscall 0x2 (2).
syscall 0x2 (2).
syscall 0x2 (2).
[...]
njoly@hal [~]> ktrace -di /bin/echo foo
foo
njoly@hal [~]> kdump | grep RET
7677 1 echo RET execve JUSTRETURN
7677 1 echo RET mmap -1145135104/0xbbbea000
7677 1 echo RET open 3
7677 1 echo RET __fstat30 0
7677 1 echo RET mmap -1145139200/0xbbbe9000
7677 1 echo RET close 0
7677 1 echo RET munmap 0
7677 1 echo RET open -1 errno 2 No such file or directory
7677 1 echo RET open -1 errno 2 No such file or directory
7677 1 echo RET open -1 errno 2 No such file or directory
[...]
>How-To-Repeat:
Nicholas provided a test case:
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <machine/reg.h>
#include <err.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#ifdef __amd64__
#define REG regs[_REG_RAX]
#endif
#ifdef __i386__
#define REG r_eax
#endif
int main() {
  int i, ret;
  pid_t pid;
  struct sigaction sa;
  struct reg regs;
  pid = fork();
  switch (pid) {
    case -1:
      err(1, "fork failed");
    case 0:
      /* Child: ask to be traced, stop so the parent can attach, then exec. */
      pid = getpid();
      if (ptrace(PT_TRACE_ME, 0, 0, 0) < 0) {
        err(1, "ptrace PT_TRACE_ME failed"); }
      kill(pid, SIGSTOP);
      execl("/bin/echo", "echo", "foo", NULL);
      err(1, "execl failed");
    default:
      /* Parent: wait for the initial SIGSTOP, then resume with PT_SYSCALL. */
      if (wait(&ret) != pid) {
        err(1, "wait failed"); }
      if (ptrace(PT_SYSCALL, pid, (char *)1, 0)) {
        err(1, "ptrace PT_SYSCALL failed"); }
      while (1) {
        if (wait(&ret) != pid) {
          err(1, "wait failed"); }
        if (WIFEXITED(ret)) {
          break; }
        /* Print the accumulator register at every stop. */
        if (ptrace(PT_GETREGS, pid, &regs, 0) < 0) {
          err(1, "ptrace PT_GETREGS failed"); }
        fprintf(stderr, "syscall 0x%lx (%ld).\n",
            (unsigned long)regs.REG, (long)regs.REG);
        if (ptrace(PT_SYSCALL, pid, (char *)1, 0) < 0) {
          err(1, "ptrace PT_SYSCALL failed"); }
      }
      break; }
  return 0; }
>Fix:
The issue here is that stopping is now always deferred until the LWP sleeps
interruptibly or returns to userspace. That's so that any locks held over a
sleep can be released by the LWP before it comes to a halt.
I think the solution is to add a proc_stop_now() that checks for a request
to stop from the debugger, and makes it happen immediately. That would be
called from process_stoptrace() in place of the call to mi_switch() that's
there now.
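
To make the difference concrete, here is a toy userland model of the two behaviours. It is only a sketch and not kernel code: every toy_* name and the stop_policy enum are invented for illustration, toy_stoptrace() merely stands in for the role of process_stoptrace(), and the STOP_IMMEDIATE branch stands in for what the proposed proc_stop_now() might do. The log records 'S' for a stop visible to the debugger and 'X' for the syscall body running.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only; none of these names exist in the NetBSD kernel. */
enum stop_policy { STOP_DEFERRED, STOP_IMMEDIATE };

struct toy_lwp {
	int stop_pending;	/* stop requested by the debugger, not yet taken */
	char log[64];		/* 'S' = stopped, 'X' = syscall body executed */
};

/* Append one event character to the log, if there is room. */
static void
toy_record(struct toy_lwp *l, char ev)
{
	size_t n = strlen(l->log);

	if (n + 1 < sizeof(l->log)) {
		l->log[n] = ev;
		l->log[n + 1] = '\0';
	}
}

/* Take a pending stop, if there is one. */
static void
toy_take_stop(struct toy_lwp *l)
{
	if (l->stop_pending) {
		l->stop_pending = 0;
		toy_record(l, 'S');
	}
}

/* Stand-in for the trace hook called at syscall entry and exit. */
static void
toy_stoptrace(struct toy_lwp *l, enum stop_policy policy)
{
	l->stop_pending = 1;
	if (policy == STOP_IMMEDIATE)
		toy_take_stop(l);	/* role of the proposed proc_stop_now() */
}

/* One traced syscall: entry hook, body, exit hook, return to userspace. */
static void
toy_syscall(struct toy_lwp *l, enum stop_policy policy)
{
	toy_stoptrace(l, policy);	/* entry */
	toy_record(l, 'X');
	toy_stoptrace(l, policy);	/* exit */
	toy_take_stop(l);		/* deferred stops are taken here */
}
```

In this model the deferred policy yields a single stop after the syscall body, matching the output in the Description, while the immediate policy yields a stop on each side of it.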