Re: lib/55719 (Unwind tables for signal trampoline on amd64 are incorrect)

To: kamil%netbsd.org@localhost, gnats-admin%netbsd.org@localhost, netbsd-bugs%netbsd.org@localhost, nikhil.benesch%gmail.com@localhost
Subject: Re: lib/55719 (Unwind tables for signal trampoline on amd64 are incorrect)
From: Nikhil Benesch <nikhil.benesch%gmail.com@localhost>
Date: Mon, 2 Nov 2020 04:55:02 +0000 (UTC)

The following reply was made to PR lib/55719; it has been noted by GNATS.

From: Nikhil Benesch <nikhil.benesch%gmail.com@localhost>
To: gnats-bugs%netbsd.org@localhost, kamil%netbsd.org@localhost, gnats-admin%netbsd.org@localhost,
 netbsd-bugs%netbsd.org@localhost
Cc: 
Subject: Re: lib/55719 (Unwind tables for signal trampoline on amd64 are
 incorrect)
Date: Sun, 1 Nov 2020 23:54:34 -0500

 On Mon, Oct 19, 2020 at 7:45 AM Kamil Rytarowski <n54%gmx.com@localhost> wrote:
 > While there, I'm getting the following backtrace for NetBSD/i386:
 >
 > Backtrace 6 stack frames.
 > 0x8048972 <handler+0x2d> at ./a.out
 > 0xfbd93010 <__sigtramp_siginfo_2> at /usr/lib/i386/libc.so.12
 > 0x804899e <run1> at ./a.out
 > 0x80489a5 <run3> at ./a.out
 > 0x80489aa <run3+0x5> at ./a.out
 > 0x80489de <__x86.get_pc_thunk.ax> at ./a.out
 >
 > for "gcc -fomit-frame-pointer -O3 test.c -lexecinfo -m32". Is this
 > correct? I don't see <main> over there.
 >
 > http://netbsd.org/~kamil/backtrace/libc-i386-PR55719-multiple-frames.txt
 
 Sorry for the slow reply. I finally got the chance to look into this and I think
 I have figured out what is going on. It is a classic case of GCC being too
 clever. Here is the disassembly of your test program:
 
 0804899e <run1>:
   804899e:       eb fe                   jmp    804899e <run1>
 
 080489a0 <run2>:
   80489a0:       e8 f9 ff ff ff          call   804899e <run1>
 
 080489a5 <run3>:
   80489a5:       e8 f6 ff ff ff          call   80489a0 <run2>
   80489aa:       66 90                   xchg   %ax,%ax
   80489ac:       66 90                   xchg   %ax,%ax
   80489ae:       66 90                   xchg   %ax,%ax
   80489b0:       e8 eb fa ff ff          call   80484a0 <abort@plt>
   80489b5:       89 fb                   mov    %edi,%ebx
   80489b7:       e8 e4 fa ff ff          call   80484a0 <abort@plt>
 
 080489bc <main>:
   80489bc:       55                      push   %ebp
   80489bd:       89 e5                   mov    %esp,%ebp
   80489bf:       83 e4 f0                and    $0xfffffff0,%esp
   80489c2:       83 ec 10                sub    $0x10,%esp
   80489c5:       c7 44 24 04 45 89 04    movl   $0x8048945,0x4(%esp)
   80489cc:       08
   80489cd:       c7 04 24 02 00 00 00    movl   $0x2,(%esp)
   80489d4:       e8 e7 fa ff ff          call   80484c0 <signal@plt>
   80489d9:       e8 c7 ff ff ff          call   80489a5 <run3>
 
 080489de <__x86.get_pc_thunk.ax>:
   80489de:       8b 04 24                mov    (%esp),%eax
   80489e1:       c3                      ret
 
 Notice how none of run1, run2, run3, or main ends in a ret instruction. I guess
 GCC has realized that all of them terminate in the infinite loop in run1, so no
 call ever returns. The return address of each call is therefore the first byte
 after the calling function. So when the unwinder tries to unwind from run3, it
 ends up falling off the end of main and into the __x86.get_pc_thunk.ax function
 instead, due to that (ra + 1) hack we were discussing previously.
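 
 To make the address arithmetic concrete, here is a small sketch that resolves
 the three return addresses from the disassembly above against the function
 start addresses. It is only an illustration, not the code that libexecinfo or
 the unwinder actually runs: the symtab table, lookup() and print_lookups() are
 names made up for this example. It compares resolving the raw return address
 with resolving the return address minus one, which as far as I know is the
 usual convention for keeping a lookup inside the calling function.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Function start addresses copied from the first disassembly above. */
struct sym {
	uint32_t start;
	const char *name;
};

static const struct sym symtab[] = {
	{ 0x0804899e, "run1" },
	{ 0x080489a0, "run2" },
	{ 0x080489a5, "run3" },
	{ 0x080489bc, "main" },
	{ 0x080489de, "__x86.get_pc_thunk.ax" },
};

/*
 * Hypothetical helper: return the name of the last symbol that starts at
 * or below addr. The table is sorted by start address.
 */
static const char *
lookup(uint32_t addr)
{
	const char *name = "?";
	size_t i;

	for (i = 0; i < sizeof(symtab) / sizeof(symtab[0]); i++) {
		if (symtab[i].start <= addr)
			name = symtab[i].name;
	}
	return name;
}

/*
 * Print how each return address resolves. The addresses are the ones
 * pushed by the 5-byte call instructions in run2, run3 and main.
 */
static void
print_lookups(void)
{
	static const uint32_t ra[] = { 0x080489a5, 0x080489aa, 0x080489de };
	size_t i;

	for (i = 0; i < sizeof(ra) / sizeof(ra[0]); i++)
		printf("ra 0x%x: raw -> %s, ra - 1 -> %s\n",
		    (unsigned)ra[i], lookup(ra[i]), lookup(ra[i] - 1));
}
```

 Calling print_lookups() from a main of your own shows the raw addresses
 resolving to run3, run3 and __x86.get_pc_thunk.ax, which matches the odd
 backtrace, while the addresses minus one resolve to run2, run3 and main.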
 
 A very simple change to inhibit this optimization...
 
 diff -u test.c test2.c
 --- test.c      2020-11-02 04:49:00.693887594 +0000
 +++ test2.c     2020-11-02 04:48:33.693998384 +0000
 @@ -14,10 +14,12 @@
           backtrace_symbols_fd (array, size, 2);
   }
   
 +volatile int v;
 +
   __attribute__ ((noinline))
   int
   run1(void) {
 -       for (;;)
 +       for (v = 1; v;)
                   continue;
   }
   
 ...results in the correct backtraces:
 
 $ ./a.out
 ^Cx 2
 Backtrace 4 stack frames.
 0x8048972 <handler+0x2d> at ./a.out
 0xf4c3c0c0 <__sigtramp_siginfo_2> at /usr/lib/i386/libc.so.12
 0x80489a8 <run1+0xa> at ./a.out
 0x80489ee <main+0x22> at ./a.out
 
 And the assembly, as you would expect, includes the ret instructions:
 
 080489cc <main>:
   80489cc:       55                      push   %ebp
   80489cd:       89 e5                   mov    %esp,%ebp
   80489cf:       83 e4 f0                and    $0xfffffff0,%esp
   80489d2:       83 ec 10                sub    $0x10,%esp
   80489d5:       c7 44 24 04 45 89 04    movl   $0x8048945,0x4(%esp)
   80489dc:       08
   80489dd:       c7 04 24 02 00 00 00    movl   $0x2,(%esp)
   80489e4:       e8 d7 fa ff ff          call   80484c0 <signal@plt>
   80489e9:       e8 c6 ff ff ff          call   80489b4 <run3>
   80489ee:       31 c0                   xor    %eax,%eax
   80489f0:       c9                      leave
   80489f1:       c3                      ret
 
 I'm not sure there is anything that can be done about this. This seems like a flaw
 inherent to the design of the DWARF-based unwinders. At the very least, analyzing
 and fixing this properly exceeds my expertise.
 
 Thanks again for getting the earlier unwinding patches committed so quickly,
 Kamil. gccgo is working great on NetBSD now as a result.
 
 Cheers,
 Nikhil
