Re: LLDB/NetBSD extended set of tasks

On 16 March 2017 at 21:43, Kamil Rytarowski <n54%gmx.com@localhost> wrote:
> On 16.03.2017 11:55, Pavel Labath wrote:
>> What kind of per-process events
>> are we talking about here?
>
> I'm mostly thinking about ResumeActions - to resume the whole process,
> while being able single-stepping desired thread(s).
>
> (We also offer PT_SYSCALL feature, but it's not needed right now in LLDB).
>
>> Is there anything more here than a signal
>> directed at the whole process?
>
> single-stepping
> resume thread
> suspend thread
>
> I'm evaluating FreeBSD-like API PT_SETSTEP/PT_CLEARSTEP for NetBSD. It
> marks a thread for single-stepping. This code is needed to allow us to
> combine PT_SYSCALL & PT_STEP and PT_STEP & emit signal.
>
> I was thinking about ResumeActions marking which thread to
> resume/suspend/singlestep, whether to emit a signal (one per global
> PT_CONTINUE[/PT_SYSCALL]) and whether to resume the whole thread.
>
> To some certain point it might be kludged with single-thread model for
> basic debugging.
>
>
> I imagined a possible flow of ResumeAction calls like:
> [Generic/Native framework knows upfront the image of threads within
> debuggee]
> - Resume Thread 2 (PT_RESUME)
> - Suspend Thread 3 (PT_SUSPEND)
> - Set single-step Thread 2 (PT_SETSTEP)
> - Set single-step Thread 4 (PT_SETSTEP)
> - Clear single-step Thread 5 (PT_CLEARSTEP)
> - Resume & emit signal SIGIO (PT_CONTINUE)
>
> In other words: setting properties on threads and pushing the
> PT_CONTINUE button at the end.

None of this is really NetBSD-specific, except the whole-process signal at the end (which I am going to ignore for now). I mean, the implementation of it is different, but there is no reason why someone would not want to perform the same set of actions on Linux for instance. I think most of the work here should be done on the client. Then, when the user issues the final "continue", the client sends something like $vCont;s:2;s:4;c:5. Then it's up to the server to figure out how execute these actions. On NetBSD it would execute the operations you mention above, while on linux it would do something like ptrace(PTRACE_SINGLESTEP, 2); ptrace(PTRACE_SINGLESTEP, 4); ptrace(PTRACE_CONTINUE, 5); (linux lldb-server already supports this actually, although you may have a hard time convincing the client to send a packet like that).

So I don't believe there will be any sweeping changes necessary to support this in the future. If I understand it correctly, you are working on the server now. All you need to do there is to make sure you translate the set of actions in the packet to the proper sequence of ptrace calls. You can even write lldb-server-style tests for that. Then, we can discuss what would be the best user-level interface to specify complex actions like this, and teach the client to send these packets.

>
>> AFAICT, most of the stop reasons
>> (breakpoint, watchpoint, single step, ...) are still linked to a
>> specific thread even in your process model. I think you could get to a
>> point where lldb is very useful even without getting these events
>> "correct".
>>
>
> I was thinking for example about this change (it's not following the
> real function name nor the prototype):
>
> GetStoppedReason(Thread) -> GetStoppedReason(Process,Thread)
>
> The Linux code would easily route it to desired thread and (Net)BSD
> return immediately the requested data. The need to have these functions
> in NativeThread (enforced by the framework) is the only purpose I keep
> them there, while there is global stopped reason on NetBSD (per-process).

Ok, I think we can talk about tweaks like that once you have something upstream. Right now it does not seem to me like that should pose a big development obstacle.

In my local code, I'm populating all threads within the tracee
(NativeThread) with exactly the same stop reason - for the "whole
process" case. I can see - on the client side - that it prints out the
same message for each thread within the process as all of them captured
a stop action.

Indeed, that can be a nuissance. The whole-process events is probably the first thing we should look at after the port is operational. I think this can be handled independently of the fancy resume actions we talk about above, which as Jim pointed out, would be very hard for users to comprehend anyway.

I'm evaluating it from the point of view of a tracee with 10.000 threads
and getting efficient debugging experience. This is why I would ideally
reduce NativeThread to a container that is sorted, hashale box of
integers (lwpid_t) and shut down stopped reason extension called for
each stopped in debuggee.

I wouldn't worry too much about the performance of this part of the code. If you get to the point where you debug a process with ten thousand threads, I think you'll find that there are other things which are causing performance problems.

On 16 March 2017 at 21:43, Kamil Rytarowski <n54%gmx.com@localhost> wrote:

On 16.03.2017 11:55, Pavel Labath wrote:
> On 16 March 2017 at 00:43, Kamil Rytarowski <n54%gmx.com@localhost> wrote:
>>
>> TODO:
>> - Fixing software breakpoints support,

Fixed!

267->596 of succeeded tests out of 1200+ - please scroll for details.

>> - Special Registers (Floating Point..) reading/writing,
>> - Unless it will be too closely related to develop threads - Hardware
>> watchpoints support in line with the Linux amd64 code,
>>
>>
>> As of today the number of passing tests has been degraded. This has been
>> caused due the fact that LLDB endeavors to set breakpoints in every
>> process (on the "_start" symbol) - this blocks tracing or simulating
>> tracing of any process.
> This is necessary so that we can read the list shared libraries loaded
> by the process and set any breakpoints in them. Note that currently
> (at least on Linux) we are doing it actually too late -- at this point
> the constructors in the shared libraries have already executed, so we
> cannot set breakpoints or debug the initialization code. I haven't yet
> investigated how to fix this.
>

I see.

It's interesting use-case; Right now I'm not sure how properly address it.

Thank you for your insight.

> We will need to discuss this in detail. I am not sure removing the
> NativeThreadNetBSD class completely will is a worthwhile goal, but we
> can certainly work towards making it's parent class dumber, and remove
> operations that don't make sense for all users. If e.g. your
> watchpoints are per-process, then we can pipe watchpoint setting code
> through NativeProcessProtocol, and NativeProcessNetBSD will implement
> that directly, while the linux version will delegate to the thread.
> However, even in your process model each thread has a separate set of
> registers, so I think it makes sense to keep the register manipulation
> code there.
>

I put all the threading potential challenges, each one will need to be
discussed. Refactoring is by definition cost and should be reduced to
minimum, while getting proper support on the platform. I think

Our watchpoints (debug registers) are per-thread (LWP) only.

>> - Support in the current thread function "0" (or "-1" according to the
>> GDB Remote protocol) to mark that the whole process was interrupted/no
>> primary thread (from a tracer point of view)
> Teaching all parts of the debugger (server is not enough, I think you
> would have to make a lot of client changes as well) about
> whole-process events might be a big task.

I think long term this might be useful. I noted in the GDB Remote
specification that this protocol is embeddable into simulators and
low-level kernel APIs without regular threads, however it's not urgently
needed to get aboard for standard user-level debugging facilities; while
it will be useful in the general set of capabilities in future.

> I wondering whether you
> wouldn't make more progress if you just fudged this and always
> attributed these events to the primary thread. I think we would be in
> a better position to design this properly once most of the debugger
> functionality was operational for you.

Agreed.

This is why the initial goal of mine is to get as far as possible
without touching the generic subsystems and get basic threading support.

> What kind of per-process events
> are we talking about here?

I'm mostly thinking about ResumeActions - to resume the whole process,
while being able single-stepping desired thread(s).

(We also offer PT_SYSCALL feature, but it's not needed right now in LLDB).

> Is there anything more here than a signal
> directed at the whole process?

single-stepping
resume thread
suspend thread

I'm evaluating FreeBSD-like API PT_SETSTEP/PT_CLEARSTEP for NetBSD. It
marks a thread for single-stepping. This code is needed to allow us to
combine PT_SYSCALL & PT_STEP and PT_STEP & emit signal.

I was thinking about ResumeActions marking which thread to
resume/suspend/singlestep, whether to emit a signal (one per global
PT_CONTINUE[/PT_SYSCALL]) and whether to resume the whole thread.

To some certain point it might be kludged with single-thread model for
basic debugging.

I imagined a possible flow of ResumeAction calls like:
[Generic/Native framework knows upfront the image of threads within
debuggee]
- Resume Thread 2 (PT_RESUME)
- Suspend Thread 3 (PT_SUSPEND)
- Set single-step Thread 2 (PT_SETSTEP)
- Set single-step Thread 4 (PT_SETSTEP)
- Clear single-step Thread 5 (PT_CLEARSTEP)
- Resume & emit signal SIGIO (PT_CONTINUE)

In other words: setting properties on threads and pushing the
PT_CONTINUE button at the end.

> AFAICT, most of the stop reasons
> (breakpoint, watchpoint, single step, ...) are still linked to a
> specific thread even in your process model. I think you could get to a
> point where lldb is very useful even without getting these events
> "correct".
>

I was thinking for example about this change (it's not following the
real function name nor the prototype):

GetStoppedReason(Thread) -> GetStoppedReason(Process,Thread)

The Linux code would easily route it to desired thread and (Net)BSD
return immediately the requested data. The need to have these functions
in NativeThread (enforced by the framework) is the only purpose I keep
them there, while there is global stopped reason on NetBSD (per-process).

> cheers,
> pl
>

Thank you for your response.

Last but not the least after getting software breakpoints to work the
obligatory Test Summary diff between:

http://netbsd.org/~kamil/lldb/check-lldb-r296360-2017-02-28.txt

and

http://netbsd.org/~kamil/lldb/check-lldb-r296360-2017-03-16.txt
(pkgsrc-wip/lldb-netbsd git rev. 2c9c8e7b56d)

===================
Test Result Summary
===================
-Test Methods: 1235
-Reruns: 1
-Success: 267
+Test Methods: 1240
+Reruns: 0
+Success: 596
Expected Failure: 21
-Failure: 332
-Error: 167
+Failure: 86
+Error: 91
Exceptional Exit: 0
Unexpected Success: 1
Skip: 444
-Timeout: 3
+Timeout: 1
Expected Timeout: 0