Thanks for helping me out. Let me look at the changes you have made and I will continue working on improving it. I will also test it and get back as soon as possible.
Here is a modified version [1]. Compile-tested only, but it gives an idea.
I realized that there was a problem in your patch: you give a kcov descriptor
on a per-process basis, but if a proc has several threads you are mixing the
coverage of each thread in the output buffer. This is not desired, so I
changed the code to be per-LWP.
Also the lookup in _trace_pc() was bad for two reasons: it is slow, and it
is racy. The thing is, adding a mutex here is not allowed. So I dropped the
lookup and made a per-LWP pointer. As a result of that, it unfortunately
becomes more complicated to free the associated kcov descriptor, typically
in corner cases such as:
- A thread closes the file descriptor in the process, but another thread
was being traced; in that case we ask the traced thread to free the kcov
descriptor when it terminates
- A traced thread is killed; in that case it should disable the kcov
descriptor but not free it, and this descriptor will get subsequently
freed when the process terminates
Another concern (that I didn't address) is the fact that you use device
units. It would seem better to me if it was not per-unit but per-process.
Ie, when a process opens /dev/kcov, you register the pid in the kcov
descriptor rather than the unit. Then you look up based on the pid, and
you trace based on the thread.
So, do as you see fit, my version shouldn't be too far from working, and
it's here to give an idea.
[1] https://m00nbsd.net/garbage/kcov/kcov.diff