On 03.05.2017 23:10, Christos Zoulas wrote:
> On May 3, 10:25pm, n54%gmx.com@localhost (Kamil Rytarowski) wrote:
> | B.
> | I'm still verifying single stepping of LWPs in processes with multiple
> | threads. I have an impression that something is fragile there.
>
> Ok. Let me know when you have a reproducible problem...
>

The problem looks similar to PT_RESUME and PT_SUSPEND (per-LWP
operations).

With multiple LWPs, after creating a thread and then raising a signal
for the tracer, the process cannot be single-stepped: one thread
apparently never starts or dies (?), and _lwp_wait() (for a reasonable
value of lwpid_t: 2) returns EDEADLK.

_lwp_makecontext()
_lwp_create()
raise(signal)
_lwp_wait()

This is not restricted to PT_SETSTEP; the same happens with PT_STEP.

I will go down this rabbit hole and debug it until the bug is squashed.
It will take a while, but understanding what's going on is beneficial
(besides the profit of just correcting it).

Another report was filed for PT_RESUME... there is tension from the
community:

"Several ptrace_wait test cases fail under DEBUG+LOCKDEBUG"
http://gnats.netbsd.org/52213

> | C.
> | LLDB tests trigger dmesg errors (default GENERIC kernel), there are
> | entries like:
> | fill_vmentry: vp 0xfffffe87288967e8 error 2
> | fill_vmentry: vp 0xfffffe86e1a15930 error 2
> | fill_vmentry: vp 0xfffffe87047f8bd8 error 2
> | fill_vmentry: vp 0xfffffe87051af7e0 error 2
> | fill_vmentry: vp 0xfffffe86ef0b63f0 error 2
>
> This is DIAGNOSTIC and it is tangentially related to your favorite
> friend (F_GETPATH) :-)
>
> Let me explain what's wrong here. Getting from a file descriptor
> to a vnode is always a success (if the file descriptor refers to one)
> (vp is the pointer to a vnode here).
> Getting from a vnode to a path is not (here you get 2 ENOENT from
> vnode_to_path):
>
> 1. The file is removed so there is no path (what I suspect is happening here).
> 2. There are more than one paths and it is not deterministic which one you get
>    (usually does not matter, but it does when you don't have permission to
>    get to the one returned but you have to the other)
> 3. vnode_to_path() uses the reverse-namei cache to do its deed. This can
>    lose in 2 different ways:
>    - cache eviction: not really an issue unless there is memory pressure
>      (still need to handle it, but infrequent).
>    - path component length... The dreaded NCHNAMLEN (31) constant which
>      is the component namelength limit for the current namei cache
>      implementation (we should really fix that one day).
>
> This is why I keep saying forget adding F_GETPATH unless you can make it
> work reliably first :-)
>

Thank you for the analysis. These reports aren't fatal to the stability
of the system. Once I sort out the noise from the tests, I will have a
closer look at this.
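
For illustration only, here is a minimal userland sketch of two of the
failure modes from the numbered list in the quoted reply: a removed file
that is still reachable through an open descriptor (case 1), and a path
component longer than the name cache limit (case 3). All function names
and the NCHNAMLEN_ASSUMED macro are hypothetical; the value 31 is taken
from the message, and treating the limit as inclusive is my assumption.
Checking st_nlink == 0 is only a userland proxy for "the file is removed"
and says nothing about what vnode_to_path() does internally.

```c
#define _POSIX_C_SOURCE 200809L

#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Value quoted in the message; the real constant lives in the kernel. */
#define NCHNAMLEN_ASSUMED 31

/* Hypothetical helper: length of the longest component of `path`. */
size_t
longest_component(const char *path)
{
	size_t best = 0;

	while (*path != '\0') {
		size_t len = strcspn(path, "/");	/* chars up to next '/' */

		if (len > best)
			best = len;
		path += len;
		while (*path == '/')			/* skip separators */
			path++;
	}
	return best;
}

/* 1 if every component fits the assumed limit (inclusive), else 0. */
int
path_fits_namecache(const char *path)
{
	return longest_component(path) <= NCHNAMLEN_ASSUMED;
}

/*
 * 1 if `fd` refers to a file with no remaining links, 0 if it still
 * has at least one link, -1 if fstat() fails.
 */
int
fd_is_unlinked(int fd)
{
	struct stat st;

	if (fstat(fd, &st) == -1)
		return -1;
	return st.st_nlink == 0;
}

/*
 * Create a temporary file, remove its only name while keeping the
 * descriptor open, and report what fd_is_unlinked() says about it.
 * Assumes /tmp is writable and on a local file system.
 */
int
demo_unlinked_fd(void)
{
	char tmpl[] = "/tmp/vnpathXXXXXX";
	int fd, res;

	fd = mkstemp(tmpl);
	if (fd == -1)
		return -1;
	if (unlink(tmpl) == -1) {
		close(fd);
		return -1;
	}
	res = fd_is_unlinked(fd);	/* fd is valid, but the name is gone */
	close(fd);
	return res;
}
```

The descriptor in demo_unlinked_fd() stays perfectly usable after
unlink(), which is the situation where a descriptor-to-vnode lookup can
succeed while a vnode-to-path lookup has nothing to return.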