tech-kern archive

Re: using of fork() in multithreaded application



> Date: Fri, 24 Jan 2025 10:21:33 +0100
> From: Peter Skvarka <ps%softinengines.com@localhost>
> 
> Unfortunately I am not able to create minimal sample.
> It happens in wxWidgets library function wxExecute() used by Codeblocks 
> IDE and I can reproduce it only with started codeblocks process. The 
> code of wxExecute() is complicated and it is problem to extract only 
> essential parts and create minimal reproducer.

You could try ktracing the process to record its system calls:

1. Start up the Codeblocks IDE and find its pid, say 12345.
2. In a terminal, run `ktrace -p 12345'.
3. Trigger the hang.
4. In the terminal, run `ktrace -C' to stop tracing.
5. Run `kdump' to print the trace of syscalls.

> I have this additional info: Forked parent retrieves child output 
> through pipe and it waits for child finishing with select() infinitely 
> and child stays zombie - I  am seeing flag Z in ps -auxd list.
> I checked command line arguments passed to execv(), it is ok ("gcc 
> -dumpversion")

It's hard to tell from just this information, but one possibility is
that the parent and child are communicating through a pipe, and the
parent has kept the writing side of the pipe open after forking the
child.

Suppose the parent is waiting in select() for the reading side of the
pipe to be ready, and the child exits.  If the child had the last
descriptor for the writing side, then the reading side would reach
end-of-file and the parent would wake up -- but if the parent also has
a descriptor for the writing side, the pipe never reaches end-of-file,
and if the parent isn't handling SIGCHLD either, nothing interrupts
select(), so it might wait forever.
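
To make the scenario above concrete, here is a minimal sketch in C.
It is not taken from wxExecute(); the function name
eof_seen_after_child_exit() and its parameter are made up for
illustration, and the one-second select() timeout exists only so the
demonstration returns instead of hanging.

```c
#include <sys/types.h>
#include <sys/select.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.  Forks a child that
 * writes one line into a pipe and exits, then reads the pipe in the
 * parent using select().  Returns 1 if the parent saw end-of-file,
 * 0 if select() timed out with no end-of-file, and -1 on error.
 */
int
eof_seen_after_child_exit(int parent_closes_write_end)
{
	int fds[2], result = -1;
	char buf[64];
	pid_t pid;

	if (pipe(fds) == -1)
		return -1;
	pid = fork();
	if (pid == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		/* Child: write one line; exiting closes its descriptors. */
		close(fds[0]);
		if (write(fds[1], "hello\n", 6) != 6)
			_exit(1);
		_exit(0);
	}

	/* Parent: this close is the step that is easy to forget. */
	if (parent_closes_write_end)
		close(fds[1]);

	for (;;) {
		struct timeval tv = { 1, 0 };	/* demo only: give up after 1s */
		fd_set rset;
		ssize_t r;
		int n;

		FD_ZERO(&rset);
		FD_SET(fds[0], &rset);
		n = select(fds[0] + 1, &rset, NULL, NULL, &tv);
		if (n == -1 && errno == EINTR)
			continue;
		if (n == -1)
			break;
		if (n == 0) {		/* timed out: no data, no EOF */
			result = 0;
			break;
		}
		r = read(fds[0], buf, sizeof(buf));
		if (r == -1)
			break;
		if (r == 0) {		/* EOF: every writer has closed */
			result = 1;
			break;
		}
	}

	close(fds[0]);
	if (!parent_closes_write_end)
		close(fds[1]);
	(void)waitpid(pid, NULL, 0);	/* reap the child */
	return result;
}
```

Calling eof_seen_after_child_exit(1) should return 1, because the
child held the last writing descriptor.  Calling it with 0 should
return 0: the parent reads the child's output and then select() never
reports end-of-file.  Without the timeout, that second case would
block indefinitely while the child sits as a zombie.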

> Child's C++ code from fork() to execv() does not uses pthread 
> synchronization objects, it only prepares pipes and calls execv().
> So it is question if reason is bad using of fork() in multithreaded 
> application, or for example bad usage of pipes or something other. I 
> think that it is NetBSD specific thing because of no similar report on 
> other os-es.

This is most likely an application error.  I wouldn't be surprised if
there are nefarious locks lurking underneath some innocent-looking C++
tokens.  And I wouldn't be surprised if there's a mistake in handling
file descriptors and child waits.  The application could be
accidentally relying on the way other operating systems implement some
kind of undefined behaviour it triggers.  But there's too little
information to say so far.

> Currently I have debug built and I am trying to retrieve more info why 
> forked child stays zombie.
> Do you think that is possible to diagnose or to debug with gdb phase of 
> changing state to zombie ?
> Or to look on some child's process structures
> What is real reason for zombie state ? Can it be holding of not closed 
> pipe or file ? Or it can stay zombie
> when it is terminated from another process ? Is possible to investigate 
> what resource is not freed by process which can be reason for entering 
> into zombie state ?

When a process terminates, it becomes a zombie process until the
parent calls one of the wait() family of system calls.  An open pipe
or file does not cause this; the kernel closes all of the process's
file descriptors when it exits.  This is the basic mechanism in Unix
for managing processes; if you're not familiar with the Unix process
life cycle, you might want to find a tutorial on Unix processes, like
maybe the Stevens book (Advanced Programming in the UNIX Environment,
1992).
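
As a rough sketch of that life cycle in C (the function name
reap_child() is invented for this example, and this is not what
wxExecute() does internally):

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.  Forks a child that
 * exits immediately with the given code, then reaps it.  Returns the
 * child's exit status, or -1 if anything unexpected happens.
 */
int
reap_child(int code)
{
	int status;
	pid_t pid, w;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(code);	/* child terminates right away */

	/*
	 * From the moment the child exits until waitpid() returns, the
	 * child is a zombie: only its exit status is kept around.
	 */
	do {
		w = waitpid(pid, &status, 0);
	} while (w == -1 && errno == EINTR);
	if (w != pid || !WIFEXITED(status))
		return -1;

	/* The zombie is gone now, so a second wait fails with ECHILD. */
	if (waitpid(pid, NULL, 0) != -1 || errno != ECHILD)
		return -1;

	return WEXITSTATUS(status);
}
```

For example, reap_child(7) should return 7.  If the parent never
reaches a wait call, say because it is stuck in select(), the child
stays in state Z in ps output no matter what the child itself did.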

