[PATCH] pthread_once and fork



tl;dr -- pthread_once can't satisfy its contract to call a function
_once_ if concurrent fork is allowed, even though that would be
useful, e.g. for lazy initialization of arc4random state.

Proposal:

   Make pthread_once mutually exclusive with fork -- mainly at the
   cost of one leaf function call and two predicted-not-taken
   conditional branches in fork, plus a little more private coupling
   between libc and libpthread.

Thoughts?


Full story:

pthread_once is supposed to enable lazy initialization of a library by
guaranteeing that the init routine has been called exactly once.

What happens if another thread concurrently forks and then tries to
use the library in the child?

POSIX doesn't guarantee this is allowed: pthread_once is not
async-signal-safe, and, in the child of a multithreaded fork, POSIX
only endorses calling async-signal-safe functions until one of the
exec functions is called.

But that would preclude library routines like arc4random from lazily
initializing state -- such as registering a pthread_atfork handler --
that would allow them to work safely and efficiently in the child.
So I think it would be useful to provide stronger guarantees than
POSIX requires (at least internally in libc, if not for applications).

Currently, what may happen is that pthread_once hangs in the child,
because the thread that was running pthread_once at the time of the
fork still appears to hold the internal mutex in the child -- even
though that thread doesn't actually exist there.
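
For concreteness, here's a minimal sketch of the kind of program that
can hang this way (illustrative only, not the PR's actual test case):

#include <pthread.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_once_t once = PTHREAD_ONCE_INIT;

static void
init_routine(void)
{

	sleep(1);			/* widen the race window */
}

static void *
once_thread(void *cookie)
{

	(void)cookie;
	pthread_once(&once, &init_routine);
	return NULL;
}

int
main(void)
{
	pthread_t t;
	pid_t pid;

	pthread_create(&t, NULL, &once_thread, NULL);
	usleep(100*1000);		/* let once_thread reach init_routine */
	pid = fork();
	if (pid == 0) {
		/*
		 * Child: the internal once mutex was copied in the
		 * locked state, but its owner doesn't exist here --
		 * this blocks forever.
		 */
		pthread_once(&once, &init_routine);
		_exit(0);
	}
	pthread_join(t, NULL);
	waitpid(pid, NULL, 0);		/* never returns if the child hung */
	return 0;
}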

I drafted a hack with a futex and a fork generation number (which
rolls around after 2^31 nested forks with no intervening execs -- not
perfect but not likely to matter either) so that the child can detect
when the pthread_once call was in progress concurrently and roll it
back.  (It turns out this is what glibc does.)
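
Roughly, that hack has the following shape -- a sketch, not the actual
draft: futex_wait/futex_wake_all here are hypothetical stand-ins for
the real futex calls, and the state encoding loosely follows glibc's
(bit 0 = in progress, bit 1 = done, upper bits = fork generation):

#include <stdatomic.h>

/* Hypothetical stand-ins for the real futex system calls. */
extern void futex_wait(atomic_uint *, unsigned);
extern void futex_wake_all(atomic_uint *);

static atomic_uint fork_generation;	/* advanced in the child on fork */

static int
once_rollback_sketch(atomic_uint *state, void (*init_routine)(void))
{

	for (;;) {
		unsigned v = atomic_load(state);
		unsigned busy = atomic_load(&fork_generation) | 1;

		if (v & 2)		/* done */
			return 0;
		if (v & 1) {		/* in progress... */
			if (v == busy) {
				/* ...in this process: wait for the owner */
				futex_wait(state, v);
				continue;
			}
			/* ...in a thread lost to fork: take over below */
		}
		if (!atomic_compare_exchange_weak(state, &v, busy))
			continue;
		(*init_routine)();	/* may appear to rerun in the child! */
		atomic_store(state, 2);
		futex_wake_all(state);
		return 0;
	}
}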

But with this approach, the init_routine may appear to have run _more
than once_ in the child, which violates the contract of pthread_once
to run it at most once (setting aside what happens if the thread is
cancelled during init_routine, which is precisely specified in POSIX).

For example, with

int x;
void init_routine(void) { x++; }

the child may observe x=1 or x=2.

Worse, with

int x, y;
void init_routine(void) { x++; y++; }

if the concurrent fork happens between x++ and y++, what can the child
do?  If it just stops there, init_routine will not have run to
completion even once, and we end up with x=1, y=0; if it restarts from
the top, init_routine will have been called and returned somewhere
between once and twice(!), with x=2, y=1.  But surely the contract of
pthread_once is that x=1, y=1 is the only allowed outcome.

I think the only defensible answer is that fork and pthread_once have
to be mutually exclusive, if pthread_once is to work at all in the
child.  The attached patch implements this: fork holds up any
concurrent calls to pthread_once, and pthread_once holds up any
concurrent calls to fork (but not other calls to pthread_once).  And
if you call fork _inside_ a pthread_once init routine, it fails with
EDEADLK.

The mechanism is essentially a reader/writer lock around fork, with
fork as the `writer' and pthread_once as the `reader' -- only one
writer fork at a time, any number of pthread_once readers at a time.
(This is currently a naive reader/writer lock, with potential writer
starvation and no priority inheritance -- probably not a big deal but
not hard in principle to fix.)
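
In miniature, the pattern looks like this (simplified from the patch
below, which shares libc's existing atfork_lock instead of a private
mutex):

#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t readers_gone = PTHREAD_COND_INITIALIZER;
static uint64_t readers;	/* concurrent pthread_once calls */

static void
reader_enter(void)		/* pthread_once begins */
{

	pthread_mutex_lock(&lock);
	readers++;
	pthread_mutex_unlock(&lock);
}

static void
reader_exit(void)		/* pthread_once ends */
{

	pthread_mutex_lock(&lock);
	if (--readers == 0)
		pthread_cond_broadcast(&readers_gone);
	pthread_mutex_unlock(&lock);
}

static void
writer(void)			/* fork */
{

	pthread_mutex_lock(&lock);
	while (readers)		/* naive: a stream of readers starves us */
		pthread_cond_wait(&readers_gone, &lock);
	/* ...fork while no pthread_once call is in flight... */
	pthread_mutex_unlock(&lock);
}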

Additionally, each thread's pthread_t tracks its pthread_once call
depth so fork can tell whether to fail with EDEADLK.

libc provides the following private functions, to be used only by
libpthread:

void __libc_block_fork(void) -- enter as a reader, used by pthread_once
void __libc_unblock_fork(void) -- exit as a reader, used by pthread_once

libpthread provides the following private function (with a thread stub
in libc for single-threaded applications), to be used only by libc:

int __libc_thr_in_once(void) -- nonzero if the calling thread is
  executing pthread_once, zero if not; used by fork to decide whether
  to fail with EDEADLK
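
Putting the pieces together, the intended call pattern in pthread_once
looks roughly like this (condensed from the patch below; cancellation
handling, memory ordering, and error paths elided):

#include <pthread.h>

#include "fork.h"		/* libc-private __libc_{block,unblock}_fork */
#include "pthread_int.h"	/* pt_oncedepth */

static void
once_flow_sketch(pthread_once_t *once_control, void (*init_routine)(void))
{

	__libc_block_fork();			/* concurrent fork now waits */
	pthread_mutex_lock(&once_control->pto_mutex);
	if (once_control->pto_done == 0) {
		pthread_self()->pt_oncedepth++;	/* fork here => EDEADLK */
		(*init_routine)();
		pthread_self()->pt_oncedepth--;
		once_control->pto_done = 1;
	}
	pthread_mutex_unlock(&once_control->pto_mutex);
	__libc_unblock_fork();			/* fork may proceed */
}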


References:

- POSIX.1-2024 fork
  https://pubs.opengroup.org/onlinepubs/9799919799/functions/fork.html

- POSIX.1-2024 pthread_once
  https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_once.html

- PR lib/59124: arc4random(3): first call in process races with concurrent fork
  https://gnats.NetBSD.org/59124

- PR lib/59125: pthread_once(3) races with concurrent fork
  https://gnats.NetBSD.org/59125

# HG changeset patch
# User Taylor R Campbell <riastradh@NetBSD.org>
# Date 1741201110 0
#      Wed Mar 05 18:58:30 2025 +0000
# Branch trunk
# Node ID 7a2f805fa998b423cdc8ebbeaf70b97abc2dc8fb
# Parent  abced29d5da6d71badf2de46a8b1e589d4703b90
# EXP-Topic riastradh-pr59125-oncefork
pthread_once(3): Fix races with concurrent fork by blocking it.

If fork can copy any intermediate state that the pthread_once
initialization routine transitions through, then it is not possible to
satisfy the contract of pthread_once(3), which is that the
initialization routine has been called exactly once.

To avoid this, we hold up fork until any concurrent pthread_once(3)
calls that may do any initialization have completed.

It is also not possible for pthread_once(3) to satisfy its contract
if the initialization routine itself forks.  To prevent this, we make
fork(3) fail with EDEADLK if you try to call it during that time.
Nested calls to pthread_once(3) count the call depth.  (Without this
explicit detection, fork(3) would simply deadlock instead.)

The runtime cost to fork(3) is two predicted-not-taken branches, one
via indirect function call -- and in the unlikely event that there
are any concurrent pthread_once(3) calls, it waits for them to
complete.

The runtime cost to the first call to pthread_once(3) is potential
contention over the atfork_lock, which serializes fork(3) and
pthread_atfork(3).  However, the initialization routines themselves
in concurrent calls to pthread_once(3) are not serialized; they can
run in parallel.

While here, protect pthread_once(3) from PTHREAD_CANCEL_ASYNCHRONOUS.

PR lib/59125: pthread_once(3) races with concurrent fork

diff -r abced29d5da6 -r 7a2f805fa998 lib/libc/gen/pthread_atfork.c
--- a/lib/libc/gen/pthread_atfork.c	Fri Apr 11 19:15:42 2025 +0000
+++ b/lib/libc/gen/pthread_atfork.c	Wed Mar 05 18:58:30 2025 +0000
@@ -46,6 +46,7 @@
 #include <sys/sysctl.h>
 
 #include "extern.h"
+#include "fork.h"
 #include "reentrant.h"
 
 #ifdef __weak_alias
@@ -233,6 +234,28 @@ pthread_atfork(void (*prepare)(void), vo
 	return error;
 }
 
+static uint64_t block_fork_count;
+static cond_t block_fork_cv = COND_INITIALIZER;
+
+void
+__libc_block_fork(void)
+{
+
+	mutex_lock(&atfork_lock);
+	block_fork_count++;	/* 64-bit, can't overflow in 500 years */
+	mutex_unlock(&atfork_lock);
+}
+
+void
+__libc_unblock_fork(void)
+{
+
+	mutex_lock(&atfork_lock);
+	if (--block_fork_count == 0)
+		cond_broadcast(&block_fork_cv);
+	mutex_unlock(&atfork_lock);
+}
+
 pid_t
 fork(void)
 {
@@ -240,6 +263,14 @@ fork(void)
 	pid_t ret;
 
 	mutex_lock(&atfork_lock);
+	if (__predict_false(__libc_thr_in_once())) {
+		mutex_unlock(&atfork_lock);
+		errno = EDEADLK;
+		return -1;
+	}
+	while (__predict_false(block_fork_count))
+		cond_wait(&block_fork_cv, &atfork_lock);
+
 	SIMPLEQ_FOREACH(iter, &prepareq, next)
 		(*iter->fn)();
 	_malloc_prefork();
diff -r abced29d5da6 -r 7a2f805fa998 lib/libc/include/fork.h
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/lib/libc/include/fork.h	Wed Mar 05 18:58:30 2025 +0000
@@ -0,0 +1,36 @@
+/*	$NetBSD$	*/
+
+/*-
+ * Copyright (c) 2025 The NetBSD Foundation, Inc.
+ * All rights reserved.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
+ * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
+ * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+ * PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS
+ * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef	_LIB_LIBC_FORK_H_
+#define	_LIB_LIBC_FORK_H_
+
+void __libc_block_fork(void);
+void __libc_unblock_fork(void);
+int __libc_thr_in_once(void);
+
+#endif	/* _LIB_LIBC_FORK_H_ */
diff -r abced29d5da6 -r 7a2f805fa998 lib/libc/include/reentrant.h
--- a/lib/libc/include/reentrant.h	Fri Apr 11 19:15:42 2025 +0000
+++ b/lib/libc/include/reentrant.h	Wed Mar 05 18:58:30 2025 +0000
@@ -262,6 +262,7 @@ void	*__libc_thr_getspecific_stub(thread
 int	__libc_thr_keydelete_stub(thread_key_t);
 
 int	__libc_thr_once_stub(once_t *, void (*)(void));
+int	__libc_thr_in_once_stub(void);
 int	__libc_thr_sigsetmask_stub(int, const sigset_t *, sigset_t *);
 thr_t	__libc_thr_self_stub(void);
 int	__libc_thr_yield_stub(void);
diff -r abced29d5da6 -r 7a2f805fa998 lib/libc/thread-stub/thread-stub.c
--- a/lib/libc/thread-stub/thread-stub.c	Fri Apr 11 19:15:42 2025 +0000
+++ b/lib/libc/thread-stub/thread-stub.c	Wed Mar 05 18:58:30 2025 +0000
@@ -355,6 +355,7 @@ int
 /* misc. */
 
 __weak_alias(__libc_thr_once,__libc_thr_once_stub)
+__weak_alias(__libc_thr_in_once,__libc_thr_in_once_stub)
 __weak_alias(__libc_thr_sigsetmask,__libc_thr_sigsetmask_stub)
 __weak_alias(__libc_thr_self,__libc_thr_self_stub)
 __weak_alias(__libc_thr_yield,__libc_thr_yield_stub)
@@ -365,6 +366,8 @@ int
 __weak_alias(__libc_thr_curcpu,__libc_thr_curcpu_stub)
 
 
+static size_t __libc_thr_once_depth;
+
 int
 __libc_thr_once_stub(once_t *o, void (*r)(void))
 {
@@ -372,7 +375,9 @@ int
 	/* XXX Knowledge of libpthread types. */
 
 	if (o->pto_done == 0) {
+		__libc_thr_once_depth++;
 		(*r)();
+		__libc_thr_once_depth--;
 		o->pto_done = 1;
 	}
 
@@ -380,6 +385,13 @@ int
 }
 
 int
+__libc_thr_in_once_stub(void)
+{
+
+	return __libc_thr_once_depth != 0;
+}
+
+int
 __libc_thr_sigsetmask_stub(int h, const sigset_t *s, sigset_t *o)
 {
 
diff -r abced29d5da6 -r 7a2f805fa998 lib/libpthread/pthread_int.h
--- a/lib/libpthread/pthread_int.h	Fri Apr 11 19:15:42 2025 +0000
+++ b/lib/libpthread/pthread_int.h	Wed Mar 05 18:58:30 2025 +0000
@@ -106,6 +106,7 @@ struct	__pthread_st {
 	struct pthread_lock_ops pt_lockops;/* Cached to avoid PIC overhead */
 	void		*(*pt_func)(void *);/* Function to call at start. */
 	void		*pt_arg;	/* Argument to pass at start. */
+	size_t		pt_oncedepth;	/* pthread_once call depth */
 
 	/* Stack of cancellation cleanup handlers and their arguments */
 	PTQ_HEAD(, pt_clean_t)	pt_cleanup_stack;
diff -r abced29d5da6 -r 7a2f805fa998 lib/libpthread/pthread_mi.expsym
--- a/lib/libpthread/pthread_mi.expsym	Fri Apr 11 19:15:42 2025 +0000
+++ b/lib/libpthread/pthread_mi.expsym	Wed Mar 05 18:58:30 2025 +0000
@@ -29,6 +29,7 @@
 __libc_thr_errno
 __libc_thr_exit
 __libc_thr_getspecific
+__libc_thr_in_once
 __libc_thr_init
 __libc_thr_keycreate
 __libc_thr_keydelete
diff -r abced29d5da6 -r 7a2f805fa998 lib/libpthread/pthread_once.c
--- a/lib/libpthread/pthread_once.c	Fri Apr 11 19:15:42 2025 +0000
+++ b/lib/libpthread/pthread_once.c	Wed Mar 05 18:58:30 2025 +0000
@@ -1,12 +1,9 @@
-/*	$NetBSD: pthread_once.c,v 1.5 2025/03/04 00:41:00 riastradh Exp $	*/
+/*	$NetBSD$	*/
 
 /*-
- * Copyright (c) 2001, 2003 The NetBSD Foundation, Inc.
+ * Copyright (c) 2025 The NetBSD Foundation, Inc.
  * All rights reserved.
  *
- * This code is derived from software contributed to The NetBSD Foundation
- * by Nathan J. Williams, and by Jason R. Thorpe.
- *
  * Redistribution and use in source and binary forms, with or without
  * modification, are permitted provided that the following conditions
  * are met:
@@ -15,13 +12,6 @@
  * 2. Redistributions in binary form must reproduce the above copyright
  *    notice, this list of conditions and the following disclaimer in the
  *    documentation and/or other materials provided with the distribution.
- * 3. All advertising materials mentioning features or use of this software
- *    must display the following acknowledgement:
- *        This product includes software developed by the NetBSD
- *        Foundation, Inc. and its contributors.
- * 4. Neither the name of The NetBSD Foundation nor the names of its
- *    contributors may be used to endorse or promote products derived
- *    from this software without specific prior written permission.
  *
  * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS
  * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
@@ -37,42 +27,123 @@
  */
 
 #include <sys/cdefs.h>
-__RCSID("$NetBSD: pthread_once.c,v 1.5 2025/03/04 00:41:00 riastradh Exp $");
+__RCSID("$NetBSD$");
 
 /* Need to use libc-private names for atomic operations. */
 #include "../../common/lib/libc/atomic/atomic_op_namespace.h"
 
+#include "../../lib/libc/include/fork.h"
+
+#include <stdatomic.h>
+
 #include "pthread.h"
 #include "pthread_int.h"
 #include "reentrant.h"
 
+#define	atomic_load_acquire(p)						      \
+	atomic_load_explicit(p, memory_order_acquire)
+
+#define	atomic_store_release(p, v)					      \
+	atomic_store_explicit(p, v, memory_order_release)
+
 static void
-once_cleanup(void *closure)
+once_cleanup(void *cookie)
 {
+	pthread_once_t *once_control = cookie;
 
-       pthread_mutex_unlock((pthread_mutex_t *)closure);
+       pthread_mutex_unlock(&once_control->pto_mutex);
+       __libc_unblock_fork();
 }
 
 int
-pthread_once(pthread_once_t *once_control, void (*routine)(void))
+pthread_once(pthread_once_t *once_control, void (*init_routine)(void))
 {
+	int cancelstate;
+
 	if (__predict_false(__uselibcstub))
-		return __libc_thr_once_stub(once_control, routine);
+		return __libc_thr_once_stub(once_control, init_routine);
+
+	/*
+	 * Optimistically check whether it's already done, with a
+	 * load-acquire to synchronize with the store-release in
+	 * whatever thread did the initialization.
+	 */
+	if (__predict_true(atomic_load_acquire(&once_control->pto_done) == 1))
+		return 0;
+
+	/*
+	 * Carefully prepare to check and run the routine:
+	 *
+	 * 1. Defer cancellation while we synchronize in case the
+	 *    caller is configured as PTHREAD_CANCEL_ASYNCHRONOUS, so
+	 *    we are guaranteed to have a chance to clean up after
+	 *    ourselves if cancelled.
+	 *
+	 * 2. Block concurrent fork while we work so that children will
+	 *    never observe either the once_control state or the state
+	 *    that the user's init_routine runs partially initialized.
+	 *
+	 * 3. Serialize access to this once_control in particular by
+	 *    taking the mutex.
+	 *
+	 * 4. Push a cleanup action to unwind this preparation when
+	 *    we're done, or if we're cancelled during init_routine.
+	 */
+	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &cancelstate);
+	__libc_block_fork();
+	pthread_mutex_lock(&once_control->pto_mutex);
+	pthread_cleanup_push(&once_cleanup, once_control);
 
-	if (once_control->pto_done == 0) {
-		pthread_mutex_lock(&once_control->pto_mutex);
-		pthread_cleanup_push(&once_cleanup, &once_control->pto_mutex);
-		if (once_control->pto_done == 0) {
-			routine();
-			membar_release();
-			once_control->pto_done = 1;
-		}
-		pthread_cleanup_pop(1);
-	} else {
-		membar_acquire();
-	}
+	/*
+	 * It's possible that we raced with another caller to
+	 * pthread_once with the same state.  Don't run the routine
+	 * again if so.
+	 */
+	if (__predict_false(once_control->pto_done))
+		goto out;
+
+	/*
+	 * Allow cancellation, if the caller allowed it, while we run
+	 * the init routine.  If cancelled during the initialization,
+	 * we don't mark it done:
+	 *
+	 *	The pthread_once() function is not a cancellation
+	 *	point.  However, if init_routine is a cancellation
+	 *	point and is canceled, the effect on once_control shall
+	 *	be as if pthread_once() was never called.
+	 *
+	 * And mark ourselves in the middle of pthread_once so that if
+	 * init_routine tries to use fork(), it will fail -- it is
+	 * impossible to satisfy the requirements of pthread_once and
+	 * fork.
+	 */
+	pthread_setcancelstate(cancelstate, NULL);
+	pthread_self()->pt_oncedepth++; /* stack overflows before this can */
+	(*init_routine)();
+	pthread_self()->pt_oncedepth--;
+	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL);
+
+	/*
+	 * Done.  Use a store-release to synchronize with the
+	 * load-acquire in unlocked users of pthread_once.
+	 */
+	atomic_store_release(&once_control->pto_done, 1);
+out:
+	/*
+	 * Release the lock, allow fork again, and restore the caller's
+	 * cancellation state.
+	 */
+	pthread_cleanup_pop(/*execute*/1);
+	pthread_setcancelstate(cancelstate, NULL);
 
 	return 0;
 }
 
+int
+__libc_thr_in_once(void)
+{
+
+	return __predict_false(pthread_self()->pt_oncedepth != 0);
+}
+
 __strong_alias(__libc_thr_once,pthread_once)

