tech-kern archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

Re: How to wait on cv_timedwait() correctly



> Date: Thu, 30 Jan 2025 15:37:37 +0100
> From: Stephan <stephanwib%googlemail.com@localhost>
> 
> I am implementing a semaphore system for applications. To wait on a
> semaphore to become ready, it does basically this:
> 
> do
>         {
>             khs->khs_waiters++;
>             error = cv_timedwait_sig(&khs->khs_cv,
> &khs->khs_interlock, mstohz((flags & SEM_RELATIVE_TIMEOUT) ? timeout :
> 0));
>             khs->khs_waiters--;
> 
>             [...]
> 
>         } while (khs->khs_count - count < 0);

I suggest, for now, that you structure it like this:

	const unsigned deadline = getticks() + mstohz(...);
	unsgined delta;

	for (;;) {
		if (<condition>) {
			error = 0;
			break;
		}
		if (error)	/* if previous cv_timedwait_sig failed */
			break;
		if ((delta = deadline - getticks()) > INT_MAX) {
			error = EWOULDBLOCK;
			break;
		}
		error = cv_timedwait_sig(cv, lock, delta);
	}

This structure might look a little weird, because it's not a great
API, but we haven't made anything better that resolves all the
relevant issues.  In particular, this approach has the following
critical property just like a naive cv_wait loop with no error
conditions:

	If <condition> becomes true, then we will notice that and
	never accidentally exit the loop for any other reason.

This is important if you are doing something like this:

	mutex_enter();
	register_current_thread_as_recipient_of_next_resource();
	... cv_timedwait loop ...
	if (error)
		goto out;
	accept_next_resource();
out:	deregister_current_thread_as_recipient_of_next_resource();
	mutex_exit();
	return error;

If another thread has assigned ownership a resource to this thread, it
is critical for this thread to accept ownership (and later free the
resource when done), even if it wakes up because of a signal or a
timeout.

We have had driver bugs in this class where a thread issues a command
and waits for it to complete, the command completes at the same time
as a signal or timeout, and the thread -- thinking the command has
failed -- aborts the command (possibly some unrelated command that was
issued after this thread's command!) and ignores the result it never
realized it received.

I would even suggest ignoring the return value of cv_timedwait_sig
altogether and testing for signals explicitly -- except that you need
the return value in order to correctly distinguish ERESTART from EINTR
without a bunch of other work.

By the way: Does khs->khs_waiters exist only as a micro-optimization
for the wakeup path?

	if (__predict_false(khs->khs_waiters))
		cv_signal/broadcast(cv);

If so, you can ditch it -- cv_signal and cv_broadcast already have
that micro-optimization internally.

> I have seen that cv_signal() on a CV unblocks all threads waiting on
> it, instead of just one. That this could happen is briefly mentioned
> in condvar(9).

You have this backwards: cv_signal generally wakes a single thread
waiting on a condvar, while cv_broadcast always wakes all threads.
(cv_signal is for when you're trying to hand a resource off to a
single thread at a time, to avoid a thundering herd.)

However, any thread sleeping in cv_*wait* may spuriously wake up at
any time, so the callers have to be prepared for this.  This is
independent of whether you use cv_signal or cv_broadcast -- you always
have to be prepared for spurious wakeup.

> I think the problem is that when cv_timedwait_sig() is called
> repeatedly, the original timeout is used again and again, adding up to
> an unpredictable amount of time.

That's right.  Each call to cv_timedwait_sig sleeps for at most the
time you pass it (plus scheduling slop), but multiple calls to
cv_timedwait_sig in a loop are not coordinated.

> If that is the case, how can it be solved best? Is
> cv_timedwaitbt_sig() suitable? Or should one track sleep and wakeup
> times and calculate a new timeout each time?

I suggest for now that you calculate an absolute deadline in ticks,
using getticks() + mstohz(...), and write your loop based on when
getticks() has passed that deadline.  (And just use cv_wait_sig
instead for an indefinite timeout.)

I added cv_timedwaitbt_sig some years ago in an attempt to address
this and other problems, but it turns out that it doesn't work very
well (fails to address the key issue above about accepting transfer of
resource ownership), and I intend to revisit it at some point with a
different approach.

I briefly committed another approach I called `timedwaitclock', but it
wasn't fully baked either and the name was atrocious so I backed it
out:

https://mail-index.netbsd.org/source-changes/2020/05/03/msg116972.html

After that, I drafted another approach that I think will work out
better (and has a less awful name!) but I didn't finish it -- the
patch series is still languishing in my hg tree.  The loop would look
something like this instead:

	struct clockwait cw;

	error = clockwait_init_ticks/bt/timespec/...(&cw, timeout,
	    clock_id /* CLOCK_REALTIME, CLOCK_MONOTONIC, &c. */,
	    flags /* relative, absolute, &c. */);
	if (error)
		goto out;

	while (!<condition>) {
		error = cv_clockwait_sig(cv, lock, &cw);
		if (error)
			goto out;
	}

Internally, cv_clockwait_sig would:

(a) test for timeout or signals and fail _without releasing the lock_
    (so the condition can't have changed if it fails),

(b) wait interruptibly with timeout (and return zero in those cases,
    to be picked up at step (a) in the next iteration), and

(c) update cw with the time slept.

But this API is not ready yet.

Sorry this is such a mess -- I meant to clean it up a while ago but I
blew my round tuit supply on other problems.


Home | Main Index | Thread Index | Old Index