mutex vs turnstile

Some time ago I wrote about performance problems when doing high -j
build.sh and made few remarks about mutex implementation.

TL;DR for that one was mostly that cas returns the found value, so
explicit re-reads can be avoided. There are also avoidable explicit
barriers (yes, I know about the old Opteron errata).

I had another look and have a remark definitely worth looking at (and
easy to fix). Unfortunately I lost my test jig so can't benchmark.

That said, here it is:

After it is is concluded that lock owner sleeps:

        ts = turnstile_lookup(mtx);
        /*
         * Once we have the turnstile chain interlock, mark the
         * mutex has having waiters. If that fails, spin again:
         * chances are that the mutex has been released.
         */
        if (!MUTEX_SET_WAITERS(mtx, owner)) {
                turnstile_exit(mtx);
                owner = mtx->mtx_owner;
                continue;
        }

Note that the lock apart from being free, can be:
1. owned by the same owner, which is now running

In this case the bit is set spuriously and triggers slow path
unlock.

2. owned by a different owner, which is NOT running

Assuming the owner still has the lock and is not running after
turnstile_exit, this causes extra trip.

That said, proposed change is trivial and follows what FreeBSD is doing:
re-read and abort sleep unless the lock is owned by an off-cpu thread.

Note there is a minor optimisation of not setting the bit if already
seen.

That said, the minimal fix would look like this (untested):
        ts = turnstile_lookup(mtx);
        owner = mtx->mtx_owner;
        if (mutex_oncpu(owner) || !MUTEX_SET_WAITERS(mtx, owner))
                turnstile_exit(mtx);
                owner = mtx->mtx_owner;
                continue;
        }

If value from cas was to be preserved it would definitely make sense
to re-test failed MUTEX_SET_WAITERS while still holding the turnstile lock.

It looks like rw locks can use a similar treatment.

Fun fact is that implementation on Illumos behaves worse in this regard:
it sets the waiters bit regardless *who* owns the lock (or whether they
are running), but then only goes to sleep if the *original* owner has
the lock.

--

Mateusz Guzik <mjguzik gmail.com>