As for penalty, it should be very low. The expectation is that there
would be no contention for the lock, so no waiting for mutex_enter() to
finish. Just the cost of taking and releasing the mutex.
Actually, this can be a high penalty under contention when we could
have taken a no-lock fast path.
Specifically, any atomic operation, including the one in mutex_enter,
requires the CPU to send a message to every other CPU on the system bus
and wait for all the answers to come back. mutex_enter must further
wait for any active critical section on another CPU to complete. This
is what a no-lock fast path is supposed to avoid.
Hence I suggest that mutex_ownable be limited to KDASSERT -- so that
you get it only if you combine DEBUG (expensive consistency checks)
and LOCKDEBUG (locking bug detection).
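
To make the suggestion concrete, here is a minimal userspace sketch of
the idea, not kernel code. Every sketch_* and SKETCH_* name is
hypothetical: a C11 atomic_flag spin lock stands in for kmutex_t,
sketch_mutex_ownable stands in for mutex_ownable, and SKETCH_KDASSERT
stands in for KDASSERT. It assumes the check is compiled in only when
both DEBUG and LOCKDEBUG are defined.

```c
/*
 * Userspace sketch only, not kernel code. Every sketch_* / SKETCH_* name
 * is hypothetical; a C11 atomic_flag spin lock stands in for kmutex_t.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
	atomic_flag held;
} sketch_mutex_t;

#define SKETCH_MUTEX_INIT	{ ATOMIC_FLAG_INIT }

/* Counts lock acquisitions so the cost of the check is observable. */
unsigned sketch_lock_ops;

void
sketch_mutex_enter(sketch_mutex_t *m)
{
	/* The atomic read-modify-write is the cost the fast path avoids. */
	while (atomic_flag_test_and_set_explicit(&m->held,
	    memory_order_acquire))
		continue;
	sketch_lock_ops++;
}

void
sketch_mutex_exit(sketch_mutex_t *m)
{
	atomic_flag_clear_explicit(&m->held, memory_order_release);
}

/* Stand-in for mutex_ownable: takes and drops the lock under LOCKDEBUG. */
bool
sketch_mutex_ownable(sketch_mutex_t *m)
{
#ifdef LOCKDEBUG
	sketch_mutex_enter(m);
	sketch_mutex_exit(m);
#else
	(void)m;
#endif
	return true;
}

/* Stand-in for KDASSERT: the expression is evaluated only under DEBUG. */
#ifdef DEBUG
#define SKETCH_KDASSERT(e)	assert(e)
#else
#define SKETCH_KDASSERT(e)	((void)sizeof(e))	/* never evaluated */
#endif

int
sketch_fast_path(sketch_mutex_t *m, const int *cached)
{
	SKETCH_KDASSERT(sketch_mutex_ownable(m));
	return *cached;	/* no lock taken on the fast path itself */
}
```

Built without -DDEBUG -DLOCKDEBUG, sketch_fast_path performs no atomic
operation at all. Built with both, each call pays for one lock
acquisition and release, which is the cost being argued about above.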