To deal with this case, we observe that if a blocking chain runs from threads blocked on in-kernel synchronization objects to threads blocked on user-level synchronization objects, we know that we're in this case and only this case.[||] Because we've caught the other thread in code in which it can't be preempted (it must be in the midst of turnstile_block, which explicitly disables preemption), we can fix this by busy-waiting until the lock changes, and then restarting the priority inheritance dance.
[||] Presumably like most other operating systems, Solaris never executes user-level code with kernel-level locks held—and never acquires user-level locks from in-kernel subsystems. This case is thus the only one in which we acquire a user-level lock with a kernel-level lock held.
Here's the code to handle this case:[#]
[#] The code dealing with turnstile_loser_lock didn't actually exist when we wrote this case; that was added to deal with (yet) another problem we discovered as a result of our four-day mind-meld. This problem deserves its own chapter, if only for the great name that Jeff gave it: "dueling losers." Shortly after Jeff postulated its existence, I actually saw a variant of this in the wild—a variant that I dubbed "cascading losers." But the losers—both dueling and cascading—will have to wait for another day.
/*
 * We now have the owner's thread lock. If we are traversing
 * from non-SOBJ_USER_PI ops to SOBJ_USER_PI ops, then we know
 * that we have caught the thread while in the TS_SLEEP state,
 * but holding mp. We know that this situation is transient
 * (mp will be dropped before the holder actually sleeps on
 * the SOBJ_USER_PI sobj), so we will spin waiting for mp to
 * be dropped. Then, as in the turnstile_interlock() failure
 * case, we will restart the priority inheritance dance.
 */
if (SOBJ_TYPE(t->t_sobj_ops) != SOBJ_USER_PI &&
    owner->t_sobj_ops != NULL &&
    SOBJ_TYPE(owner->t_sobj_ops) == SOBJ_USER_PI) {
        kmutex_t *upi_lock = (kmutex_t *)t->t_wchan;

        ASSERT(IS_UPI(upi_lock));
        ASSERT(SOBJ_TYPE(t->t_sobj_ops) == SOBJ_MUTEX);

        if (t->t_lockp != owner->t_lockp)
                thread_unlock_high(owner);
        thread_unlock_high(t);
        if (loser)
                lock_clear(&turnstile_loser_lock);

        while (mutex_owner(upi_lock) == owner) {
                SMT_PAUSE();
                continue;
        }

        if (loser)
                lock_set(&turnstile_loser_lock);
        t = curthread;
        thread_lock_high(t);
        continue;
}
Once these problems were fixed, we thought we were done. But further stress testing revealed that an even darker problem lurked—one that I honestly wasn't sure that we would be able to solve.
This time, the symptoms were different: instead of an explicit panic or an incorrect error value, the operating system simply hung—hard. Taking (and examining) a dump of the system revealed that a thread had deadlocked attempting to acquire a thread lock from turnstile_block(), which had been called recursively from turnstile_block() via mutex_vector_exit(), the function that releases a mutex if it is found to have waiters. Given just this state, the problem was clear—and it felt like a punch in the gut.
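To make the recursion concrete, here is a sketch of the deadlocked stack as just described (the frame annotations are mine, not debugger output):

/*
 * Deadlocked thread, innermost frame first:
 *
 *   turnstile_block()       <- recursive entry; hangs here, spinning
 *                              to acquire a thread lock
 *   mutex_vector_exit()     <- the kernel-level lock had waiters, so
 *                              releasing it must wake them
 *   turnstile_block()       <- original entry: blocking on the
 *                              user-level priority-inheriting lock
 */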
Recall that the diabolical (but regrettably required) in-kernel lock needs to be acquired and dropped to either acquire or drop a user-level priority-inheriting lock. When blocking on the user-level lock, the kernel-level lock must be dropped after the thread has willed its priority, as essentially the last thing it does before it actually gives up the CPU via swtch(). (This was the code quoted in part in my original analysis; the code marked (2) in that analysis is the dropping of the kernel-level lock.)
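To keep the shape of that path in mind, here is a minimal sketch of the tail of turnstile_block() for this case. It is a paraphrase, not the actual source; mp names the kernel-level lock, as in the comment in the code quoted earlier:

/*
 * Paraphrase of the tail of turnstile_block() for a SOBJ_USER_PI
 * synchronization object (a sketch, not the actual source):
 *
 *   turnstile_block(..., kmutex_t *mp, ...)
 *   {
 *           ...
 *           (will our priority to the owner, walking the blocking
 *           chain with thread locks held)
 *           ...
 *           mutex_exit(mp);     <- (2): drop the kernel-level lock;
 *                                  if mp has waiters, this vectors
 *                                  to mutex_vector_exit()
 *           swtch();            <- give up the CPU
 *   }
 */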
But if another thread blocks on the kernel-level lock while we are dealing with the mechanics of blocking on the user-level lock, we will need to wake that waiter as part of dropping the kernel-level lock. Waking the waiter requires taking the thread lock in the turnstile table associated with the synchronization object.