Beautiful Code [56]
To restate the ramifications of the turnstile table: threads blocked on different synchronization primitives can have their thread locks point to the same turnstile lock. Given that we must hold two locks at a time while traversing the blocking chain, this creates a nasty lock ordering problem. When Jeff encountered this problem in his original implementation, he solved it in an elegant way; his comment in turnstile_interlock() explains the problem and his solution:
Code View: Scroll / Show All
/*
* When we apply priority inheritance, we must grab the owner's thread lock
* while already holding the waiter's thread lock. If both thread locks are
* turnstile locks, this can lead to deadlock: while we hold L1 and try to
* grab L2, some unrelated thread may be applying priority inheritance to
* some other blocking chain, holding L2 and trying to grab L1. The most
* obvious solution -- do a lock_try( ) for the owner lock -- isn't quite
* sufficient because it can cause livelock: each thread may hold one lock,
* try to grab the other, fail, bail out, and try again, looping forever.
* To prevent livelock we must define a winner, i.e. define an arbitrary
* lock ordering on the turnstile locks. For simplicity we declare that
* virtual address order defines lock order, i.e. if L1 < L2, then the
* correct lock ordering is L1, L2. Thus the thread that holds L1 and
* wants L2 should spin until L2 is available, but the thread that holds
* L2 and can't get L1 on the first try must drop L2 and return failure.
* Moreover, the losing thread must not reacquire L2 until the winning
* thread has had a chance to grab it; to ensure this, the losing thread
* must grab L1 after dropping L2, thus spinning until the winner is done.
* Complicating matters further, note that the owner's thread lock pointer
* can change (i.e. be pointed at a different lock) while we're trying to
* grab it. If that happens, we must unwind our state and try again.
*/
This lock ordering issue is part of what made it difficult to implement priority inheritance for kernel synchronization objects—and unfortunately, kernel-level priority inheritance solves only part of the priority inversion problem.
Providing priority inheritance exclusively for kernel synchronization objects has an obvious shortcoming: to build a multithreaded real-time system, one needs priority inheritance not just for kernel-level synchronization primitives, but also for user-level synchronization primitives. And it was this problem—user-level priority inheritance—that we decided to address in Solaris 8. We assigned an engineer to solve it, and (with extensive guidance from those of us who best understand the guts of scheduling and synchronization), the new facility was integrated in October 1999.
A few months later—in December of 1999—I was looking at an operating system failure that a colleague had encountered. It was immediately clear that this was some sort of defect in our implementation of user-level priority inheritance, but as I understood the bug, I came to realize that this was no surface problem: this was a design defect—and I could practically smell our wine turning to sewage.
Before explaining this bug—and the design defect that it revealed—it's worth discussing the methodology used to debug it. An important skill for any software engineer is the ability to analyze the failure of a complicated software system, and to present that analysis rigorously. And in any sufficiently