Online Book Reader

Home Category

Beautiful Code [61]

By Root 5118 0
primitive, and then—in order to waive any inherited priority—acquiring the thread lock of the former holder of the lock (which is to say, the current thread).

Here's the problem: we are entering the function that waives inherited priority (the turnstile_pi_waive() function) from turnstile_block(), after we already appear to be blocked. In particular, the current thread's thread lock has already been changed to point not to the current CPU's lock, but to the lock for the entry in the turnstile table that corresponds to the user-level lock on which we are actually blocking. So, if the kernel-level lock and the user-level lock happen to hash to the same entry in the turnstile table (as they did in the failure in which we first saw this), the turnstile lock acquired in turnstile_lookup() and the thread lock acquired in turnstile_pi_waive() will be the same lock—and we will have single-thread deadlock. Even if these locks happen not to hash to the same entry in the turnstile table, but happen not to be in the lock ordering dictated by turnstile_ interlock(), we have the potential for a classic AB/BA deadlock. Sewage, either way.

When we understood the problem, it seemed intractable. Given that the fundamental problem was that we were dropping the in-kernel lock after we appeared to be blocked, the tempting course would have been to find a way to eliminate the kernel-level lock. But we knew from our work on the earlier bugs that this line of thought was a dead end; we understood that the in-kernel lock was required, and we knew that it couldn't be dropped until priority had been willed down the entire blocking chain.

This left us challenging more fundamental assumptions: could we somehow flip the order in turnstile_block() such that we willed priority before modifying the current thread's data structures to indicate that it's asleep? (No, it would introduce a window for priority inversion.) Could we somehow indicate that we are in this state such that the call to turnstile_ pi_waive() from turnstile_block() via mutex_vector_enter() didn't induce the deadlock? (No, as this didn't address the multithreaded deadlock scenario.)

Whenever we came up with a hypothetical solution, we were quick to see its fatal flaws—and the more we thought about the problem, the more we saw flaws instead of solutions.

Hopelessness was beginning to set in; it was very frustrating that merely adding the new dimension of user-level priority inheritance could invalidate what had seemed to be a perfect mechanism. The spoon had become the barrel, and we felt adrift in sewage.

As we got up to seek solace in a nearby coffee shop, an idea occurred to us: if user-level priority inheritance was the problem, perhaps we were being overly general in our thinking. Instead of solving this problem at its most abstract, why not deal specifically with this problem by, say, partitioning the turnstile table? We could hash the in-kernel locks protecting the user-level priority inheritance state to one half of the table, and hash every other lock to the other half.

This would guarantee us that the lock that we would be dropping immediately before calling swtch() in turnstile_block() would necessarily hash to a different entry in the turnstile table than the lock upon which we were blocking. Moreover, by guaranteeing that any kernel-level lock protecting the state of a user-level priority-inheriting lock hashed to a turn-stile table entry with a lower virtual address than any turnstile table entry for any other kind of lock, we would also be guaranteeing that the locking order dictated by turnstile_ interlock() would always be observed; we would be solving both the single-threaded and multithreaded cases.

On the one hand, this solution seemed like some pretty gross special-casing; it would mean putting knowledge of one specific kind of lock (the lock protecting in-kernel, user-level priority inheritance state) into the generic turnstile system. On the other hand, we were certain that it would work, and it would be a reasonably straightforward and low-risk change—which

Return Main Page Previous Page Next Page

®Online Book Reader