While I had some ideas on how to fix this, the late date in the release cycle and the seriousness of the problem prompted me to call Jeff at home to discuss it. As Jeff and I discussed the problem, we couldn't seem to come up with a potential solution that didn't introduce a new problem. Indeed, the more we talked about the problem, the harder it seemed—and we realized that we had erred originally, both by underestimating the problem and by delegating its solution.
Worse, Jeff and I began to realize that there must be another manifestation lurking. We knew that if one were blocking on the in-kernel lock when the false deadlock was discovered, the kernel would explicitly panic. But what if one were blocking on the user-level lock when the false deadlock was discovered? We quickly determined (and a test case confirmed) that in this case, the attempt to acquire the user-level lock would (erroneously) return EDEADLK. That is, the kernel would see that the "deadlock" was induced by a user-level synchronization primitive, and therefore assume that it was an application-induced deadlock—a bug in the application.
So in this failure mode, a correct program would have one of its calls to pthread_mutex_lock erroneously fail—a failure mode even more serious than a panic, because any application that didn't check the return value of pthread_mutex_lock (as one well might not) could easily corrupt its own data by assuming that it owned a lock that, in fact, it had failed to acquire.
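The defensive pattern is worth stating explicitly. The sketch below (ordinary POSIX calls, not code from this chapter) requests priority inheritance on a mutex and refuses to touch shared state unless pthread_mutex_lock actually returns zero; a spurious EDEADLK of the kind described above is then caught loudly rather than becoming silent data corruption.

    /*
     * Illustration only: a priority-inheriting mutex plus a checked
     * pthread_mutex_lock.
     */
    #include <errno.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    static pthread_mutex_t m;
    static int shared_counter;

    static void
    locked_update(void)
    {
            int err = pthread_mutex_lock(&m);

            if (err != 0) {
                    /*
                     * EDEADLK (or any other error) means we do NOT hold
                     * the lock; touching shared_counter here would be
                     * exactly the corruption described above.
                     */
                    fprintf(stderr, "pthread_mutex_lock: %s\n",
                        strerror(err));
                    abort();
            }
            shared_counter++;                /* safe: we hold the lock */
            (void) pthread_mutex_unlock(&m);
    }

    int
    main(void)
    {
            pthread_mutexattr_t attr;

            (void) pthread_mutexattr_init(&attr);
            /* Request user-level priority inheritance. */
            (void) pthread_mutexattr_setprotocol(&attr,
                PTHREAD_PRIO_INHERIT);
            (void) pthread_mutex_init(&m, &attr);

            locked_update();
            return (0);
    }

The point of the example is only that an unchecked return value from pthread_mutex_lock turns this kernel bug into undetected corruption in the application.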
This problem, if encountered in the wild, would be virtually undebuggable—it absolutely had to be fixed.
So, how to solve these problems? We had found them so difficult because we kept trying to find a way to avoid that in-kernel lock. I have presented the in-kernel lock as a natural constraint on the problem, but that was a conclusion we came to only with tremendous reluctance: whenever one of us came up with a scheme to avoid the lock, the other would find a window that invalidated it.
After exhausting ourselves on the alternatives, we were forced to the conclusion that an in-kernel lock was a constraint on the user-level priority inheritance problem—and our focus switched from avoiding the situation to detecting it.
There are two cases to detect: the panic case and the false deadlock case. The false deadlock case is actually pretty easy to detect and handle, because we always find ourselves at the end of the blocking chain—and we always find that the lock that we own that induced the deadlock is the in-kernel lock passed as a parameter to turnstile_block. Because we know that we have willed our priority to the entire blocking chain, we can just detect this and break out—and that is exactly what that cryptic comment that I added to turnstile_block described, and what those two lines effected (the in-kernel lock that is passed to turnstile_block is stored in the local variable mp).
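To make the shape of that check concrete, here is a small standalone model. It is a sketch only: the struct names and the wiring of the blocking chain are hypothetical stand-ins for the turnstile machinery, not the actual Solaris source. Each thread is either running or blocked on exactly one lock; walking from a lock to its owner to the lock that owner is blocked on follows the blocking chain, and a cycle is declared a false deadlock exactly when the lock that closes it back on the current thread is mp.

    /*
     * Standalone model of the false-deadlock check; the types and
     * helper names are hypothetical, not the Solaris turnstile code.
     */
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct lock_model;

    struct thread_model {
            const char *name;
            struct lock_model *blocked_on;  /* lock this thread waits for */
    };

    struct lock_model {
            struct thread_model *owner;     /* current holder, if any */
    };

    /*
     * Walk the blocking chain starting at "lock".  If the chain cycles
     * back to "cur", the deadlock is false exactly when the lock that
     * closes the cycle is "mp", the in-kernel lock passed in.
     */
    static bool
    false_deadlock(struct thread_model *cur, struct lock_model *lock,
        struct lock_model *mp)
    {
            while (lock != NULL && lock->owner != NULL) {
                    if (lock->owner == cur)
                            return (lock == mp);
                    lock = lock->owner->blocked_on;
            }
            return (false);   /* chain ends without cycling: no deadlock */
    }

    int
    main(void)
    {
            struct lock_model user_lock = { NULL }, kernel_lock = { NULL };
            struct thread_model t1 = { "T1", NULL }, t2 = { "T2", NULL };

            /*
             * T1 holds the in-kernel lock and blocks on the user-level
             * lock; T2 appears to hold the user-level lock and blocks on
             * the in-kernel lock: the false deadlock described above.
             */
            kernel_lock.owner = &t1;
            user_lock.owner = &t2;
            t1.blocked_on = &user_lock;
            t2.blocked_on = &kernel_lock;

            printf("false deadlock: %s\n",
                false_deadlock(&t1, t1.blocked_on, &kernel_lock) ?
                "yes" : "no");
            return (0);
    }

In the real code the chain walk, the priority willing, and the check are interleaved; the model isolates only the detection step that those two lines perform.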
The panic case is nastier to deal with. As a reminder, in this case the thread owns the user-level synchronization object and is blocking trying to acquire the in-kernel lock. We might wish to handle this case in a similar way, by reasoning as follows: if the deadlock ends in the current thread, and the last thread in the blocking chain is blocked on a user-level synchronization object, the deadlock is false. (That is, we might wish to handle this case by a more general handling of the above case.) This is simple, but it's also wrong: it ignores the possibility of an actual application-level deadlock (i.e., an application bug), in which EDEADLK must be returned;