Matters become even more complicated when we want to block. For example, suppose that transfer should block if from has insufficient funds. This is usually done by waiting on a condition variable, while simultaneously releasing from's lock. It gets much trickier if we want to block until there are sufficient funds in from and from2 considered together.
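In pthreads terms, the standard pattern looks something like the following sketch (the Account type, its balance field, and the funds_added condition variable are illustrative, not from this chapter). The key point is that pthread_cond_wait atomically releases from's lock while the thread sleeps and reacquires it before returning:

    #include <pthread.h>

    /* Illustrative account type: a balance protected by a lock, plus a
     * condition variable signalled whenever funds arrive.  Assume the
     * fields are initialized with PTHREAD_MUTEX_INITIALIZER and
     * PTHREAD_COND_INITIALIZER. */
    typedef struct {
        long            balance;
        pthread_mutex_t lock;
        pthread_cond_t  funds_added;
    } Account;

    /* Block until 'from' holds at least 'amount', then withdraw it. */
    void withdraw_blocking(Account *from, long amount)
    {
        pthread_mutex_lock(&from->lock);
        while (from->balance < amount) {
            /* cond_wait atomically releases from->lock while sleeping
             * and reacquires it before returning; the while loop
             * re-tests the condition after every wakeup. */
            pthread_cond_wait(&from->funds_added, &from->lock);
        }
        from->balance -= amount;
        pthread_mutex_unlock(&from->lock);
    }

    void deposit(Account *to, long amount)
    {
        pthread_mutex_lock(&to->lock);
        to->balance += amount;
        pthread_cond_broadcast(&to->funds_added);  /* wake blocked withdrawers */
        pthread_mutex_unlock(&to->lock);
    }

Note that the loop must re-test the condition after each wakeup, and deposit must remember to broadcast; forgetting either is precisely the "lost wakeups and erroneous retries" hazard catalogued in the next section. Notice also that no comparably local pattern exists for the compound condition on from and from2: neither account's condition variable, by itself, guards their combined balance.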
24.1.2. Locks Are Bad
To make a long story short, today's dominant technology for concurrent programming—locks and condition variables—is fundamentally flawed. Here are some standard difficulties, some of which we have just seen:
Taking too few locks
It is easy to forget to take a lock and thereby end up with two threads that modify the same variable simultaneously.
Taking too many locks
It is easy to take too many locks and thereby inhibit concurrency (at best) or cause deadlock (at worst).
Taking the wrong locks
In lock-based programming, the connection between a lock and the data it protects often exists only in the mind of the programmer and is not explicit in the program. As a result, it is all too easy to take or hold the wrong locks.
Taking locks in the wrong order
In lock-based programming, one must be careful to take locks in the "right" order. Avoiding the deadlock that can otherwise occur is always tiresome and error-prone, and sometimes extremely difficult; a concrete sketch of the problem follows this list.
Error recovery
Error recovery can be very hard because the programmer must guarantee that no error can leave the system in a state that is inconsistent, or in which locks are held indefinitely.
Lost wakeups and erroneous retries
It is easy to forget to signal a condition variable on which a thread is waiting, or to retest a condition after a wakeup.
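To make the lock-ordering hazard concrete, here is a sketch in C with pthreads, reusing the illustrative Account type from the earlier sketch (the function names are hypothetical): two transfers running in opposite directions can each hold one lock while waiting forever for the other's.

    #include <stdint.h>

    /* Deadlock-prone: locks are taken in argument order.
     *   Thread 1: transfer(a, b, 10);  locks a, then waits for b
     *   Thread 2: transfer(b, a, 20);  locks b, then waits for a
     * Each holds the lock the other needs, so neither proceeds. */
    void transfer_deadlocky(Account *from, Account *to, long amount)
    {
        pthread_mutex_lock(&from->lock);
        pthread_mutex_lock(&to->lock);        /* may block forever */
        from->balance -= amount;
        to->balance   += amount;
        pthread_mutex_unlock(&to->lock);
        pthread_mutex_unlock(&from->lock);
    }

    /* The usual fix imposes a global lock order, e.g., by address. */
    void transfer_ordered(Account *from, Account *to, long amount)
    {
        Account *first  = ((uintptr_t)from < (uintptr_t)to) ? from : to;
        Account *second = ((uintptr_t)from < (uintptr_t)to) ? to   : from;
        pthread_mutex_lock(&first->lock);
        pthread_mutex_lock(&second->lock);
        from->balance -= amount;
        to->balance   += amount;
        pthread_mutex_unlock(&second->lock);
        pthread_mutex_unlock(&first->lock);
    }

The fix works only if every function that ever touches two accounts follows the same ordering convention, a whole-program discipline that nothing in the language enforces.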
But the fundamental shortcoming of lock-based programming is that locks and condition variables do not support modular programming. By "modular programming," I mean the process of building large programs by gluing together smaller programs. Locks make this impossible. For example, we could not use our (correct) implementations of withdraw and deposit unchanged to implement transfer; instead, we had to expose the locking protocol. Blocking and choice are even less modular. For example, suppose we had a version of withdraw that blocked if the source account had insufficient funds. Then we would not be able to use withdraw directly to withdraw money from A or B (depending on which had sufficient funds), without exposing the blocking condition—and even then it wouldn't be easy. This critique is elaborated elsewhere.[§]
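The modularity failure is easy to demonstrate in the same illustrative C setting (these names are mine, not the chapter's; deposit is as defined in the first sketch): withdraw and deposit are each atomic, yet their sequential composition is not.

    /* Individually correct and individually atomic... */
    void withdraw(Account *a, long amount)
    {
        pthread_mutex_lock(&a->lock);
        a->balance -= amount;
        pthread_mutex_unlock(&a->lock);
    }

    /* ...but the composition is not atomic: between the two calls,
     * 'amount' is in neither account, and another thread can observe
     * that state.  Making the pair atomic requires holding both locks
     * around both calls, i.e., rewriting transfer to expose the
     * locking protocol that withdraw and deposit were hiding. */
    void transfer_broken(Account *from, Account *to, long amount)
    {
        withdraw(from, amount);
        /* intermediate state visible here */
        deposit(to, amount);
    }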
[§] Edward A. Lee, "The problem with threads," IEEE Computer, Vol. 39, No. 5, pp. 33–42, May 2006; J. K. Ousterhout, "Why threads are a bad idea (for most purposes)," Invited Talk, USENIX Technical Conference, January 1996; Tim Harris, Simon Marlow, Simon Peyton Jones, and Maurice Herlihy, "Composable memory transactions," ACM Symposium on Principles and Practice of Parallel Programming (PPoPP '05), June 2005.
10. The Quest for an Accelerated Population Count
Henry S. Warren, Jr.
A fundamental computer algorithm, and a deceptively simple one, is the population count or sideways sum, which calculates the number of bits in a computer word that are 1. The population count function has applications that range from the very simple to the quite sublime. For example, if sets are represented by bit strings, population count gives the size of the set. It can also be used to generate binomially distributed random integers. These and other applications are discussed at the end of this chapter.
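As a baseline for the accelerated methods this chapter pursues, here is a naive C sketch (the function names are mine): count the bits one position at a time, plus the familiar refinement of clearing the rightmost 1-bit so that the loop runs once per set bit rather than once per bit position.

    #include <stdint.h>

    /* Naive population count: examine all 32 bit positions. */
    int pop_naive(uint32_t x)
    {
        int n = 0;
        for (int i = 0; i < 32; i++) {
            n += x & 1;
            x >>= 1;
        }
        return n;
    }

    /* Refinement: x & (x - 1) clears the rightmost 1-bit, so the
     * loop body runs only once per 1-bit in x. */
    int pop_sparse(uint32_t x)
    {
        int n = 0;
        while (x != 0) {
            x &= x - 1;
            n++;
        }
        return n;
    }

When a set is represented as a bit string, summing either routine over the words of the string yields the set's cardinality directly.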
Although uses of this operation are not terribly common, many computers—often the supercomputers of their day—had an instruction for it. These included the Ferranti Mark I (1951), the IBM Stretch computer (1960), the CDC 6600 (1964), the Russian-built BESM-6 (1967), the Cray 1 (1976), the Sun SPARCv9 (1994), and the IBM Power 5 (2004).
This chapter discusses how to compute the population count