High Performance Computing - Charles Severance [86]
main() {
int i,retval;
pthread_t tid;
globvar = 0;
pthread_attr_init(&attr); /* Initialize attr with defaults */
pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
printf("Main - globvar=%d\n",globvar);
for(i=0;i<THREAD_COUNT;i++) {
   retval = pthread_create(&tid,&attr,SpinFunc,(void *) index[i]);
   printf("Main - creating i=%d tid=%d retval=%d\n",i,tid,retval);
   thread_id[i] = tid;
}
printf("Main thread - threads started globvar=%d\n",globvar);
for(i=0;i<THREAD_COUNT;i++) {
   printf("Main - waiting for join %d\n",thread_id[i]);
   retval = pthread_join( thread_id[i], NULL );
   printf("Main - back from join %d retval=%d\n",i,retval);
}
printf("Main thread - threads completed globvar=%d\n",globvar);
}

The code executed by the master thread is modified slightly. We create an "attribute" data structure and set the PTHREAD_SCOPE_SYSTEM attribute to indicate that we would like our new threads to be created and scheduled by the operating system. We use the attribute information on the call to pthread_create( ). None of the other code has been changed. The following is the execution output of this new program:

recs % create3
Main - globvar=0
Main - creating i=0 tid=4 retval=0
SpinFunc me=0 - sleeping 1 seconds ...
Main - creating i=1 tid=5 retval=0
Main thread - threads started globvar=0
Main - waiting for join 4
SpinFunc me=1 - sleeping 2 seconds ...
SpinFunc me=0 - wake globvar=0...
SpinFunc me=0 - spinning globvar=1...
SpinFunc me=1 - wake globvar=1...
SpinFunc me=1 - spinning globvar=2...
SpinFunc me=1 - done globvar=2...
SpinFunc me=0 - done globvar=2...
Main - back from join 0 retval=0
Main - waiting for join 5
Main - back from join 1 retval=0
Main thread - threads completed globvar=2
recs %

Now the program executes properly. When the first thread starts spinning, the operating system is context switching between all three threads. As the threads come out of their sleep( ), they increment their shared variable, and when the final thread increments the shared variable, the other two threads instantly notice the new value (because of the cache coherency protocol) and finish the loop. If there are fewer than three CPUs, a thread may have to wait for a time-sharing context switch to occur before it notices the updated global variable.
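The SpinFunc routine called by the workers is not shown in this excerpt. The following is a minimal, self-contained sketch of what such a routine could look like, reconstructed from the output above. It is an approximation, not the book's exact listing: THREAD_COUNT, the ids array, and the start_and_join( ) driver are names introduced here, and the thread id is passed by address rather than by value as in the pthread_create( ) call above.

```c
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

#define THREAD_COUNT 2

volatile int globvar = 0;      /* shared counter every thread watches      */
static int ids[THREAD_COUNT];  /* per-thread ids, passed to SpinFunc       */

void *SpinFunc(void *parm) {
  int me = *(int *) parm;
  printf("SpinFunc me=%d - sleeping %d seconds ...\n", me, me + 1);
  sleep(me + 1);               /* stagger the wakeups, as in the output    */
  printf("SpinFunc me=%d - wake globvar=%d...\n", me, globvar);
  globvar++;                   /* check in; the unsynchronized increment   */
                               /* only works here because the differing    */
                               /* sleep times keep the threads apart       */
  printf("SpinFunc me=%d - spinning globvar=%d...\n", me, globvar);
  while (globvar < THREAD_COUNT)
    ;                          /* busy-wait until every thread checks in   */
  printf("SpinFunc me=%d - done globvar=%d...\n", me, globvar);
  return NULL;
}

/* Create the workers and wait for all of them to finish */
void start_and_join(void) {
  pthread_t tid[THREAD_COUNT];
  int i;
  for (i = 0; i < THREAD_COUNT; i++) {
    ids[i] = i;
    pthread_create(&tid[i], NULL, SpinFunc, (void *) &ids[i]);
  }
  for (i = 0; i < THREAD_COUNT; i++)
    pthread_join(tid[i], NULL);
}
```

With two threads sleeping one and two seconds, this reproduces roughly the interleaving shown in the output above; as the text notes, the spin loop terminates promptly only when the spinning thread actually sees the other thread's update, whether through cache coherency on a multiprocessor or a time-sharing context switch on a single CPU.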
With operating-system threads and multiple processors, a program can realistically break up a large computation between several independent threads and compute the solution more quickly. Of course this presupposes that the computation could be done in parallel in the first place.

Techniques for Multithreaded Programs

Given that we have multithreaded capabilities and multiprocessors, we must still convince the threads to work together to accomplish some overall goal. Often we need some ways to coordinate and cooperate between the threads. There are several important techniques that are used while the program is running with multiple threads, including:

- Fork-join (or create-join) programming
- Synchronization using a critical section with a lock, semaphore, or mutex
- Barriers

Each of these techniques has an overhead associated with it. Because these overheads are necessary to go parallel, we must make sure that we have sufficient work to make the benefit of parallel operation worth the cost.

Fork-Join Programming

This approach is the simplest method of coordinating your threads. As in the earlier examples in this chapter, a master thread sets up some global data structures that describe the tasks each thread is to perform and then uses the pthread_create( ) function to activate the proper number of threads. Each thread checks the global data structure using its thread-id as an index to find its task. The thread then performs the task and completes. The master thread waits at a pthread_join( ) point, and when a thread has completed, it updates the global data structure and creates a new thread. These steps are repeated for each major iteration (such as a time-step) for the duration of the program:

for(ts=0;ts<10000;ts++)
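The fork-join skeleton described above can be sketched as follows. This is a hedged illustration, not code from the text: NUM_THREADS, TIME_STEPS, worker, task_data, and time_step are names invented here, the "task" is a trivial stand-in computation, and the time-step count is kept small so the example runs quickly.

```c
#include <stdio.h>
#include <pthread.h>

#define NUM_THREADS 4
#define TIME_STEPS  10   /* the text's loop runs 10000 time-steps */

/* Global structure describing (and here, accumulating) each thread's task */
static double task_data[NUM_THREADS];

/* Each worker uses its thread-id as an index into the global data
   structure, performs its task, and completes. */
void *worker(void *parm) {
  long me = (long) parm;
  task_data[me] += (double) me;   /* stand-in for the real task */
  return NULL;
}

/* One fork-join cycle: create the workers, then join them all */
void time_step(void) {
  pthread_t tid[NUM_THREADS];
  long i;
  for (i = 0; i < NUM_THREADS; i++)
    pthread_create(&tid[i], NULL, worker, (void *) i);
  for (i = 0; i < NUM_THREADS; i++)
    pthread_join(tid[i], NULL);
}

/* The major-iteration loop from the text: fork and join once per step */
void run(void) {
  int ts;
  for (ts = 0; ts < TIME_STEPS; ts++)
    time_step();
}
```

Because every time-step pays the full thread-creation and join overhead, this style only wins when each task is substantial; that is exactly the "sufficient work" caveat raised above.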