Classic Shell Scripting - Arnold Robbins [144]

generating algorithms and precision vary. Most algorithms for generation of such numbers step through a sequence from a finite set without repetition, and the sequence ultimately repeats itself after a number of steps called the period of the generator. Library documentation sometimes does not make clear whether the unit interval endpoints, 0.0 and 1.0, are included in the range of rand( ), or what the period is.

The ambiguity in the generator's result interval endpoints makes programming harder. Suppose that you want to generate pseudorandom integers between 0 and 100 inclusive. If you use the simple expression int(rand( )*100), you will not get the value 100 at all if rand( ) never returns 1.0, and even if it does, you will get 100 much less frequently than any other integer between 0 and 100, since it is produced only once in the generator period, when the generator returns the exact value 1.0. Fudging by changing the multiplier from 100 to 101 does not work either because you might get an out-of-range result of 101 on some systems.
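The endpoint problem can be seen without waiting for a generator to produce a particular value. The following is a minimal sketch that feeds three hypothetical rand( ) results (both endpoints and a value just below 1.0) through the two scaling expressions discussed above. The function name endpoint_demo and the sample values are illustrative assumptions, not part of the original text.

```shell
endpoint_demo() {
    awk 'BEGIN {
        # Hypothetical rand() results: both endpoints and a value just below 1.0
        split("0 0.999999 1", r, " ")
        for (i = 1; i <= 3; i++)
            printf("r=%s  int(r*100)=%d  int(r*101)=%d\n",
                   r[i], int(r[i] * 100), int(r[i] * 101))
    }'
}

endpoint_demo
```

The multiplier 100 reaches 100 only when r is exactly 1, and the multiplier 101 produces the out-of-range value 101 for that same input.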

The irand( ) function in Example 9-9 provides a better solution to the problem of generating pseudorandom integers. irand( ) forces integer endpoints and then, if the requested range is empty or invalid, returns one endpoint. Otherwise, irand( ) samples an integer that might be one larger than the interval width, adds it to low, and then retries if the result is out of range. Now it does not matter whether rand( ) ever returns 1.0, and the return values from irand( ) are as uniformly distributed as the rand( ) values.

Example 9-9. Generating pseudorandom integers

    function irand(low, high,    n)
    {
        # Return a pseudorandom integer n such that low <= n <= high

        # Ensure integer endpoints
        low = int(low)
        high = int(high)

        # Sanity check on argument order
        if (low >= high)
            return (low)

        # Find a value in the required range
        do
            n = low + int(rand( ) * (high + 1 - low))
        while ((n < low) || (high < n))

        return (n)
    }
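As a usage sketch, irand( ) can simulate rolls of a six-sided die and tally how often each face appears. The shell wrapper roll_dice, its seed and roll-count arguments, and the tallying loop are assumptions made for this illustration; only irand( ) itself comes from Example 9-9, and it is repeated here so that the block runs on its own.

```shell
roll_dice() {
    # $1 is the seed, $2 is the number of rolls (both chosen by the caller)
    awk -v seed="$1" -v rolls="$2" '
    function irand(low, high,    n)
    {
        low = int(low)
        high = int(high)
        if (low >= high)
            return (low)
        do
            n = low + int(rand() * (high + 1 - low))
        while ((n < low) || (high < n))
        return (n)
    }

    BEGIN {
        srand(seed)
        for (i = 1; i <= rolls; i++)
            count[irand(1, 6)]++            # tally each face as it comes up
        for (face = 1; face <= 6; face++)
            printf("%d %d\n", face, count[face])
    }'
}

roll_dice 1 6000
```

Each output line holds a face and its count. Because irand( ) never returns a value outside the requested range, the six counts always add up to the number of rolls, and for a large number of rolls they should be roughly equal.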

In the absence of a call to srand( x ), gawk and nawk use the same initial seed on each run so that runs are reproducible; mawk does not. Seeding with the current time via a call to srand( ) to get different sequences on each run is reasonable, if the clock is precise enough. Unfortunately, although machine speeds have increased dramatically, most time-of-day clocks used in current awk implementations still tick only once per second, so it is quite possible that successive runs of a simulation execute within the same clock tick. The solution is to avoid calling srand( ) more than once per run, and to introduce a delay of at least one second between runs:

    $ for k in 1 2 3 4 5
    > do
    >     awk 'BEGIN {
    >              srand( )
    >              for (k = 1; k <= 5; k++)
    >                  printf("%.5f ", rand( ))
    >              print ""
    >          }'
    >     sleep 1
    > done
    0.29994 0.00751 0.57271 0.26084 0.76031
    0.81381 0.52809 0.57656 0.12040 0.60115
    0.32768 0.04868 0.58040 0.98001 0.44200
    0.84155 0.56929 0.58422 0.83956 0.28288
    0.35539 0.08985 0.58806 0.69915 0.12372

Without the sleep 1 statement, the output lines are often identical.
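When a run must be repeatable, or when several runs must differ without waiting on the clock, one possible alternative is to pass an explicit seed to srand( x ) from the command line. The sketch below is not from the original text: the function name five_randoms and the choice of the loop counter as the seed are assumptions made for illustration.

```shell
five_randoms() {
    # $1 is the seed, supplied by the caller rather than taken from the clock
    awk -v seed="$1" 'BEGIN {
        srand(seed)
        for (k = 1; k <= 5; k++)
            printf("%.5f ", rand())
        print ""
    }'
}

# Three runs in quick succession, each with its own seed
for run in 1 2 3
do
    five_randoms "$run"
done
```

Repeating a call with the same seed reproduces the same five numbers on a given awk implementation, which is convenient for debugging a simulation. The actual values differ among implementations, so no sample output is shown.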

Summary

A surprisingly large number of text processing jobs can be handled with the subset of awk that we have presented in this chapter. Once you understand awk's command line, and how it automatically handles input files, the programming job reduces to specifying record selections and their corresponding actions. This kind of minimalist data-driven programming can be extremely productive. By contrast, most conventional programming languages would burden you with dozens of lines of fairly routine code to loop over a list of input files, and for each file, open the file, read, select, and process records until end-of-file, and finally, close the file.
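The selection-and-action style can be illustrated with a one-line program. In this sketch, the function name big_orders, the file names, and the sample records are all made up for the demonstration. The awk program contains no code to open, read, or close files: it states only which records to select and what to do with them.

```shell
big_orders() {
    # Print the first field of every record whose third field exceeds 100
    awk '$3 > 100 { print $1 }' "$@"
}

# Made-up sample data in two temporary files
dir=$(mktemp -d)
printf 'alice widgets 250\nbob gears 40\n' > "$dir/east.txt"
printf 'carol bolts 120\n' > "$dir/west.txt"

big_orders "$dir/east.txt" "$dir/west.txt"    # prints alice, then carol

rm -rf "$dir"
```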

When you see how simple it is to process records and fields with awk, your view of data processing can change dramatically. You begin to divide large tasks into smaller, and more manageable, ones. For example, if you are faced with processing complex binary files, such as those used for databases, fonts, graphics, slide makers, spreadsheets,
