Classic Shell Scripting - Arnold Robbins [66]

# a tab-separated list of
#
#       count word tag filename
#
# sorted by ascending word and tag.
#
# Usage:
#       taglist xml-file

cat "$1" |
  sed -e 's#systemitem *role="url"#URL#g' -e 's#/systemitem#/URL#' |
    tr ' (){}[]' '\n\n\n\n\n\n\n' |
      egrep '>[^<>]+</' |
        awk -F'[<>]' -v FILE="$1" \
            '{ printf("%-31s\t%-15s\t%s\n", $3, $2, FILE) }' |
          sort |
            uniq -c |
              sort -k2,2 -k3,3 |
                awk '{
                        print ($2 == Last) ? ($0 " <----") : $0
                        Last = $2
                     }'

In Section 6.5, we will show how to apply the tag-list operation to multiple files.
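The final awk stage of the pipeline, which flags a word that appears under more than one tag, can be tried on its own. The following is a minimal sketch, not part of the original script: the function name mark_repeats and the three sample lines are hypothetical, and the input is assumed to be already sorted by word and then tag, as the preceding sort stage guarantees.

```sh
# mark_repeats is a hypothetical wrapper around the last awk stage of taglist.
# Input: "count word tag" lines, sorted by word (field 2) and then tag (field 3).
mark_repeats() {
    awk '{
        # Append an arrow when this word matches the word on the previous line.
        print ($2 == Last) ? ($0 " <----") : $0
        Last = $2
    }'
}

# Hypothetical sample data: "ls" is marked up with two different tags.
printf '%s\n' \
    '2 ls command' \
    '1 ls filename' \
    '3 sort command' |
  mark_repeats
```

With this sample, only the second line gains the trailing arrow, because it is the only line whose word repeats the word on the line before it.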

Summary

This chapter has shown how to solve several text processing problems, none of which would be simple to do in most programming languages. The critical lessons of this chapter are:

Data markup is extremely valuable, although it need not be complex. A unique single character, such as a tab, colon, or comma, often suffices.

Pipelines of simple Unix tools and short, often inline, programs in a suitable text processing language, such as awk, can exploit data markup to pass multiple pieces of data through a series of processing stages, emerging with a useful report.

By keeping the data markup simple, the output of our tools can readily become input to new tools, as shown by our little analysis of the output of the word-frequency filter, wf, applied to Shakespeare's texts.

By preserving some minimal markup in the output, we can later come back and massage that data further, as we did to turn a simple ASCII office directory into a web page. Indeed, it is wise never to consider any form of electronic data as final: there is a growing demand in some quarters for page-description languages, such as PCL, PDF, and PostScript, to preserve the original markup that led to the page formatting. Word processor documents currently are almost devoid of useful logical markup, but that may change in the future. At the time of this writing, one prominent word processor vendor was reported to be considering an XML representation for document storage. The GNU Project's gnumeric spreadsheet, the Linux Documentation Project,[11] and the OpenOffice.org[12] office suite already do that.

Lines with delimiter-separated fields are a convenient format for exchanging data with more complex software, such as spreadsheets and databases. Although such systems usually offer some sort of report-generation feature, it is often easier to extract the data as a stream of lines of fields, and then to apply filters written in suitable programming languages to manipulate the data further. For example, catalog and directory publishing are often best done this way.
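As a small illustration of these lessons, the sketch below passes colon-separated records through a short inline awk program and a sort to produce a tab-separated report. The directory data, its field layout (name:department:extension), and the function name dept_counts are all hypothetical, chosen only for this example.

```sh
# Hypothetical colon-separated office directory: name:department:extension.
# The colon is used as the delimiter because the names themselves contain commas.
directory='Jones, Adrian:Sales:555-0101
Smith, Bo:Engineering:555-0102
Chen, Li:Engineering:555-0103'

# dept_counts (hypothetical name) reads such records on standard input and
# writes one "department<TAB>count" line per department, sorted by department.
dept_counts() {
    awk -F: '{ count[$2]++ }
             END { for (dept in count) printf "%s\t%d\n", dept, count[dept] }' |
      sort
}

printf '%s\n' "$directory" | dept_counts
```

Because the report is itself a stream of tab-separated lines, it can be fed directly to another filter, such as sort -k2,2nr to rank departments by size.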

* * *

[11] See http://www.tldp.org/.

[12] See http://www.openoffice.org/.

Chapter 6. Variables, Making Decisions, and Repeating Actions

Variables are essential for nontrivial programs. They maintain values useful as data and for managing program state. Since the shell is mostly a string processing language, there are lots of things you can do with the string values of shell variables. However, because mathematical operations are essential too, the POSIX shell also provides a mechanism for doing arithmetic with shell variables.

Control-flow features make a programming language: it's almost impossible to get any real work done if all you have are imperative statements. This chapter covers the shell's facilities for testing results, and making decisions based on those results, as well as looping.

Finally, functions let you group task-related statements in one place, making it easier to perform that task from multiple points within your script.

Variables and Arithmetic

Shell variables are like variables in any conventional programming language. They hold values until you need them. We described the basics of shell variable names and values in Section 2.5.2. In addition, shell scripts and functions have positional parameters, which is a fancy term for "command-line arguments."
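A minimal sketch of positional parameters inside a function follows. The function name show_args and its arguments are hypothetical; the special parameters $#, $1, and "$@" are standard in the POSIX shell.

```sh
# show_args is a hypothetical function, used only for illustration.
show_args() {
    echo "count: $#"      # number of positional parameters
    echo "first: $1"      # the first positional parameter
    for arg in "$@"       # "$@" expands to each parameter as a separate word
    do
        echo "arg: $arg"
    done
}

show_args alpha "two words" gamma
```

Quoting "$@" keeps the argument containing a space intact, so the loop runs three times rather than four.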

Simple arithmetic operations are common in shell scripts; e.g., adding one to a variable each time around a loop. The POSIX shell provides a notation for
