Classic Shell Scripting - Arnold Robbins [59]
* * *
A Digression on Databases
Most commercial databases today are constructed as relational databases: data is accessible as key:value pairs, and join operations are used to construct multicolumn tables that provide views of selected subsets of the data. Relational databases were first proposed in 1970 by E. F. Codd,[2] who actively promoted them despite initial opposition from the database industry, which claimed that they could not be implemented efficiently. Fortunately, clever programmers soon figured out how to solve the efficiency problem. Codd's work is so important that, in 1981, he was given the prestigious ACM Turing Award, the closest thing in computer science to the Nobel Prize.
Today, there are several ISO standards for the Structured Query Language (SQL), making vendor-independent database access possible, and one of the most important SQL operations is join. Hundreds of books have been published about SQL; to learn more, pick a general one like SQL in a Nutshell.[3] Our simple office-directory task thus carries an important lesson about the central concept of modern relational databases. Unix software tools can be extremely valuable in preparing input for databases and in processing their output.
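To make the join idea concrete, here is a minimal sketch using the Unix join command on two tiny colon-separated files that share a key in the first field. The function name join_demo, the file names, and the sample records are all invented for illustration, and the sketch assumes that mktemp -d is available, which is common but not guaranteed by POSIX.

```shell
#!/bin/sh
# join_demo: build two tiny keyed files and join them on the first field.
# The names and numbers below are invented sample data.
join_demo() {
    tmp=$(mktemp -d) || return 1    # assumes mktemp -d is available

    printf '%s\n' 'jones:555-0127' 'adrian:555-0123' > "$tmp/phones"
    printf '%s\n' 'adrian:OSD211' 'jones:BSD003'     > "$tmp/offices"

    # join requires both inputs to be sorted on the join field.
    sort -t: -k1,1 "$tmp/phones"  > "$tmp/phones.sorted"
    sort -t: -k1,1 "$tmp/offices" > "$tmp/offices.sorted"

    # -t: sets the field separator; the join field defaults to field 1.
    join -t: "$tmp/phones.sorted" "$tmp/offices.sorted"
    status=$?

    rm -rf "$tmp"
    return $status
}

join_demo
```

Each output line holds the shared key followed by the remaining fields from both files, which is the same kind of multicolumn view that a relational join produces from two tables.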
* * *
[1] On some systems, file formats are in Section 7; thus, you might need to use man 7 passwd instead.
[2] E. F. Codd, A Relational Model of Data for Large Shared Data Banks, Communications of the ACM, 13(6), 377-387, June 1970, and Relational Database: A Practical Foundation for Productivity, Communications of the ACM, 25(2), 109-117, February 1982 (Turing Award lecture).
[3] By Kevin Kline and Daniel Kline, O'Reilly & Associates, 2000, ISBN 1-56592-744-3. See also http://www.math.utah.edu/pub/tex/bib/sqlbooks.html for an extensive list of SQL books.
Structured Data for the Web
The immense popularity of the World Wide Web makes it desirable to be able to present data like the office directory developed in the last section in a form that is a bit fancier than our simple text file.
Web files are mostly written in a markup language called HyperText Markup Language (HTML). This is a family of languages that are specific instances of the Standard Generalized Markup Language (SGML), which has been defined in several ISO standards since 1986. The manuscript for this book was written in DocBook/XML, which is also a specific instance of SGML. You can find a full description of HTML in HTML & XHTML: The Definitive Guide (O'Reilly).[4]
For the purposes of this section, we need only a tiny subset of HTML, which we present here in a small tutorial. If you are already familiar with HTML, just skim the next page or two.
Here is a minimal standards-conformant HTML file produced by a useful tool written by one of us:[5]
$ echo Hello, world. | html-pretty
Hello, world.
The points to note in this HTML output are:
HTML comments are enclosed in <!-- and -->.
Special processor commands are enclosed in <! and >: here, the DOCTYPE command tells an SGML parser what the document type is and where to find its grammar file.
Markup is supplied by angle-bracketed words, called tags. In HTML, lettercase is not significant in tag names: html-pretty normally uppercases tag names for better visibility.
Markup environments consist of a begin tag, <NAME>, and an end tag, </NAME>, and for many tags, environments can be nested within each other according to rules defined in the HTML grammars.
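The points above can be seen together in a small sketch. The wrap_html function below is a hypothetical helper, not html-pretty, and its output is not meant to match what html-pretty prints. It wraps the text on standard input in a small HTML document that contains a comment, a DOCTYPE command, and nested uppercase tags. The HTML 4.01 Transitional DOCTYPE and the placeholder title are our assumptions for the example.

```shell
#!/bin/sh
# wrap_html: hypothetical helper (not html-pretty) that wraps the text on
# standard input in a small HTML document.
wrap_html() {
    cat <<'EOF'
<!-- Generated by wrap_html, an illustrative sketch -->
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<HTML>
  <HEAD>
    <TITLE>Untitled</TITLE>
  </HEAD>
  <BODY>
EOF
    # Escape the characters that are special in HTML (& must be handled
    # first), then indent each input line to sit inside BODY.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' -e 's/^/    /'
    cat <<'EOF'
  </BODY>
</HTML>
EOF
}

echo 'Hello, world.' | wrap_html
```

The first output line is a comment, the second is the DOCTYPE command, and the remaining lines show the HEAD and BODY environments nested inside the HTML environment, each closed by its matching end tag.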