Classic Shell Scripting - Arnold Robbins [38]
s/foo/bar/g
s/chicken/cow/g
s/draft animal/horse/g
...
$ sed -f fixup.sed myfile.xml > myfile2.xml
You can build up a script by combining the -e and -f options; the script is the concatenation of all editing commands provided by all the options, in the order given. Additionally, POSIX allows you to separate commands on the same line with a semicolon:
sed 's/foo/bar/g ; s/chicken/cow/g' myfile.xml > myfile2.xml
However, many commercial versions of sed don't (yet) allow this, so it's best to avoid it for absolute portability.
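As a minimal sketch of combining the two options, the following builds a two-command fixup.sed, appends a third command with -e, and also shows the one-command-per--e form as a portable alternative to semicolons. The sample sentence and the temporary directory are assumptions made for illustration only.

```shell
# Scratch directory (mktemp -d is widely available, though not required by POSIX).
tmpdir=$(mktemp -d)

# A short script file holding two of the substitutions shown earlier.
cat > "$tmpdir/fixup.sed" <<'EOF'
s/foo/bar/g
s/chicken/cow/g
EOF

# Hypothetical sample input.
printf 'foo rides a chicken and a draft animal\n' > "$tmpdir/myfile.xml"

# Commands from -f run first, then the -e command, because that is the order given.
combined=$(sed -f "$tmpdir/fixup.sed" -e 's/draft animal/horse/g' "$tmpdir/myfile.xml")
echo "$combined"    # bar rides a cow and a horse

# Portable alternative to the semicolon form: one -e option per command.
portable=$(sed -e 's/foo/bar/g' -e 's/chicken/cow/g' "$tmpdir/myfile.xml")
echo "$portable"    # bar rides a cow and a draft animal

rm -rf "$tmpdir"
```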
Like its ancestor ed and its cousins ex and vi, sed remembers the last regular expression used at any point in a script. That same regular expression may be reused by specifying an empty regular expression:
s/foo/bar/3        Change third foo
s//quux/           Now change first one
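The effect of the empty regular expression can be checked on a single line. This is a small sketch; the input line of four foo words is an assumption chosen to make the result easy to follow.

```shell
# The first command replaces the third foo; the empty regular expression in
# the second command reuses "foo" and replaces its first occurrence.
result=$(printf 'foo foo foo foo\n' | sed -e 's/foo/bar/3' -e 's//quux/')
echo "$result"    # quux foo bar foo
```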
Consider a straightforward script named html2xhtml.sed for making a start at converting HTML to XHTML. This script converts tags to lowercase, and changes the <br> tag into the self-closing form, <br/>:
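s/<H1>/<h1>/g                      Slash delimiter
s/<H2>/<h2>/g
s/<H3>/<h3>/g
s/<H4>/<h4>/g
s/<H5>/<h5>/g
s/<H6>/<h6>/g
s:</H1>:</h1>:g                    Colon delimiter, slash in data
s:</H2>:</h2>:g
s:</H3>:</h3>:g
s:</H4>:</h4>:g
s:</H5>:</h5>:g
s:</H6>:</h6>:g
s/<[Hh][Tt][Mm][Ll]>/<html>/g
s:</[Hh][Tt][Mm][Ll]>:</html>:g
s:<[Bb][Rr]>:<br/>:g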
...
Such a script can automate a large part of the task of converting from HTML to XHTML, the standardized XML-based version of HTML.
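To see a script of this kind at work, here is a cut-down sketch applied to a four-line page. The handful of tags converted and the sample page are assumptions chosen for illustration; a real conversion script would cover many more tags.

```shell
tmpdir=$(mktemp -d)

# Illustrative subset of tag conversions, not a complete script.
# The colon delimiter avoids escaping the slash in closing tags.
cat > "$tmpdir/html2xhtml.sed" <<'EOF'
s/<H1>/<h1>/g
s:</H1>:</h1>:g
s/<[Hh][Tt][Mm][Ll]>/<html>/g
s:</[Hh][Tt][Mm][Ll]>:</html>:g
s:<[Bb][Rr]>:<br/>:g
EOF

# Hypothetical input page.
printf '%s\n' '<HTML>' '<H1>Title</H1>' 'line one<BR>line two' '</HTML>' \
    > "$tmpdir/page.html"

xhtml=$(sed -f "$tmpdir/html2xhtml.sed" "$tmpdir/page.html")
echo "$xhtml"

rm -rf "$tmpdir"
```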
sed Operation
sed's operation is straightforward. Each file named on the command line is opened and read, in turn. If there are no files, standard input is used, and the filename "-" (a single dash) acts as a pseudonym for standard input.
sed reads through each file one line at a time. The line is placed in an area of memory termed the pattern space. This is like a variable in a programming language: an area of memory that can be changed as desired under the direction of the editing commands. All editing operations are applied to the contents of the pattern space. When all operations have been completed, sed prints the final contents of the pattern space to standard output, and then goes back to the beginning, reading another line of input.
This operation is shown in Figure 3-2. The script uses two commands to change The Unix System into The UNIX Operating System.
Figure 3-2. Commands in sed scripts changing the pattern space
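The read, edit, print cycle can be tried from the command line. The two substitutions below are one plausible way to make the change described and are not necessarily the exact commands in the figure; the second input line is an assumption added to show the cycle repeating.

```shell
# Each line is read into the pattern space, both commands are applied to it
# in order, and the final pattern space is printed before the next line is read.
out=$(printf 'The Unix System\nA Unix System manual\n' |
      sed -e 's/Unix/UNIX/' -e 's/UNIX System/UNIX Operating System/')
echo "$out"
```

The second command matches only because the first has already changed the pattern space: it searches for "UNIX System", which does not occur in the original input.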
To print or not to print
The -n option modifies sed's default behavior. When supplied, sed does not print the final contents of the pattern space when it's done. Instead, p commands in the script explicitly print the line. For example, one might simulate grep in this way:
sed -n '/<HTML>/p' *.html        Only print <HTML> lines
Although this example seems trivial, this feature is useful in more complicated scripts. If you use a script file, you can enable this feature by using a special first line:
#n                 Turn off automatic printing
/<HTML>/p          Only print <HTML> lines
As in the shell and many other Unix scripting languages, the # is a comment. sed comments have to appear on their own lines, since they're syntactically commands; they're just commands that don't do anything. While POSIX indicates that comments may appear anywhere in a script, many older versions of sed allow them only on the first line. GNU sed does not have this limitation.
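The sketch below compares the -n option, the #n first line, and the default behavior on the same input. The file names, the three input lines, and the pattern needle are assumptions made for illustration. Note that #n must stand alone on the first line of the script file.

```shell
tmpdir=$(mktemp -d)
printf '%s\n' 'alpha' 'needle here' 'omega' > "$tmpdir/input.txt"

# Command-line form: -n suppresses automatic printing; p prints matching lines.
cli=$(sed -n '/needle/p' "$tmpdir/input.txt")

# Script-file form: #n alone on the first line acts like -n.
cat > "$tmpdir/only.sed" <<'EOF'
#n
/needle/p
EOF
script=$(sed -f "$tmpdir/only.sed" "$tmpdir/input.txt")

# Without -n, the matching line appears twice: once from p and once from
# the automatic print at the end of the cycle.
doubled=$(sed '/needle/p' "$tmpdir/input.txt")

echo "$cli"
echo "$script"
echo "$doubled"

rm -rf "$tmpdir"
```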
Matching Specific Lines
As mentioned, by default, sed applies every editing command to every input line. It is possible to restrict the lines to which a command applies by prefixing the command with an address. Thus, the full form of a sed command is:
address command
There are different kinds of addresses:
Regular expressions
Prefixing a command with a pattern limits the command to lines matching the pattern. This can be used with the s command:
/oldfunc/ s/$/# XXX: migrate to newfunc/        Annotate some source code
An empty pattern in the s command means "use the previous regular expression":
/Tolstoy/ s//& and Camus/g        Talk about both authors
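Both regular-expression address examples can be run on a couple of lines to confirm that only matching lines change. The input lines here are assumptions made for illustration.

```shell
# Only the line containing oldfunc gets the annotation appended at its end.
annotated=$(printf '%s\n' 'x = oldfunc(1)' 'y = newfunc(2)' |
            sed '/oldfunc/ s/$/# XXX: migrate to newfunc/')
echo "$annotated"

# The empty pattern in s reuses /Tolstoy/ from the address; & stands for the matched text.
authors=$(printf '%s\n' 'Tolstoy wrote War and Peace' 'Dickens wrote Bleak House' |
          sed '/Tolstoy/ s//& and Camus/g')
echo "$authors"
```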
The last line
The symbol