Classic Shell Scripting - Arnold Robbins [38]

s/foo/bar/g

s/chicken/cow/g

s/draft animal/horse/g

...

$ sed -f fixup.sed myfile.xml > myfile2.xml

You can build up a script by combining the -e and -f options; the script is the concatenation of all editing commands provided by all the options, in the order given. Additionally, POSIX allows you to separate commands on the same line with a semicolon:

sed 's/foo/bar/g ; s/chicken/cow/g' myfile.xml > myfile2.xml

However, many commercial versions of sed don't (yet) allow this, so it's best to avoid it for absolute portability.
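As a small sketch of combining the options, the commands below put two of the substitutions in a script file and supply a third with -e. The temporary directory and the sample sentence are invented for this illustration. Using several -e options, or one command per line in a script file, is the portable alternative to semicolons.

```shell
# Hypothetical working directory and sample text, used only for this sketch.
tmpdir=$(mktemp -d)
cat > "$tmpdir/fixup.sed" <<'EOF'
s/foo/bar/g
s/chicken/cow/g
EOF

# The script is the concatenation of all -e and -f options in the order
# given, so the -e command runs first and the file's commands follow it.
combined=$(printf 'a draft animal ate foo near a chicken\n' |
    sed -e 's/draft animal/horse/g' -f "$tmpdir/fixup.sed")
printf '%s\n' "$combined"

rm -r "$tmpdir"
```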

Like its ancestor ed and its cousins ex and vi, sed remembers the last regular expression used at any point in a script. That same regular expression may be reused by specifying an empty regular expression:

s/foo/bar/3          Change third foo

s//quux/             Now change first one
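To see the remembered regular expression at work, the two commands can be run on a line containing four occurrences of foo. The input line is made up for this sketch.

```shell
# Sample input invented for this illustration.
# The first command replaces the third foo; the empty regular expression
# in the second command reuses /foo/ and replaces the first occurrence.
reused=$(printf 'foo foo foo foo\n' | sed -e 's/foo/bar/3' -e 's//quux/')
printf '%s\n' "$reused"
```

After the first substitution the line reads "foo foo bar foo", so the second one turns it into "quux foo bar foo".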

Consider a straightforward script named html2xhtml.sed for making a start at converting HTML to XHTML. This script converts tags to lowercase, and changes the <br> tag into the self-closing form, <br/>:

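s/<H1>/<h1>/g                        Slash delimiter

s/<H2>/<h2>/g

s/<H3>/<h3>/g

s/<H4>/<h4>/g

s/<H5>/<h5>/g

s/<H6>/<h6>/g

s:</H1>:</h1>:g                      Colon delimiter, slash in data

s:</H2>:</h2>:g

s:</H3>:</h3>:g

s:</H4>:</h4>:g

s:</H5>:</h5>:g

s:</H6>:</h6>:g

s/<[Hh][Tt][Mm][Ll]>/<html>/g

s:</[Hh][Tt][Mm][Ll]>:</html>:g

s:<[Bb][Rr]>:<br/>:g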

...

Such a script can automate a large part of the task of converting from HTML to XHTML, the standardized XML-based version of HTML.
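A hedged illustration of the idea: the commands below apply a handful of substitutions in the style of html2xhtml.sed to a tiny document. The sample markup is invented for this sketch, and only a few tags are handled; a real conversion would need many more commands.

```shell
# Sample HTML invented for this sketch; only a few tags are covered.
converted=$(printf '<HTML>\n<H1>Title</H1>\nline one<BR>line two\n</HTML>\n' |
    sed -e 's/<H1>/<h1>/g' \
        -e 's:</H1>:</h1>:g' \
        -e 's/<[Hh][Tt][Mm][Ll]>/<html>/g' \
        -e 's:</[Hh][Tt][Mm][Ll]>:</html>:g' \
        -e 's:<[Bb][Rr]>:<br/>:g')
printf '%s\n' "$converted"
```

The colon delimiter keeps the slash in closing tags from needing a backslash, and the bracket expressions match a tag written in any mix of upper and lower case.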

sed Operation

sed's operation is straightforward. Each file named on the command line is opened and read, in turn. If there are no files, standard input is used, and the filename "-" (a single dash) acts as a pseudonym for standard input.
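The following sketch mixes a named file with standard input. It assumes a sed that treats "-" as described above (GNU sed does); the file name and contents are hypothetical.

```shell
# Hypothetical file and input text, created only for this sketch.
tmpdir=$(mktemp -d)
printf 'from file\n' > "$tmpdir/first.txt"

# first.txt is read first; "-" then makes sed continue with standard input.
merged=$(printf 'from stdin\n' | sed 's/^/> /' "$tmpdir/first.txt" -)
printf '%s\n' "$merged"

rm -r "$tmpdir"
```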

sed reads through each file one line at a time. The line is placed in an area of memory termed the pattern space. This is like a variable in a programming language: an area of memory that can be changed as desired under the direction of the editing commands. All editing operations are applied to the contents of the pattern space. When all operations have been completed, sed prints the final contents of the pattern space to standard output, and then goes back to the beginning, reading another line of input.

This operation is shown in Figure 3-2. The script uses two commands to change The Unix System into The UNIX Operating System.

Figure 3-2. Commands in sed scripts changing the pattern space
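The cycle can be tried from the command line. The two substitutions below are an assumption about what such a script might contain, not necessarily the exact commands in the figure; they show the second command working on the pattern space as the first one left it.

```shell
# After only the first command, the pattern space holds the intermediate text.
step_one=$(printf 'The Unix System\n' | sed -e 's/Unix/UNIX/')

# With both commands, the second edits the result of the first, and sed
# prints the pattern space once, when all commands have run.
final=$(printf 'The Unix System\n' |
    sed -e 's/Unix/UNIX/' -e 's/System/Operating System/')

printf '%s\n%s\n' "$step_one" "$final"
```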

To print or not to print

The -n option modifies sed's default behavior. When supplied, sed does not print the final contents of the pattern space when it's done. Instead, p commands in the script explicitly print the line. For example, one might simulate grep in this way:

sed -n '/<HTML>/p' *.html            Only print <HTML> lines

Although this example seems trivial, this feature is useful in more complicated scripts. If you use a script file, you can enable this feature by using a special first line:

#n                                   Turn off automatic printing

/<HTML>/p                            Only print <HTML> lines

As in the shell and many other Unix scripting languages, the # is a comment. sed comments have to appear on their own lines, since they're syntactically commands; they're just commands that don't do anything. While POSIX indicates that comments may appear anywhere in a script, many older versions of sed allow them only on the first line. GNU sed does not have this limitation.
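A minimal sketch comparing the two ways of turning off automatic printing. The file names, the sample page, and the /<HTML>/ pattern are chosen for this illustration.

```shell
# Hypothetical sample page and script file, created only for this sketch.
tmpdir=$(mktemp -d)
printf '<HTML>\n<BODY>\ntext\n</BODY>\n</HTML>\n' > "$tmpdir/page.html"

# With -n, only lines selected by an explicit p command are printed.
with_option=$(sed -n '/<HTML>/p' "$tmpdir/page.html")

# Script-file form: #n must be the first two characters of the file.
cat > "$tmpdir/only-html.sed" <<'EOF'
#n
/<HTML>/p
EOF
with_script=$(sed -f "$tmpdir/only-html.sed" "$tmpdir/page.html")

printf '%s\n%s\n' "$with_option" "$with_script"
rm -r "$tmpdir"
```

Both invocations print just the opening tag line. Without -n or #n, the same p command would print every line and show the matching line twice.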

Matching Specific Lines

As mentioned, by default, sed applies every editing command to every input line. It is possible to restrict the lines to which a command applies by prefixing the command with an address. Thus, the full form of a sed command is:

address command

There are different kinds of addresses:

Regular expressions

Prefixing a command with a pattern limits the command to lines matching the pattern. This can be used with the s command:

/oldfunc/ s/$/# XXX: migrate to newfunc/ Annotate some source code

An empty pattern in the s command means "use the previous regular expression":

/Tolstoy/ s//& and Camus/g Talk about both authors
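Both regular-expression examples can be tried on a couple of lines each. The input text here is invented for this sketch; lines that do not match the address pass through unchanged.

```shell
# Sample source lines invented for this illustration.
annotated=$(printf 'x = oldfunc(1)\ny = other(2)\n' |
    sed '/oldfunc/ s/$/# XXX: migrate to newfunc/')
printf '%s\n' "$annotated"

# The empty pattern in s reuses /Tolstoy/ from the address, and & stands
# for the matched text in the replacement.
authors=$(printf 'Tolstoy wrote novels\nNo author here\n' |
    sed '/Tolstoy/ s//& and Camus/g')
printf '%s\n' "$authors"
```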

The last line

The symbol
