
Classic Shell Scripting - Arnold Robbins


Define the sort key field. See Section 4.1.2 for details.

-m

Merge already-sorted input files into a sorted output stream.

-n

Compare fields as integer numbers.

-o outfile

Write output to the specified file instead of to standard output. If the file is one of the input files, sort copies it to a temporary file before sorting and writing the output.

-r

Reverse the sort order to descending, rather than the default ascending.

-t char

Use the single character char as the field separator, instead of the default of whitespace.

-u

Unique records only: discard all but the first record in a group with equal keys. Only the key fields matter: other parts of the discarded records may differ.
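To see several of these options side by side, here is a minimal sketch on invented data (one number per line). It pins the locale to C so the text comparison is predictable, and it assumes mktemp(1) is available for the -o demonstration; mktemp is common on GNU and BSD systems but is not required by POSIX.

```shell
#!/bin/sh
# Sketch of -n, -r, -u, and -o on invented sample data.
LC_ALL=C; export LC_ALL            # fix the collation so results are predictable

data='10
9
100
9'

# No options: whole lines compare as text, so "100" sorts before "9".
lexical=$(printf '%s\n' "$data" | sort)
# -n compares numerically; -r reverses the order to descending.
descending=$(printf '%s\n' "$data" | sort -n -r)
# -u keeps only the first line of each group with equal keys (one "9" goes).
unique=$(printf '%s\n' "$data" | sort -n -u)

# -o may name one of the input files, so a file can be sorted in place.
tmp=$(mktemp)
printf '%s\n' "$data" > "$tmp"
sort -n -o "$tmp" "$tmp"
in_place=$(cat "$tmp")
rm -f "$tmp"

printf '%s\n' "$lexical" "---" "$descending" "---" "$unique" "---" "$in_place"
```

The first group prints 10, 100, 9, 9; the numeric groups print 100, 10, 9, 9, then 9, 10, 100, then 9, 9, 10, 100.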

Behavior

sort reads the specified files, or standard input if no files are given, and writes the sorted data on standard output.
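A small sketch of that behavior, plus the -m merge option, might look like this. The file names a.txt and b.txt are hypothetical, and each file is already sorted, which is what -m assumes.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL            # fixed byte order so the output is predictable

dir=$(mktemp -d)
# Two hypothetical input files, each already in sorted order.
printf 'apple\ncherry\n' > "$dir/a.txt"
printf 'banana\ndate\n'  > "$dir/b.txt"

# With file arguments, sort reads the files ...
from_files=$(sort "$dir/b.txt" "$dir/a.txt")
# ... and with no arguments it reads standard input.
from_stdin=$(cat "$dir/b.txt" "$dir/a.txt" | sort)
# -m does not re-sort; it interleaves inputs that are already sorted.
merged=$(sort -m "$dir/a.txt" "$dir/b.txt")

printf '%s\n' "$from_files" "---" "$from_stdin" "---" "$merged"
rm -r "$dir"
```

All three commands produce apple, banana, cherry, date. Merging is cheaper than a full sort, but its output is only correct when every input is sorted to begin with.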

* * *

Sorting by Lines

In the simplest case, when no command-line options are supplied, complete records are sorted according to the order defined by the current locale. In the traditional C locale, that means ASCII order, but you can set an alternate locale as we described in Section 2.8.
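One quick way to see the C locale's ASCII order is with mixed-case words. In ASCII, every uppercase letter (codes 65 through 90) comes before every lowercase letter (97 through 122), so capitalized words float to the top. The second command below uses whatever locale your environment sets, so this sketch makes no claim about its result.

```shell
#!/bin/sh
# C locale: raw ASCII order, so all capitalized words come first.
c_order=$(printf 'banana\nApple\napple\nBanana\n' | LC_ALL=C sort)
printf '%s\n' "$c_order"           # Apple, Banana, apple, banana

# Same input under the current locale; the order depends on LANG/LC_ALL
# and on which locales are installed, so no particular result is assumed.
printf 'banana\nApple\napple\nBanana\n' | sort
```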

A tiny bilingual dictionary in the ISO 8859-1 encoding translates four French words differing only in accents:

$ cat french-english                    Show the tiny dictionary
côte    coast
cote    dimension
coté    dimensioned
côté    side
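To reproduce the examples, you can write the dictionary with printf octal escapes. The file then holds single ISO 8859-1 bytes regardless of your terminal's encoding. This sketch assumes the two columns are separated by a tab, since cut -f1 in the next command uses tab as its default delimiter. It writes french-english in the current directory.

```shell
#!/bin/sh
# \364 is ô and \351 is é in ISO 8859-1; \t separates the two columns.
printf 'c\364te\tcoast\ncote\tdimension\ncot\351\tdimensioned\nc\364t\351\tside\n' > french-english

# 4 French words of 4 bytes each, plus a newline after each: 20 bytes,
# the same total that od reports as octal 0000024.
first_column_bytes=$(cut -f1 french-english | wc -c | tr -d ' ')
printf '%s bytes in column 1\n' "$first_column_bytes"

# Dump the first column as od characters and octal bytes.
cut -f1 french-english | od -a -b
```

On a UTF-8 terminal, cat shows the accented letters as replacement characters, because the file holds ISO 8859-1 bytes rather than UTF-8 sequences.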

To understand the sorting, use the octal dump tool, od, to display the French words in ASCII and octal:

$ cut -f1 french-english | od -a -b     Display French words in octal bytes
0000000   c   t   t   e  nl   c   o   t   e  nl   c   o   t   i  nl   c
        143 364 164 145 012 143 157 164 145 012 143 157 164 351 012 143
0000020   t   t   i  nl
        364 164 351 012
0000024

Evidently, with the ASCII option -a, od strips the high-order bit of characters, so the accented letters have been mangled, but we can see their octal values: é is octal 351 and ô is octal 364.
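The arithmetic behind the mangling is easy to check. In shell arithmetic, a leading 0 marks an octal constant, and clearing the high-order bit (decimal 128, octal 0200) is a bitwise AND with octal 0177. This sketch shows that é and ô lose that bit and become i and t, the letters od printed.

```shell
#!/bin/sh
e_acute=$((0351))                  # 233 decimal
o_circ=$((0364))                   # 244 decimal

# Clear the high-order bit by ANDing with octal 0177 (decimal 127).
e_low=$((0351 & 0177))             # 105, ASCII "i"
o_low=$((0364 & 0177))             # 116, ASCII "t"
printf '%d %d %d %d\n' "$e_acute" "$o_circ" "$e_low" "$o_low"

# od -a applies the same masking to the raw bytes.
stripped=$(printf '\351\364' | od -An -a | tr -d ' \n')
printf '%s\n' "$stripped"          # it
```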

On GNU/Linux systems, you can confirm the character values like this:

$ man iso_8859_1                        Check the ISO 8859-1 manual page
...
       Oct   Dec   Hex   Char   Description
       --------------------------------------------------------------------
...
       351   233   E9     é     LATIN SMALL LETTER E WITH ACUTE
...
       364   244   F4     ô     LATIN SMALL LETTER O WITH CIRCUMFLEX
...

First, sort the file in strict byte order:

$ LC_ALL=C sort french-english          Sort in traditional ASCII order
cote    dimension
coté    dimensioned
côte    coast
côté    side

Notice that e (octal 145) sorted before é (octal 351), and o (octal 157) sorted before ô (octal 364), as expected from their numerical values.
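The byte ordering can be confirmed without the dictionary file. This sketch feeds the four words to sort as ISO 8859-1 bytes, compares the result against the order derived from the octal values above, and dumps the sorted bytes in octal because a UTF-8 terminal cannot display them.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL            # compare raw byte values

# The four words as ISO 8859-1 bytes (\351 is é, \364 is ô), in file order.
sorted=$(printf 'c\364te\ncote\ncot\351\nc\364t\351\n' | sort)

# Order predicted from the byte values: cote, coté, côte, côté.
expected=$(printf 'cote\ncot\351\nc\364te\nc\364t\351')

printf '%s\n' "$sorted" | od -An -b
[ "$sorted" = "$expected" ] && echo "byte order confirmed"
```

The second byte decides between the o words and the ô words (157 versus 364), and the fourth byte then decides between e and é (145 versus 351).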

Now sort the text in Canadian-French order:

$ LC_ALL=fr_CA.iso88591 sort french-english     Sort in Canadian-French locale
côte    coast
cote    dimension
coté    dimensioned
côté    side

The output order clearly differs from the traditional ordering by raw byte values.

Sorting conventions are strongly dependent on language, country, and culture, and the rules are sometimes astonishingly complex. Even English, which mostly pretends that accents are irrelevant, can have complex sorting rules: examine your local telephone directory to see how lettercase, digits, spaces, punctuation, and name variants like McKay and Mackay are handled.

Sorting by Fields

For more control over sorting, the -k option allows you to specify the field to sort on, and the -t option lets you choose the field delimiter.

If -t is not specified, then fields are separated by whitespace and leading and trailing whitespace in the record is ignored. With the -t option, the specified character delimits fields, and whitespace is significant. Thus, a three-character record consisting of space-X-space has one field without -t, but three with -t' ' (the first and third fields are empty).
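A common use of -t is with colon-separated files such as /etc/passwd. This sketch uses invented passwd-style records (name, password placeholder, numeric ID) rather than a real password file, and it sorts numerically on the third field alone.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Invented records in name:password:ID form.
records='root:x:0
bin:x:2
daemon:x:1'

# -t: makes the colon the field separator; -k3,3n keys on field 3 only,
# compared numerically.
by_id=$(printf '%s\n' "$records" | sort -t: -k3,3n)
printf '%s\n' "$by_id"             # root:x:0, daemon:x:1, bin:x:2
```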

The -k option is followed by a field number or a comma-separated pair of field numbers (the start and end of the key); whitespace between -k and its argument is optional. Each number may be suffixed by a dotted character position and/or one of the modifier letters shown in Table 4-1.
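Here is a sketch of field pairs and dotted character positions on invented records: a name, one space, and a date written as YYYYMMDD. It uses -t' ' so that field 2 begins exactly at the first digit and the character positions count from there. The n in the second key is a modifier letter that makes just that key numeric.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL

# Invented records: name, one space, date as YYYYMMDD.
records='carol 20230915
alice 20211201
bob 20230102'

# -k2,2: the key starts and ends at field 2, so it is the whole date.
by_date=$(printf '%s\n' "$records" | sort -t' ' -k2,2)

# -k2.5,2.6n: characters 5 through 6 of field 2 (the month),
# compared numerically because of the n modifier.
by_month=$(printf '%s\n' "$records" | sort -t' ' -k2.5,2.6n)

printf '%s\n' "$by_date" "---" "$by_month"
```

By date, the order is alice, bob, carol; by month (01, 09, 12), it is bob, carol, alice.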

Table 4-1. Sort key field types

Letter   Description
b        Ignore leading whitespace.
d        Dictionary order.
f        Fold letters implicitly to a common
