
Classic Shell Scripting - Arnold Robbins [162]

is a compact representation of the input-file line numbers where the difference occurred, and the operation needed to make the edit: here, c means change. In larger examples, you will usually also find a for add and d for delete.
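As a small illustration of the a and d operations, here is a sketch that compares two hypothetical three-line files (the names old and new, and their contents, are made up for this example). One line is deleted from the first file and one is added at the end:

```shell
# Hypothetical scratch files; neither name comes from the text.
dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n'  > "$dir/old"
printf 'alpha\ngamma\ndelta\n' > "$dir/new"

# diff exits with status 1 when the files differ, so the "|| true"
# keeps that from being treated as a failure.
changes=$(diff "$dir/old" "$dir/new") || true
printf '%s\n' "$changes"
# Prints:
#   2d1
#   < beta
#   3a3
#   > delta
```

Here 2d1 says that line 2 of the first file must be deleted, and 3a3 says that line 3 of the second file must be added after line 3 of the first.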

diff's output is carefully designed so that it can be used by other programs. For example, revision control systems use diff to manage the differences between successive versions of files under their management.

There is an occasionally useful companion to diff that does a slightly different job. diff3 compares three files, such as a base version and modified files produced by two different people, and produces an ed-command script that can be used to merge both sets of modifications back into the base version. We do not illustrate it here, but you can find examples in the diff3(1) manual pages.

The patch Utility

The patch utility uses the output of diff and either of the original files to reconstruct the other one. Because the differences are generally much smaller than the original files, software developers often exchange difference listings via email, and use patch to apply them. Here is how patch can convert the contents of test.1 to match those of test.2:

$ diff -c test.[12] > test.dif        Save a context difference in test.dif
$ patch < test.dif                    Apply the differences
patching file test.1
$ cat test.1                          Show the patched test.1 file
Test 2

patch applies as many of the differences as it can; it reports any failures for you to handle manually.

Although patch can use the ordinary output of diff, it is more common to use diff's -c option to get a context difference. That more verbose report tells patch the filenames, and allows it to verify the change location and to recover from mismatches. Context differences are not essential if neither of the two files has been changed since the differences were recorded, but in software development, quite often one or the other will have evolved.
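Because the difference listing records both directions of the change, patch can also be run in reverse. The following sketch assumes two one-line files whose contents (Test 1 and Test 2) are chosen for illustration. It applies a context difference to test.1 and then undoes it with the standard -R option:

```shell
# Hypothetical scratch copies of the two files; the contents are assumed.
dir=$(mktemp -d)
printf 'Test 1\n' > "$dir/test.1"
printf 'Test 2\n' > "$dir/test.2"

# Record a context difference (diff exits 1 when the files differ).
diff -c "$dir/test.1" "$dir/test.2" > "$dir/test.dif" || true

# Forward direction: test.1 now matches test.2.
patch "$dir/test.1" < "$dir/test.dif"
forward=$(cat "$dir/test.1")

# Reverse direction: -R swaps the roles of old and new,
# restoring the original contents of test.1.
patch -R "$dir/test.1" < "$dir/test.dif"
reversed=$(cat "$dir/test.1")

printf 'after patch:    %s\nafter patch -R: %s\n' "$forward" "$reversed"
```

Naming the file to patch on the command line, as done here, avoids relying on the filenames recorded in the difference listing.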

File Checksum Matching

If you have lots of files that you suspect have identical contents, using cmp or diff would require comparing all pairs of them, leading to an execution time that grows quadratically in the number of files, which is soon intolerable.

You can get nearly linear performance by using file checksums. There are several utilities for computing checksums of files and strings, including sum, cksum, and checksum,[9] the message-digest tools[10] md5 and md5sum, and the secure-hash algorithm[11] tools sha, sha1sum, sha256, and sha384. Regrettably, implementations of sum differ across platforms, making its output useless for comparisons of checksums of files on different flavors of Unix. The native version of cksum on OSF/1 systems produces different checksums than versions on other systems.
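A minimal sketch of the checksum approach, under some assumptions: it uses cksum, sorts the results so that equal checksums become adjacent, and lets awk compare each line only with its predecessor. The three sample files are hypothetical, and filenames are assumed to contain no whitespace. Since cksum is only a 32-bit CRC, a reported match is best treated as a candidate to confirm with cmp.

```shell
# Hypothetical sample files: a.txt and b.txt are identical, c.txt is not.
dir=$(mktemp -d)
printf 'same contents\n' > "$dir/a.txt"
printf 'same contents\n' > "$dir/b.txt"
printf 'different\n'     > "$dir/c.txt"

# Each cksum output line is "CRC size filename". After sorting, files
# with the same CRC and size sit on neighboring lines, so one pass
# that compares each line with the previous one finds the candidates.
dups=$(cksum "$dir"/*.txt | LC_ALL=C sort |
  awk '{ key = $1 " " $2 }
       key == prevkey { print prevname " = " $3 }
       { prevkey = key; prevname = $3 }')

printf '%s\n' "$dups"
```

Each file is read once, and the sort dominates the remaining cost, so the work grows roughly as n log n rather than as the n(n-1)/2 pairwise comparisons that cmp or diff would need.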

Except for the old sum command, only a few of these programs are likely to be found on an out-of-the-box system, but all are easy to build and install. Their output formats differ, but here is a typical example:

$ md5sum /bin/l?
696a4fa5a98b81b066422a39204ffea4  /bin/ln
cd6761364e3350d010c834ce11464779  /bin/lp
351f5eab0baa6eddae391f84d0a6c192  /bin/ls

The long hexadecimal signature string is just a many-digit integer that is computed from all of the bytes of the file in such a way as to make it unlikely that any other byte stream could produce the same value. With good algorithms, longer signatures in general mean greater likelihood of uniqueness. The md5sum output has 32 hexadecimal digits, equivalent to 128 bits. Thus, the chance[12] of having two different files with identical signatures is only about one in 2^64 = 1.84 × 10^19, which is probably negligible. Recent cryptographic research has demonstrated that it is possible to create families of pairs of files with the same MD5 checksum. However, creating a file whose contents are similar, but not identical, to those of an existing file, with both having the same checksum, is likely to remain a difficult problem.
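The figures in this paragraph are easy to check from the shell. This sketch only redoes the arithmetic: each hexadecimal digit carries 4 bits, and the quoted odds of one in 2^64 use half of the 128-bit signature length as the exponent. The variable names are illustrative.

```shell
# 32 hexadecimal digits at 4 bits each.
bits=$((32 * 4))

# 2^64 is too large for portable shell integer arithmetic,
# so let awk compute it in floating point.
odds=$(awk 'BEGIN { printf "%.2e\n", 2^64 }')

printf '%s bits, about one in %s\n' "$bits" "$odds"
# Prints: 128 bits, about one in 1.84e+19
```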

To find matches in a set of signatures, use them as indices into a table of signature counts, and report just those cases where the counts
