Running Linux, 5th Edition - Matthias Kalle Dalheimer [235]
In this section, we discuss the most common file formats and utilities you're likely to run into. For instance, a near-universal convention in the Unix world is to transport files or software as a tar archive, compressed using compress, gzip, or bzip2. To create or unpack these files yourself, you'll need to know the tools of the trade. These tools are most often used when installing new software or creating backups, the subjects of the following two sections in this chapter. Packages coming from other worlds, such as the Windows or Java world, are often archived and compressed using the zip utility; you can unpack these with the unzip command, which should be available in most Linux installations.[*]
Using gzip and bzip2
gzip is a fast and efficient compression program distributed by the GNU project. The basic function of gzip is to take a file, compress it, save the compressed version as filename.gz, and remove the original, uncompressed file. The original file is removed only if gzip is successful; it is very difficult to accidentally delete a file in this manner. Of course, being GNU software, gzip has more options than you want to think about, and many aspects of its behavior can be modified using command-line options.
First, let's say that we have a large file named garbage.txt:
rutabaga$ ls -l garbage.txt
-rw-r--r-- 1 mdw hack 312996 Nov 17 21:44 garbage.txt
To compress this file using gzip, we simply use the command:
gzip garbage.txt
This replaces garbage.txt with the compressed file garbage.txt.gz. What we end up with is the following:
rutabaga$ gzip garbage.txt
rutabaga$ ls -l garbage.txt.gz
-rw-r--r-- 1 mdw hack 103441 Nov 17 21:44 garbage.txt.gz
Note that garbage.txt is removed when gzip completes.
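If you would rather keep the original file, one common approach is to have gzip write the compressed data to standard output with the -c option and redirect it to a file yourself. The following is only a sketch: it works in a scratch directory, and the name sample.txt and its contents are made up for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file: the numbers 1 through 20000, one per line.
seq 1 20000 > sample.txt

# -c sends the compressed data to standard output and leaves sample.txt alone.
gzip -c sample.txt > sample.txt.gz

# Both the original and the compressed copy should now be listed.
ls -l sample.txt sample.txt.gz
```

Reasonably recent versions of gzip also accept -k (--keep) for the same purpose, but -c with a redirect should work with older versions as well.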
You can give gzip a list of filenames; it compresses each file in the list, storing each with a .gz extension. (Unlike the zip program for Unix and MS-DOS systems, gzip will not, by default, compress several files into a single .gz archive. That's what tar is for; see the next section.)
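As a quick illustration of compressing several files at once, the sketch below creates two small files in a scratch directory and hands both to gzip. The filenames a.txt and b.txt are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical input files.
seq 1 5000 > a.txt
seq 5001 10000 > b.txt

# Each file is compressed separately; gzip does not combine them.
gzip a.txt b.txt

# The directory now holds a.txt.gz and b.txt.gz; the originals are gone.
ls
```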
How efficiently a file is compressed depends on its format and contents. For example, many graphics file formats (such as PNG and JPEG) are already well compressed, and gzip will have little or no effect upon such files. Files that usually compress well include plain-text files and binary files such as executables and libraries.
You can get information on a gzipped file using gzip -l. For example:
rutabaga$ gzip -l garbage.txt.gz
compressed  uncompr. ratio uncompressed_name
    103115    312996 67.0% garbage.txt
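To see how much the contents of a file matter, you can compare a text-like file with a file of random bytes, which stands in here for data that is already compressed. This is a rough sketch with hypothetical filenames (text.dat and random.dat); it assumes /dev/urandom and head -c are available, as they are on typical Linux systems.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Text-like data with a lot of regularity: the numbers 1 through 20000.
seq 1 20000 > text.dat

# 100000 random bytes, which have no redundancy for gzip to exploit.
head -c 100000 /dev/urandom > random.dat

gzip text.dat random.dat

# Compare the ratios: the text file shrinks, the random file does not.
gzip -l text.dat.gz random.dat.gz
```

The random file typically comes out slightly larger than it went in, because gzip adds a small header and trailer to data it cannot shrink.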
To get our original file back from the compressed version, we use gunzip, as in the following:
gunzip garbage.txt.gz
After doing this, we get:
rutabaga$ gunzip garbage.txt.gz
rutabaga$ ls -l garbage.txt
-rw-r--r-- 1 mdw hack 312996 Nov 17 21:44 garbage.txt
which is identical to the original file. Note that when you gunzip a file, the compressed version is removed once decompression is complete. Instead of gunzip, you can use gzip -d, which does the same thing (useful if gunzip happens not to be installed).
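If you want to convince yourself that the round trip really restores the file byte for byte, a check along the following lines should do. It is a sketch using hypothetical filenames (sample.txt and reference.txt) in a scratch directory, with gzip -d standing in for gunzip.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file, plus an untouched copy to compare against.
seq 1 20000 > sample.txt
cp sample.txt reference.txt

# Compress, then decompress with gzip -d instead of gunzip.
gzip sample.txt
gzip -d sample.txt.gz

# cmp is silent and returns success when the two files are identical.
cmp sample.txt reference.txt && echo "round trip OK"
```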
gzip stores the name of the original, uncompressed file in the compressed version. This way, if the compressed filename (including the .gz extension) is too long for the filesystem type (say, you're compressing a file on an MS-DOS filesystem with 8.3 filenames), the original filename can be restored using gunzip even if the compressed file had a truncated name. To uncompress a file to its original filename, use the -N option with gunzip. To see the value of this option, consider the following sequence of commands:
rutabaga$ gzip garbage.txt