Online Book Reader


Code: The Hidden Language of Computer Hardware and Software - Charles Petzold [118]

very expensive. Some people felt that ASCII should be a 6-bit code using a shift character to differentiate between lowercase and uppercase to conserve memory. Once that idea was rejected, others believed that ASCII should be an 8-bit code because even at that time it was considered more likely that computers would have 8-bit architectures than 7-bit architectures. Of course, 8-bit bytes are now the standard. Although ASCII is technically a 7-bit code, it's almost universally stored as 8-bit values.

The equivalence of bytes and characters is certainly convenient because we can get a rough sense of how much computer memory a particular text document requires simply by counting the characters. To some, the kilos and megas of computer storage are more comprehensible when expressed in terms of text.
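The byte-per-character equivalence is easy to check. What follows is a minimal Python sketch that assumes the text contains only ASCII characters; the sample sentence is an arbitrary illustration, not something taken from the discussion above.

```python
# The sample text is an arbitrary illustration; any pure-ASCII string works.
text = "Call me Ishmael."

# Encoding as ASCII turns each character into exactly one byte.
data = text.encode("ascii")

print(len(text))  # number of characters
print(len(data))  # number of bytes, which matches the character count

# Every ASCII code fits in 7 bits, so the top bit of each stored byte is 0.
print(all(b < 128 for b in data))
```

The two lengths agree only because every character here is in the ASCII range; text containing other characters generally needs more than one byte per character in most modern encodings.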

For example, a traditional double-spaced typewritten 8½-by-11-inch page with 1-inch margins has about 27 lines of text. Each line is about 6½ inches wide with 10 characters per inch, or 65 characters per line, for a total of about 1750 bytes per page. A single-spaced typewritten page has about double that, or 3.5 kilobytes.
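The typewritten-page estimate is plain multiplication. Here is a rough sketch of the arithmetic using the figures quoted above; the variable names are only illustrative.

```python
# Figures from the text; the names are illustrative.
LINES_PER_PAGE = 27        # double-spaced page
LINE_WIDTH_INCHES = 6.5    # 8.5-inch page minus two 1-inch margins
CHARS_PER_INCH = 10

chars_per_line = int(LINE_WIDTH_INCHES * CHARS_PER_INCH)
double_spaced = LINES_PER_PAGE * chars_per_line
single_spaced = 2 * double_spaced  # twice as many lines on the same page

print(chars_per_line)   # characters per line
print(double_spaced)    # bytes per double-spaced page
print(single_spaced)    # bytes per single-spaced page
```

The exact products are 1755 and 3510, which round to the "about 1750 bytes" and "3.5 kilobytes" quoted in the text.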

A page in The New Yorker magazine has 3 columns of text with 60 lines per column and about 40 characters per line. That's 7200 characters (or bytes) per page.

The New York Times has six columns of text per page. If the entire page is covered with text without any titles or pictures (which is highly unusual), each column has 155 lines of about 35 characters each. The entire page has 32,550 characters, or 32 kilobytes.
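The magazine and newspaper figures follow the same pattern: columns times lines per column times characters per line. A small sketch, with a hypothetical helper named `page_chars` introduced only for illustration:

```python
def page_chars(columns, lines_per_column, chars_per_line):
    """Estimate the characters on a page made up entirely of text columns."""
    return columns * lines_per_column * chars_per_line

# Figures quoted in the text.
new_yorker = page_chars(columns=3, lines_per_column=60, chars_per_line=40)
new_york_times = page_chars(columns=6, lines_per_column=155, chars_per_line=35)

print(new_yorker)             # characters per magazine page
print(new_york_times)         # characters per newspaper page
print(new_york_times / 1000)  # in kilobytes, taking 1 kilobyte as 1000 bytes
print(new_york_times / 1024)  # in kilobytes, taking 1 kilobyte as 1024 bytes
```

Either definition of a kilobyte rounds to the "32 kilobytes" given for the newspaper page.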

A hardcover book has about 500 words per page. An average word is about 5 letters—actually 6 characters, counting the space between words. So a book has about 3000 characters per page. Let's say the average book has 333 pages, which may be a made-up figure but nicely implies that the average book is about 1 million bytes, or 1 megabyte.
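The megabyte-per-book rule of thumb can be reproduced in a few lines. This is only a sketch of the estimate above, using the same admittedly made-up page count:

```python
# Figures from the text; 333 pages is the text's own made-up average.
WORDS_PER_PAGE = 500
CHARS_PER_WORD = 6      # 5 letters plus the space that follows
PAGES_PER_BOOK = 333

chars_per_page = WORDS_PER_PAGE * CHARS_PER_WORD
chars_per_book = chars_per_page * PAGES_PER_BOOK

print(chars_per_page)              # characters (bytes) per page
print(chars_per_book)              # characters (bytes) per book
print(chars_per_book / 1_000_000)  # roughly 1 megabyte
```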

Of course, books vary all over the place:

F. Scott Fitzgerald's The Great Gatsby is about 300 kilobytes.

J. D. Salinger's The Catcher in the Rye is about 400 kilobytes.

Mark Twain's The Adventures of Huckleberry Finn is about 540 kilobytes.

John Steinbeck's The Grapes of Wrath is about a megabyte.

Herman Melville's Moby Dick is about 1.3 megabytes.

Henry Fielding's The History of Tom Jones is about 2.25 megabytes.

Margaret Mitchell's Gone with the Wind is about 2.5 megabytes.

Stephen King's complete and uncut The Stand is about 2.7 megabytes.

Leo Tolstoy's War and Peace is about 3.9 megabytes.

Marcel Proust's Remembrance of Things Past is about 7.7 megabytes.

The United States Library of Congress has about 20 million books for a total of 20 trillion characters, or 20 terabytes, of text data. (It has a bunch of photographs and sound recordings as well.)
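The Library of Congress total comes from multiplying the number of books by the earlier one-megabyte average. A sketch of that arithmetic, assuming decimal units (1 megabyte as a million bytes, 1 terabyte as a trillion bytes):

```python
# Assumptions: the 1-megabyte average book from the earlier estimate,
# and decimal units throughout.
BOOKS = 20_000_000
BYTES_PER_BOOK = 1_000_000
BYTES_PER_TERABYTE = 10 ** 12

total_bytes = BOOKS * BYTES_PER_BOOK
total_terabytes = total_bytes // BYTES_PER_TERABYTE

print(total_bytes)      # total characters of text
print(total_terabytes)  # the same amount expressed in terabytes
```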

Although ASCII is certainly the most important standard in the computer industry, it isn't perfect. The big problem with the American Standard Code for Information Interchange is that it's just too darn American! Indeed, ASCII is hardly suitable even for other nations whose principal language is English. Although ASCII includes a dollar sign, where is the British pound sign? And what about the accented letters used in many Western European languages? To say nothing of the non-Latin alphabets used in Europe, including Greek, Arabic, Hebrew, and Cyrillic. Or the Brahmi scripts of India and Southeast Asia, including Devanagari, Bengali, Thai, and Tibetan. And how can a 7-bit code possibly handle the tens of thousands of ideographs of Chinese, Japanese, and Korean and the ten thousand–odd Hangul syllables of Korean?
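The limitation is easy to see in practice. A 7-bit code has room for only 128 values, and a character without one of those values simply can't be encoded. The following Python sketch illustrates this; the helper name `fits_in_ascii` and the sample characters are chosen purely for illustration.

```python
# A 7-bit code can distinguish only 2**7 different values.
print(2 ** 7)

def fits_in_ascii(s):
    """Return True if every character in s can be encoded as ASCII."""
    try:
        s.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

# Illustrative samples: dollar sign, pound sign, accented letter,
# Greek letter, and a Chinese/Japanese ideograph.
for sample in ["$", "£", "é", "Ω", "日"]:
    print(sample, fits_in_ascii(sample))
```

Only the dollar sign passes; the other characters have no ASCII code at all.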

Even when ASCII was being developed, the needs of some other nations were kept in mind, although without much consideration for non-Latin alphabets. According to the published ASCII standard, ten ASCII codes (40h, 5Bh, 5Ch, 5Dh, 5Eh, 60h, 7Bh, 7Ch, 7Dh, and 7Eh) are available to be redefined for national uses. In addition, the number sign (#) can be replaced by the British pound sign (£), and the dollar sign ($) can be replaced by a generalized currency sign (¤) if necessary. Obviously, replacing symbols makes sense only when everyone

