Code_ The Hidden Language of Computer Hardware and Software - Charles Petzold [114]
Where are the numbers and punctuation marks in the Baudot system? That's the purpose of code 1Bh, identified in the table as Figure Shift. After the Figure Shift code, all subsequent codes are interpreted as numbers or punctuation marks until the Letter Shift code (1Fh) causes them to revert to the letters. Here are the codes for the numbers and punctuation:
Hex Code    Baudot Figure       Hex Code    Baudot Figure
00                              10          3
01          5                   11          +
02          Carriage Return     12          Who Are You?
03          9                   13          ?
04          Space               14          '
05          #                   15          6
06          ,                   16          $
07          .                   17          /
08          Line Feed           18          -
09          )                   19          2
0A          4                   1A          Bell
0B          &                   1B          Figure Shift
0C          8                   1C          7
0D          0                   1D          1
0E          :                   1E          (
0F          =                   1F          Letter Shift
Actually, the code as formalized by the ITU doesn't define codes 05h, 0Bh, and 16h, and instead reserves them "for national use." The table shows how these codes were used in the United States. The same codes were often used for accented letters of some European languages. The Bell code is supposed to ring an audible bell on the teletypewriter. The "Who Are You?" code activates a mechanism whereby a teletypewriter can identify itself.
Like Morse code, this 5-bit code doesn't differentiate between uppercase and lowercase. The sentence
I SPENT $25 TODAY.
is represented by the following stream of hexadecimal data:
0C 04 14 0D 10 06 01 04 1B 16 19 01 1F 04 01 03 12 18 15 1B 07 02 08
Notice the three shift codes: 1Bh right before the number, 1Fh after the number, and 1Bh again before the final period. The line concludes with the Carriage Return and Line Feed codes (02h and 08h).
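The shift logic in that stream can be sketched in Python. This is a minimal sketch, not a reference implementation: the figure codes come from the table above, but the letter codes are assumed to be the standard ITA2 letter assignments written in this chapter's hex convention (they reproduce every letter code in the stream), and the names `encode_line`, `LETTERS`, and `FIGURES` are hypothetical. Space (04h) exists in both modes; to match the stream exactly, the sketch files it under letter mode, which is why a 1Fh precedes the space after $25.

```python
# Letter-mode codes. Assumption: standard ITA2 letter assignments in this
# chapter's hex convention. Space is placed in letter mode so that, as in the
# stream above, a Letter Shift precedes a space that follows figures.
LETTERS = {
    " ": 0x04,
    "T": 0x01, "O": 0x03, "H": 0x05, "N": 0x06, "M": 0x07,
    "L": 0x09, "R": 0x0A, "G": 0x0B, "I": 0x0C, "P": 0x0D,
    "C": 0x0E, "V": 0x0F, "E": 0x10, "Z": 0x11, "D": 0x12,
    "B": 0x13, "S": 0x14, "Y": 0x15, "F": 0x16, "X": 0x17,
    "A": 0x18, "W": 0x19, "J": 0x1A, "U": 0x1C, "Q": 0x1D,
    "K": 0x1E,
}

# Figure-mode codes from the table above (US usage for 05h, 0Bh, and 16h).
FIGURES = {
    "5": 0x01, "9": 0x03, "#": 0x05, ",": 0x06, ".": 0x07,
    ")": 0x09, "4": 0x0A, "&": 0x0B, "8": 0x0C, "0": 0x0D,
    ":": 0x0E, "=": 0x0F, "3": 0x10, "+": 0x11, "?": 0x13,
    "'": 0x14, "6": 0x15, "$": 0x16, "/": 0x17, "-": 0x18,
    "2": 0x19, "7": 0x1C, "1": 0x1D, "(": 0x1E,
}

CARRIAGE_RETURN, LINE_FEED = 0x02, 0x08
FIGURE_SHIFT, LETTER_SHIFT = 0x1B, 0x1F


def encode_line(text, in_figures=False):
    """Encode one uppercase line as Baudot codes ending in CR and LF.

    A shift code is emitted only when the next character needs the other
    mode; in_figures is the mode the receiver is assumed to start in.
    """
    codes = []
    for ch in text:
        if ch in LETTERS:
            if in_figures:
                codes.append(LETTER_SHIFT)
                in_figures = False
            codes.append(LETTERS[ch])
        elif ch in FIGURES:
            if not in_figures:
                codes.append(FIGURE_SHIFT)
                in_figures = True
            codes.append(FIGURES[ch])
        else:
            raise ValueError(f"no Baudot code for {ch!r}")
    # CR and LF mean the same thing in both modes, so no shift is emitted.
    return codes + [CARRIAGE_RETURN, LINE_FEED]


# Prints the same 23 codes as the stream above.
print(" ".join(f"{code:02X}" for code in encode_line("I SPENT $25 TODAY.")))
```

Like the stream above, the encoded line ends with the receiver still in figure mode.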
Unfortunately, if you sent this stream of data to a teletypewriter printer twice in a row, it would come out like this:
I SPENT $25 TODAY.
8 '03,5 $25 TODAY.
What happened? The last shift code the printer received before the second line was a Figure Shift code, so the codes at the beginning of the second line were interpreted as numbers.
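The failure comes from the printer's shift state persisting from one message to the next. The following is a toy model of that behavior, assuming a printer that starts in letter mode; `Printer` and `receive` are hypothetical names, the figure codes come from the table above, and the letter codes are again assumed to be the standard ITA2 assignments.

```python
# Code -> character tables. Figures come from the table above; the letters are
# assumed to be the standard ITA2 assignments in this chapter's hex convention.
LETTER_CHARS = {
    0x01: "T", 0x03: "O", 0x05: "H", 0x06: "N", 0x07: "M",
    0x09: "L", 0x0A: "R", 0x0B: "G", 0x0C: "I", 0x0D: "P",
    0x0E: "C", 0x0F: "V", 0x10: "E", 0x11: "Z", 0x12: "D",
    0x13: "B", 0x14: "S", 0x15: "Y", 0x16: "F", 0x17: "X",
    0x18: "A", 0x19: "W", 0x1A: "J", 0x1C: "U", 0x1D: "Q",
    0x1E: "K",
}
FIGURE_CHARS = {
    0x01: "5", 0x03: "9", 0x05: "#", 0x06: ",", 0x07: ".",
    0x09: ")", 0x0A: "4", 0x0B: "&", 0x0C: "8", 0x0D: "0",
    0x0E: ":", 0x0F: "=", 0x10: "3", 0x11: "+", 0x13: "?",
    0x14: "'", 0x15: "6", 0x16: "$", 0x17: "/", 0x18: "-",
    0x19: "2", 0x1C: "7", 0x1D: "1", 0x1E: "(",
}
SPACE, CARRIAGE_RETURN, LINE_FEED = 0x04, 0x02, 0x08
FIGURE_SHIFT, LETTER_SHIFT = 0x1B, 0x1F


class Printer:
    """Toy teletypewriter printer whose shift state persists between messages."""

    def __init__(self):
        self.in_figures = False  # assume the printer starts in letter mode
        self.lines = [""]

    def receive(self, codes):
        for code in codes:
            if code == FIGURE_SHIFT:
                self.in_figures = True
            elif code == LETTER_SHIFT:
                self.in_figures = False
            elif code == SPACE:
                self.lines[-1] += " "
            elif code == LINE_FEED:
                self.lines.append("")
            elif code == CARRIAGE_RETURN:
                pass  # returns the print head; no character is printed
            else:
                table = FIGURE_CHARS if self.in_figures else LETTER_CHARS
                # Codes with nothing printable in this mode (00h, Bell,
                # Who Are You?) print nothing in this model.
                self.lines[-1] += table.get(code, "")


STREAM = [0x0C, 0x04, 0x14, 0x0D, 0x10, 0x06, 0x01, 0x04, 0x1B, 0x16, 0x19, 0x01,
          0x1F, 0x04, 0x01, 0x03, 0x12, 0x18, 0x15, 0x1B, 0x07, 0x02, 0x08]

printer = Printer()
printer.receive(STREAM)
printer.receive(STREAM)  # the printer is still in figure mode at this point
for line in printer.lines[:2]:
    print(line)
# I SPENT $25 TODAY.
# 8 '03,5 $25 TODAY.
```

One workaround, offered here as an assumption rather than something the text prescribes, is to begin every line with an explicit Letter Shift: receiving `[0x1F] + STREAM` prints the sentence correctly whatever state the printer was left in.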
Problems like this are typical nasty results of using shift codes. Although Baudot is certainly an economical code, it's probably preferable to use unique codes for numbers and punctuation, as well as separate codes for lowercase and uppercase letters.
So if we want to figure out how many bits we need for a better character encoding system than Baudot, just add them up: We need 52 codes just for the uppercase and lowercase letters and 10 codes for the digits 0 through 9. We're up to 62 already. Throw in a few punctuation marks, and we top 64 codes, which means we need more than 6 bits. But we seem to have lots of leeway before we exceed 128 characters, which would require 8 bits.
So the answer is 7. We need 7 bits to represent the characters of English text if we want uppercase and lowercase with no shifting.
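The counting argument amounts to finding the smallest number of bits b for which 2**b is at least the number of codes. A minimal Python sketch follows; `bits_needed` is a hypothetical helper, and the figure of three punctuation marks is only an illustrative stand-in for "a few."

```python
def bits_needed(n_codes):
    """Smallest number of bits b such that 2**b >= n_codes (n_codes >= 1)."""
    return max(1, (n_codes - 1).bit_length())


LETTERS = 26 + 26  # uppercase and lowercase, with no shift codes
DIGITS = 10

print(bits_needed(32))                    # 5: the 32 Baudot codes
print(bits_needed(LETTERS + DIGITS))      # 6: 62 codes still fit in 64
print(bits_needed(LETTERS + DIGITS + 3))  # 7: a few punctuation marks top 64
print(bits_needed(128))                   # 7: the most that 7 bits can hold
```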
And what are these codes? Well, the actual codes can be anything we want. If we were going to build our own computer, and we were going to build every piece of hardware required by this computer, and we were going to program this computer ourselves and never use the computer to connect to any other computer, we could make up our own codes. All we need to do is assign a unique code to every character we'll be using.
But since it's rarely the case that computers are built and used in isolation, it makes more sense for everyone to agree to use the same codes. That way, the computers that we build can be more compatible with one another and maybe even actually exchange textual information.
We also probably shouldn't assign codes in a haphazard manner. When we work with text on the computer, certain advantages accrue if the letters of the alphabet are assigned to sequential codes. This ordering scheme makes alphabetizing and sorting easier, for example.
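To see why sequential codes help, compare sorting words by their codes under a hypothetical sequential scheme (A = 0 through Z = 25, invented here for illustration and not any real standard) with sorting by the Baudot letter codes, which are assumed to be the standard ITA2 assignments in this chapter's hex convention. With sequential codes, a plain numeric comparison already gives alphabetical order; with Baudot codes it does not. The name `sort_by_codes` is hypothetical.

```python
import string

# Hypothetical sequential scheme for illustration: A=0, B=1, ..., Z=25.
SEQUENTIAL = {ch: i for i, ch in enumerate(string.ascii_uppercase)}

# Baudot letter codes (assumed standard ITA2 assignments); these are not
# in alphabetical order.
BAUDOT = {
    "T": 0x01, "O": 0x03, "H": 0x05, "N": 0x06, "M": 0x07,
    "L": 0x09, "R": 0x0A, "G": 0x0B, "I": 0x0C, "P": 0x0D,
    "C": 0x0E, "V": 0x0F, "E": 0x10, "Z": 0x11, "D": 0x12,
    "B": 0x13, "S": 0x14, "Y": 0x15, "F": 0x16, "X": 0x17,
    "A": 0x18, "W": 0x19, "J": 0x1A, "U": 0x1C, "Q": 0x1D,
    "K": 0x1E,
}


def sort_by_codes(words, table):
    """Sort words by comparing their code sequences element by element."""
    return sorted(words, key=lambda word: [table[ch] for ch in word])


words = ["DOG", "BEE", "CAT"]
print(sort_by_codes(words, SEQUENTIAL))  # ['BEE', 'CAT', 'DOG'] (alphabetical)
print(sort_by_codes(words, BAUDOT))      # ['CAT', 'DOG', 'BEE'] (not alphabetical)
```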
Fortunately, such a standard has already been developed. It's called the American Standard Code for Information Interchange, abbreviated ASCII, and referred to with the unlikely pronunciation ASS-key. It was formalized in 1967 and remains the single most important standard in the entire computer industry. With one big exception (which I'll