Code_ The Hidden Language of Computer Hardware and Software - Charles Petzold [117]
When considering the relationship between punch cards and their associated 8-bit EBCDIC character codes, keep in mind that these codes evolved over many decades under several different types of technologies. For that reason, don't expect to discover too much logic or consistency.
A character is encoded on a punch card by a combination of one or more rectangular holes punched in a single column. The character itself is often printed near the top of the card. The lower 10 rows are identified by number and are known as the 0-row, the 1-row, and so forth through the 9-row. The unnumbered row above the 0-row is called the 11-row, and the top row is called the 12-row. There is no 10-row.
More IBM punch card terminology: Rows 0 through 9 are known as the digit rows, or digit punches. Rows 11 and 12 are known as the zone rows, or zone punches. And some IBM punch card confusion: Sometimes rows 0 and 9 are considered to be zone rows rather than digit rows.
An 8-bit EBCDIC character code is composed of a high-order nibble (4-bit value) and a low-order nibble. The low-order nibble is the BCD code corresponding to the digit punches of the character. The high-order nibble is a code corresponding (in a fairly arbitrary way) to the zone punches of the character. You'll recall from Chapter 19 that BCD stands for binary-coded decimal, a 4-bit code for digits 0 through 9.
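As a minimal sketch of that split, the following Python separates an 8-bit code into its two nibbles. The function name `split_nibbles` is hypothetical and chosen only for illustration.

```python
def split_nibbles(code):
    """Split an 8-bit code into (high_nibble, low_nibble)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("expected a value from 0x00 through 0xFF")
    # Shift right to isolate the top 4 bits; mask to isolate the bottom 4.
    return code >> 4, code & 0x0F


high, low = split_nibbles(0xF7)
print(f"{high:04b} {low:04b}")  # prints "1111 0111"
```

Here the high-order nibble 1111 is the zone part of the code and the low-order nibble 0111 is the BCD code for 7.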
For the digits 0 through 9, there are no zone punches. That lack of punches corresponds to a high-order nibble of 1111. The low-order nibble is the BCD code of the digit punch. Here's a table of EBCDIC codes for the digits 0 through 9:
Hex Code    EBCDIC Character
F0          0
F1          1
F2          2
F3          3
F4          4
F5          5
F6          6
F7          7
F8          8
F9          9
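Because the zone nibble is fixed at 1111 and the low-order nibble is simply the BCD digit, the digit codes can be computed rather than looked up. The sketch below assumes two hypothetical helper names, `ebcdic_digit` and `digit_value`.

```python
def ebcdic_digit(digit):
    """Return the EBCDIC code for a decimal digit 0 through 9."""
    if not 0 <= digit <= 9:
        raise ValueError("expected a digit from 0 through 9")
    return 0xF0 | digit  # zone nibble 1111, BCD digit in the low nibble


def digit_value(code):
    """Recover the decimal digit from an EBCDIC digit code."""
    if (code >> 4) != 0xF or (code & 0x0F) > 9:
        raise ValueError("not an EBCDIC digit code")
    return code & 0x0F


print(" ".join(f"{ebcdic_digit(d):02X}" for d in range(10)))
# prints "F0 F1 F2 F3 F4 F5 F6 F7 F8 F9"
```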
For the uppercase letters, a zone punch of just the 12-row is indicated by the nibble 1100, a zone punch of just the 11-row is indicated by the nibble 1101, and a zone punch of just the 0-row is indicated by the nibble 1110. The EBCDIC codes for the uppercase letters are
Hex Code    EBCDIC Character    Hex Code    EBCDIC Character    Hex Code    EBCDIC Character
C1          A                   D1          J
C2          B                   D2          K                   E2          S
C3          C                   D3          L                   E3          T
C4          D                   D4          M                   E4          U
C5          E                   D5          N                   E5          V
C6          F                   D6          O                   E6          W
C7          G                   D7          P                   E7          X
C8          H                   D8          Q                   E8          Y
C9          I                   D9          R                   E9          Z
Notice the gaps in the numbering of these codes. In some applications, these gaps can be maddening when you're writing programs using EBCDIC text.
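To make the gaps concrete, here is a small sketch that builds the uppercase codes from the zone and digit rules above. The name `ebcdic_upper` is hypothetical; the arithmetic simply mirrors the table.

```python
import string


def ebcdic_upper(letter):
    """Return the EBCDIC code for an uppercase letter A through Z."""
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError("expected a single uppercase letter")
    index = string.ascii_uppercase.index(letter)  # 0 for A, 25 for Z
    if index < 9:                       # A-I: 12-row zone, digits 1-9
        return 0xC0 | (index + 1)
    if index < 18:                      # J-R: 11-row zone, digits 1-9
        return 0xD0 | (index - 9 + 1)
    return 0xE0 | (index - 18 + 2)      # S-Z: 0-row zone, digits 2-9


# Adding 1 to the code for I does not produce the code for J.
print(hex(ebcdic_upper("I") + 1))  # prints "0xca"
print(hex(ebcdic_upper("J")))      # prints "0xd1"
```

One consequence is that a simple range test such as `0xC1 <= code <= 0xE9` also accepts codes like 0xCA that are not letters, so a program has to check each of the three runs separately.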
The lowercase letters have the same digit punches as the uppercase letters but different zone punches. For lowercase letters a through i, the 12-row and 0-row are punched, corresponding to the code 1000. For j through r, the 12-row and 11-row are punched. This is the code 1001. For the letters s through z, the 11-row and 0-row are punched—the code 1010. The EBCDIC codes for the lowercase letters are
Hex Code    EBCDIC Character    Hex Code    EBCDIC Character    Hex Code    EBCDIC Character
81          a                   91          j
82          b                   92          k                   A2          s
83          c                   93          l                   A3          t
84          d                   94          m                   A4          u
85          e                   95          n                   A5          v
86          f                   96          o                   A6          w
87          g                   97          p                   A7          x
88          h                   98          q                   A8          y
89          i                   99          r                   A9          z
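Comparing the two letter tables shows that each lowercase code is exactly hexadecimal 40 less than its uppercase counterpart, because the zone nibbles differ only in one bit (1100 versus 1000, and so on). The sketch below checks this using Python's built-in `cp037` codec, which is one particular EBCDIC variant (IBM code page 037); using that codec as a stand-in for the tables here is an assumption, and the helper name `ebcdic_to_lower` is hypothetical.

```python
import string

# Encode the alphabet with Python's cp037 codec (one EBCDIC variant).
upper = string.ascii_uppercase.encode("cp037")
lower = string.ascii_lowercase.encode("cp037")

# Every uppercase/lowercase pair differs by exactly 0x40.
assert all(u - l == 0x40 for u, l in zip(upper, lower))


def ebcdic_to_lower(code):
    """Clear bit 0x40 to turn an uppercase letter code into lowercase.

    Only meaningful for codes that are already known to be letters.
    """
    return code & 0xBF


print(hex(ebcdic_to_lower(0xC1)))  # prints "0x81"
```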
Of course, there are other EBCDIC codes for punctuation and control characters, but it's hardly necessary to do a full-blown exploration of this system.
It might seem as if each column of an IBM punch card is sufficient to encode 12 bits of information. Each hole is a bit, right? So it should be possible to encode ASCII character codes on a punch card using only 7 of the 12 positions in each column. But in practice, this doesn't work very well. Too many holes get punched, threatening the physical integrity of the card.
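The difference in hole counts can be sketched as follows. `card_rows` applies the zone and digit rules described earlier for digits and letters only, and `ascii_holes` counts the holes that a hypothetical direct punching of the 7 ASCII bits would need. Both function names are illustrative assumptions rather than any historical scheme's terminology.

```python
import string

UPPER_ZONES = ([12], [11], [0])
LOWER_ZONES = ([12, 0], [12, 11], [11, 0])


def card_rows(ch):
    """Return the rows punched in one column for a digit or letter."""
    if len(ch) != 1:
        raise ValueError("expected a single character")
    if ch in string.digits:
        return [int(ch)]                      # digits have no zone punch
    index = string.ascii_lowercase.find(ch.lower())
    if index < 0:
        raise ValueError("only digits and letters are covered here")
    zones = UPPER_ZONES if ch.isupper() else LOWER_ZONES
    group, offset = divmod(index, 9)          # three runs of letters
    digit = offset + 1 if group < 2 else offset + 2  # third run starts at 2
    return zones[group] + [digit]


def ascii_holes(ch):
    """Count the 1 bits in the character's 7-bit ASCII code."""
    return bin(ord(ch)).count("1")


for ch in "Aw7":
    print(ch, card_rows(ch), ascii_holes(ch))
```

Under these rules a digit or letter never needs more than three holes in a column, whereas a lowercase w punched bit for bit from its ASCII code would need six.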
Many of the 8-bit codes in EBCDIC aren't defined, suggesting that the use of 7 bits in ASCII makes more sense. At the time ASCII was being developed, memory was