
Code Books

Before it became possible to build complex electrical and mechanical cipher machines, cipher systems requiring many complex manipulations would have been impractical and error-prone.

A code book, with its long list of equivalents for thousands of words and phrases, as opposed to the 26 letters of the alphabet, offers a degree of security without requiring a large number of operations for each word encoded.

Codes have also been used for sending messages by telegraph at lower prices, by representing common phrases used in business with single codegroups. Such codebooks are arranged so that both the codegroups and their meanings are in easy-to-find alphabetical order. A codebook arranged this way is called a one-part code.

Most (but not all) secret codebooks are called two-part codes; the codegroups are assigned in a random order to the words encoded, and there is both an encoding section, with the words in alphabetical order, and a decoding section, with the codegroups in alphabetical (or numerical) order.

Thus, the decoding section of a codebook might look like this imaginary example that I also used as the background to this page:

EZNJK Shanghai
EZNKL OUGH
EZNLM 270 degrees
EZNMN Ship may not be
EZNNO Docking facilities
EZNOP Diesel fuel
EZNRS France
EZNST Repair-s-ed-ing
EZNTU Ship has
EZNUV Cancun
EZNVW 43 degrees
EZNYZ 500
EZNZA 23 knots
EZOAA Maintenance urgently required
EZOBB Perth
EZOCC 15 metres
EZODD Captain will be
EZOEE 23 3/4

The rationale behind the sequence of 5-letter groups shown is explained in the section on Error-Correcting Codes.
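
The two-part arrangement described above can be sketched in a few lines of Python. This is only a toy illustration: the vocabulary, the use of five-digit numeric groups, and the function name make_two_part_code are assumptions made for the example, and the error-correcting structure of the letter groups in the table is not reproduced.

```python
import random

def make_two_part_code(vocabulary, seed=None):
    """Return (encoding_section, decoding_section) for a toy two-part code."""
    rng = random.Random(seed)
    # Draw distinct five-digit groups at random, so that the order of the
    # codegroups reveals nothing about the alphabetical order of the words.
    groups = rng.sample(range(100000), len(vocabulary))
    # Encoding section: words in alphabetical order.
    encoding = {w: f"{g:05d}" for w, g in zip(sorted(vocabulary), groups)}
    # Decoding section: the same pairs, sorted by codegroup.
    decoding = dict(sorted((g, w) for w, g in encoding.items()))
    return encoding, decoding

# Hypothetical vocabulary, loosely based on the sample table.
words = ["ship has", "repairs", "diesel fuel", "captain will be",
         "docking facilities"]
encoding, decoding = make_two_part_code(words, seed=1)

message = ["captain will be", "ship has"]
sent = [encoding[w] for w in message]
received = [decoding[g] for g in sent]
```

In a one-part code a single sorted list would serve for both directions, which is exactly what makes it easier to use and easier to break.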

Some codebooks assign both numeric codegroups and alphabetic codegroups to words; the alphabetic codegroups are easier to actually transmit by Morse code, but the numerical codegroups are easier to manipulate.

For secret codes, the manipulation might consist of superencipherment, the encipherment of a message that is already in code. But many nonsecret cable codes also provided numeric codegroups. This allowed people in a specialized business to agree to take the codegroups in one section of a large codebook, which stood for words and phrases they did not use, and have them represent instead the words and phrases of another, shorter codebook, or of a section of another codebook. All that was needed was to add or subtract an offset from the codegroups in the other codebook to fit them into the unused space.

To increase security, secret codebooks often included nulls, that is, codegroups which were to be ignored upon decipherment. Also, many codebooks included more than one substitute for the most common words or phrases.

David Kahn's book The Codebreakers is illustrated with actual pages from once-secret codebooks, such as the British and Allied Merchant Ship code, and the Hudson code of the American Expeditionary Force.

Another codebook illustrated was Cypher SA, the codebook of the British Navy in the last months of World War I. Another illustration of a different part of this codebook also appeared in David Kahn's article in the July 1966 issue of Scientific American.

This codebook was perhaps unique, in that it used a stripped-down form of the autokey principle.

It used a considerable number of nulls, and every message had to start with a null, because many of the most common words and phrases in the code could not begin a message.

Each five-digit codegroup in the code was followed by one of the three letters A, B, or C. Many of the more common words and phrases had three different substitutes, preceded by A, B, and C in order, and the one to be used was determined by which letter had followed the previous codegroup.

Naturally, only the numbers were transmitted in the enciphered message.

Some of the most common words and phrases were also homophones in the ordinary sense; instead of merely having one set of three substitutes, they might have had three sets of substitutes, so that for each letter following the previous codegroup, there would still be three arbitrary choices for the codegroup to use.

In effect, there are three different codes, A, B, and C, which are the same in part, and the one used for each group is determined by the codegroup enciphered before it. For this reason, Cypher SA is properly classed as a form of autokey.
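
The mechanism just described can be sketched as follows. Everything concrete here is invented for illustration: the codegroups, the letters attached to them, the two sample phrases, and the names NULLS, BOOK, encode_sa and decode_sa are assumptions, not the contents or layout of the real Cypher SA.

```python
# Hypothetical miniature book; every group and letter here is made up.
# Each null group maps to the letter (A, B, or C) printed after it.
NULLS = {"10482": "B", "77310": "A"}

# For each phrase: the letter that followed the previous codegroup
# selects a (codegroup, letter following this codegroup) pair.
BOOK = {
    "ship has": {"A": ("20911", "B"), "B": ("51376", "C"), "C": ("84402", "A")},
    "repairs":  {"A": ("33058", "A"), "B": ("69127", "B"), "C": ("05563", "B")},
}

def encode_sa(phrases, opening_null):
    """Encode phrases, starting with a null to supply the first letter."""
    groups = [opening_null]
    letter = NULLS[opening_null]
    for phrase in phrases:
        group, letter = BOOK[phrase][letter]
        groups.append(group)  # only the digits are transmitted
    return groups

def decode_sa(groups):
    """Decode by plain lookup; nulls are dropped."""
    reverse = {g: p for p, variants in BOOK.items()
               for g, _ in variants.values()}
    return [reverse[g] for g in groups if g not in NULLS]

sent = encode_sa(["ship has", "repairs", "ship has"], "77310")
# sent == ["77310", "20911", "69127", "51376"]
```

Note that "ship has" is sent as two different groups in the same message, and that the choice between them depends only on what was encoded before it, which is the autokey element. The receiver needs no key state at all, since every group is unique in the book.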

With many codes, a form of polyalphabetic substitution is used. In addition to the two-part codebook, with numerical equivalents for the words or phrases to be sent, a second book, filled with random numbers, is required. This second book's contents are called the additive. A random starting point in the book is chosen for each message, and that starting point is sent at the beginning of the message. Then, the numbers in the book are added to the codegroups from the codebook before transmission. Always, carries past the start of a codegroup are discarded; almost always, all carries are ignored, the individual digits being added in isolation. This is the decimal equivalent of doing an XOR instead of addition.
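
As a minimal sketch of the usual case, in which all carries are ignored, the following adds an additive to a coded message digit by digit, modulo 10. The codegroups, the contents of the additive book, and the function names are made up for the example; a real system would also encode the starting point as an indicator group, which is left out here.

```python
def add_no_carry(group, key):
    """Add two equal-length digit strings digit by digit, modulo 10."""
    return "".join(str((int(a) + int(b)) % 10) for a, b in zip(group, key))

def sub_no_carry(group, key):
    """Inverse of add_no_carry: subtract digit by digit, modulo 10."""
    return "".join(str((int(a) - int(b)) % 10) for a, b in zip(group, key))

def apply_additive(groups, additive_book, start, op=add_no_carry):
    """Combine each group with successive additive groups from `start`."""
    key = additive_book[start:start + len(groups)]
    return [op(g, k) for g, k in zip(groups, key)]

# Made-up additive book and coded message.
additive_book = ["38471", "90265", "11738", "56092", "74410"]
coded = ["52907", "18834", "60051"]
start = 1  # starting point chosen for this message

enciphered = apply_additive(coded, additive_book, start)
# enciphered == ["42162", "29562", "16043"]
recovered = apply_additive(enciphered, additive_book, start, op=sub_no_carry)
```

Unlike XOR, addition modulo 10 is not its own inverse, so the receiver subtracts where the sender added.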

Sometimes other methods of encrypting an already coded message, called superencryption, are used. For example, a short table giving substitutes for pairs of digits can be used, either on the codegroups, or just on the group which gives the starting point of the additive.

When a long running key is used for Vigenère encryption, but that key is re-used, Kerckhoffs superimposition can be used to align different messages encrypted with the same key. The messages are slid against each other, and positions that provide a high number of coincidences, particularly those involving groups of consecutive characters, are chosen.
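
The sliding and counting can be sketched like this. The plaintexts, the running key, and the helper names are invented for the example, and the scoring is a bare coincidence count that gives no extra weight to runs of consecutive matches.

```python
def coincidences(a, b, shift):
    """Count positions i where a[i + shift] == b[i]."""
    return sum(1 for i, y in enumerate(b)
               if 0 <= i + shift < len(a) and a[i + shift] == y)

def best_alignments(a, b, top=3):
    """Return the `top` (count, shift) pairs with the most coincidences."""
    shifts = range(-(len(b) - 1), len(a))
    scored = [(coincidences(a, b, s), s) for s in shifts]
    return sorted(scored, reverse=True)[:top]

def vigenere(plain, key, offset):
    """Encrypt A-Z text with the running key, starting at key[offset]."""
    base = ord("A")
    return "".join(
        chr((ord(p) - base + ord(key[offset + i]) - base) % 26 + base)
        for i, p in enumerate(plain))

# Made-up running key and messages; the second message starts 7 letters
# further along in the same key.
key = "QXJZKWVYPLMNBGTRFHCDSEAUIOQWERTY"
p1 = "THESHIPHASARRIVEDATPERTH"
p2 = "THECAPTAINWILLBEATPERTH"
c1 = vigenere(p1, key, 0)
c2 = vigenere(p2, key, 7)

# At the true alignment the key cancels out: the ciphertexts coincide
# exactly where the plaintexts do.
same = coincidences(c1, c2, 7) == coincidences(p1, p2, 7)  # True
```

With messages this short the true alignment will not necessarily score highest in best_alignments; the method depends on having enough text for the excess of coincidences to stand out.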

For breaking codes used with additives, Kerckhoffs superimposition is usually used in a more sensitive form, as improved by W. F. Friedman: the kappa test. In the Vigenère case, the kappa test compares two proportions of coincidences: the proportion that would be found between two sequences of random letters, which would be exactly 1/26, and the proportion that would be found between two normal plain-language texts. The latter is higher, because the letters are not all equal in frequency. That figure equals the sum of the squares of the probabilities of all the plaintext symbols: the chance of an A in the first message times the chance of an A in the second message, plus the chance of a B in the first message times the chance of a B in the second, and so on. The same comparison applies to two strings of random groups of five digits, which would have one group in 100,000 matching by chance, and two coded messages without an additive applied. If two messages are aligned so that their additives coincide, then, as far as coincidences between them at that position are concerned, it is as if no additive had been applied.
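
The two figures being compared can be computed directly. In this sketch the plaintext kappa is estimated from a short made-up sample, so the number it gives is only illustrative; the function names are likewise assumptions.

```python
from collections import Counter

def random_kappa(alphabet_size):
    """Expected coincidence rate between two random sequences."""
    return 1 / alphabet_size

def kappa_from_counts(counts):
    """Sum of squared symbol probabilities, estimated from counts."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())

# Made-up sample; a serious estimate needs a much larger body of text.
sample = ("the ship has arrived at perth and the captain will be "
          "requesting diesel fuel and docking facilities for repairs")
letters = Counter(ch for ch in sample.upper() if ch.isalpha())

k_random = random_kappa(26)            # 1/26 for letters
k_plain = kappa_from_counts(letters)   # higher, since letters are unequal
k_groups = random_kappa(100000)        # random five-digit codegroups
```

The same kappa_from_counts function applies unchanged to a Counter of codegroups taken from coded messages, although a great deal of traffic is needed before the counts for individual groups mean much.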

The following illustrates why the random kappa is always smaller than the plaintext kappa:

 ---------------------  ----------------------
 |----|              |  |---------|          |
 |----|              |  |---------|          |
 |----------         |  |---------|          |
 |    |----|         |  |---------|          |
 |    |----|         |  |---------------     |
 |    -----------    |  |         |----|     |
 |         |----|    |  |         |----|     |
 |         |----|    |  |         ---------  |
 |         ----------|  |              |--|  |
 |              |----|  |              ------|
 |              |----|  |                 |--|
 ---------------------  ----------------------

The square of a number gains size from both of its factors, so taking an amount away from a smaller number and giving it to a larger one places that amount where it counts for more, and the sum of the squares grows. Therefore, when the total is fixed, as it is for probabilities, making all the numbers equal minimizes the sum of the squares.
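
A small exact-arithmetic check of this argument, using four symbols as in the diagram. The particular amounts moved are arbitrary choices for the example. Moving an amount d from a probability a to a probability b that is at least as large changes the sum of squares by 2d(b - a) + 2d^2, which is always positive.

```python
from fractions import Fraction

def sum_of_squares(probs):
    """Sum of the squares of a list of probabilities."""
    return sum(p * p for p in probs)

def shift_mass(probs, src, dst, delta):
    """Move `delta` of probability from index src to index dst."""
    out = list(probs)
    out[src] -= delta
    out[dst] += delta
    return out

equal = [Fraction(1, 4)] * 4                              # 1/4 each
skewed = shift_mass(equal, 0, 1, Fraction(1, 8))          # 1/8, 3/8, 1/4, 1/4
more_skewed = shift_mass(skewed, 0, 1, Fraction(1, 16))   # 1/16, 7/16, 1/4, 1/4

s_equal = sum_of_squares(equal)        # 1/4, the random kappa for 4 symbols
s_skewed = sum_of_squares(skewed)      # 9/32
s_more = sum_of_squares(more_skewed)   # 41/128
```

Each shift away from equality raises the sum, and the total probability stays at 1 throughout.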

