A number of methods are possible for converting binary data to printable characters.

One of the simplest is to take five or six bits of binary data to select one of 32 or 64 characters. Other, more complicated schemes are possible, though.
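As a minimal sketch of the six-bit case, the following takes the data six bits at a time and uses each group to pick one of 64 characters. The alphabet chosen here (the ordering Base64 happens to use) and the function names are assumptions made for illustration; any 64 distinct printable characters would serve.

```python
import string

# Hypothetical 64-character alphabet (same ordering as Base64); any 64
# distinct printable characters would do.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_6bit(data: bytes) -> str:
    """Map each group of six bits to one printable character."""
    bits = "".join(f"{byte:08b}" for byte in data)
    # Pad with zero bits so the total length is a multiple of six.
    bits += "0" * (-len(bits) % 6)
    return "".join(ALPHABET[int(bits[i:i + 6], 2)] for i in range(0, len(bits), 6))


def decode_6bit(text: str, length: int) -> bytes:
    """Invert encode_6bit; the original byte count must be supplied."""
    bits = "".join(f"{ALPHABET.index(ch):06b}" for ch in text)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, 8 * length, 8))
```

Three bytes become four characters, so `encode_6bit(b"Man")` gives `"TWFu"`. This sketch does not mark padding in the output, which is why the decoder needs the original length.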

If 85 characters can be used, then five characters are enough to represent four bytes of arbitrary data, since the fifth power of 85 is 4,437,053,125, which is larger than the 4,294,967,296 possible values of four bytes. If 86 characters can be used, a simplified scheme can achieve the same result, since 86 times 3 is 258, which is larger than 256. Assign not more than three of the 256 possible values for a byte to each of the 86 allowed characters. Then, after representing each of the four bytes by one of those characters, a fifth character from a set of 81 (3 times 3 times 3 times 3) can resolve, for each of the four bytes, which of the up to three values assigned to its character is the valid one.
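The 86-character scheme can be sketched as follows. The particular assignment used here (byte value divided by 3 picks the character, the remainder picks which of the three values is meant) and the choice of the first 86 printable ASCII characters after the space are assumptions for illustration only; the text requires merely that no character stand for more than three byte values.

```python
# Hypothetical alphabet: the first 86 printable ASCII characters after space.
CHARS = [chr(c) for c in range(33, 33 + 86)]


def encode_block(block: bytes) -> str:
    """Encode exactly four bytes as five characters."""
    if len(block) != 4:
        raise ValueError("block must be exactly four bytes")
    out = []
    selector = 0
    for b in block:
        out.append(CHARS[b // 3])         # group of up to three byte values
        selector = selector * 3 + b % 3   # which member of that group
    out.append(CHARS[selector])           # selector lies in 0..80
    return "".join(out)


def decode_block(text: str) -> bytes:
    """Invert encode_block."""
    if len(text) != 5:
        raise ValueError("encoded block must be exactly five characters")
    selector = CHARS.index(text[4])
    remainders = []
    for _ in range(4):
        selector, r = divmod(selector, 3)
        remainders.append(r)
    remainders.reverse()                  # restore first-byte-first order
    return bytes(3 * CHARS.index(ch) + r for ch, r in zip(text, remainders))
```

Only small divisions by 3 are needed, whereas a true base-85 conversion has to treat the four bytes as a single 32-bit number and divide it repeatedly by 85. The price is one extra character in the alphabet.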

The problem of converting messages to text form for transmission over the Internet is, of course, closely tied to the ASCII representation of characters. Here is a chart showing the printable characters in ASCII in graphical form:

[Figure: chart of the printable ASCII characters]

And here is an ASCII chart in text form, first just the control characters:

      0                       0
      0                       0
      0                       1
0000  NUL Null                DLE Data Link Escape
0001  SOH Start of Heading    DC1 Device Control 1
0010  STX Start of Text       DC2 Device Control 2
0011  ETX End of Text         DC3 Device Control 3
0100  EOT End of Transmission DC4 Device Control 4
0101  ENQ Enquiry             NAK Negative Acknowledge
0110  ACK Acknowledge         SYN Synchronous Idle
0111  BEL Bell                ETB End of Transmission Block
1000  BS  Backspace           CAN Cancel
1001  HT  Horizontal Tab      EM  End of Medium
1010  LF  Line Feed           SUB Substitute
1011  VT  Vertical Tab        ESC Escape
1100  FF  Form Feed           FS  File Separator
1101  CR  Carriage Return     GS  Group Separator
1110  SO  Shift Out           RS  Record Separator
1111  SI  Shift In            US  Unit Separator

and now the entire 7-bit ASCII code, with only the abbreviations of the first 32 control characters:

      0   0   0 0 1 1 1 1
      0   0   1 1 0 0 1 1
      0   1   0 1 0 1 0 1
0000  NUL DLE   0 @ P ` p
0001  SOH DC1 ! 1 A Q a q
0010  STX DC2 " 2 B R b r
0011  ETX DC3 # 3 C S c s
0100  EOT DC4 $ 4 D T d t
0101  ENQ NAK % 5 E U e u
0110  ACK SYN & 6 F V f v
0111  BEL ETB ' 7 G W g w
1000  BS  CAN ( 8 H X h x
1001  HT  EM  ) 9 I Y i y
1010  LF  SUB * : J Z j z
1011  VT  ESC + ; K [ k {
1100  FF  FS  , < L \ l |
1101  CR  GS  - = M ] m }
1110  SO  RS  . > N ^ n ~
1111  SI  US  / ? O _ o DEL Delete

The problem of transmitting data in ASCII text characters over the Internet is complicated by the fact that some of the characters in ASCII do not have counterparts in other data transmission codes used by some computers, such as the original version of EBCDIC. Also, some character positions in 7-bit ASCII are used to represent different characters in other countries. Thus, the non-letter characters in the same columns as the letters (such as [, \, ], {, | and }) are often used to represent accented letters, and the symbol # is replaced in the United Kingdom by the British pound sign.

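In some transmissions over the Internet, a line beginning with a minus sign or hyphen (-) runs the risk of being misinterpreted as a delimiter line: in MIME multipart messages, for example, the boundary line that introduces each section of a document, ahead of the header giving its MIME type, begins with two hyphens.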

Of course, if one wishes to send a text by Morse Code, or over a 5-unit teletypewriter link, the best way to do it would be to convert the binary data to letters from the 26-letter alphabet, using no other characters.

I have worked out an elaborate and efficient scheme for doing this.

This scheme could also serve other purposes. Since a great many historical encryption algorithms are aimed at the 26-letter alphabet, one could apply them to a text already encrypted by modern methods in binary after such a conversion.

And there is an easy way to send a text composed only of uppercase letters over the Internet efficiently, if a 78-character character set is available. Take four letters: the first can be encoded as three symbols, each from 1 to 3, since 3 times 3 times 3 is 27, which is at least 26. Combine one such symbol with each of the three remaining letters to determine which character to use from a character set with 3 times 26 characters, which is 78 characters. Four letters are thus sent as three characters.
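A minimal sketch of this four-letters-to-three-characters packing follows. The 78-character set used (the first 78 printable ASCII characters after the space), the function names, and the way a symbol and a letter are combined (symbol times 26 plus letter) are all assumptions for illustration; the symbols are numbered 0 to 2 in the code rather than 1 to 3.

```python
# Hypothetical 78-character set: first 78 printable ASCII characters after space.
CHARS78 = [chr(c) for c in range(33, 33 + 78)]


def encode_letters(group: str) -> str:
    """Pack four uppercase letters into three characters."""
    if len(group) != 4 or not all("A" <= ch <= "Z" for ch in group):
        raise ValueError("group must be four uppercase letters")
    first, *rest = (ord(ch) - ord("A") for ch in group)
    # Write the first letter (0..25) as three base-3 digits.
    trits = [first // 9, (first // 3) % 3, first % 3]
    # Pair one digit with each remaining letter: 3 * 26 = 78 combinations.
    return "".join(CHARS78[t * 26 + letter] for t, letter in zip(trits, rest))


def decode_letters(text: str) -> str:
    """Invert encode_letters."""
    if len(text) != 3:
        raise ValueError("encoded group must be three characters")
    trits, rest = [], []
    for ch in text:
        t, letter = divmod(CHARS78.index(ch), 26)
        trits.append(t)
        rest.append(letter)
    first = trits[0] * 9 + trits[1] * 3 + trits[2]
    return "".join(chr(ord("A") + v) for v in [first, *rest])
```

Since three base-3 digits give 27 combinations and only 26 are needed, one combination (all digits at their maximum) never occurs in valid output.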

A section is also included here on how to end a message efficiently and securely when the length of the original plaintext message is not a whole number of blocks, whether those are the blocks used by a block cipher or the blocks of the conversion process used to produce armor for transmission. Also, since base conversion is discussed here, this seemed as good a place as any for a discussion of base conversion as it applies to random or pseudorandom keystreams.

Finally, there is a Table of Powers, useful in finding ideas for ways to perform fractionation.
