[Next] [Up/Previous] [Index]

Base-26 Armor

Since

 2^47 = 140,737,488,355,328 and
26^10 = 141,167,095,653,376

ten letters can represent 47 bits with fairly good efficiency. In fact, the information content of 10 letters from a 26-character alphabet amounts to 47.00439718 bits, less than 1/227th of a bit extra.

However, one would like to avoid performing 48-bit multiplications to do the conversion.

A method invented for IBM by Tien Chi Chen and Irving Tze Ho for representing three decimal digits in 10 binary bits, which has since become known as Chen-Ho encoding, can illustrate a way in which to do this.

Illustrative Example: Encoding of 3 Digits in 10 Bits

Let ooo (or ppp, qqq) represent a digit from 0 to 7 represented by the following code:

000  0
001  1
010  2
011  3
100  4
101  5
110  6
111  7

and let d (or e, f) represent a digit from 8 to 9 in the following code:

0    8
1    9

then, any three decimal digits can be represented within 10 bits as follows:

0ooopppqqq (3 digits with 3 bits - 1 way)
100dpppqqq (2 digits with 3 bits, 1 digit with 1 bit - 3 ways)
101oooeqqq
110ooopppf
11100deqqq (1 digit with 3 bits, 2 digits with 1 bit - 3 ways)
11101dpppf
11110oooef
1111100def (3 digits with 1 bit - 1 way)

with 24 unused combinations remaining.

This method of encoding was patented by IBM, in U.S. patent 3,842,414. This patent was filed on June 18, 1973, and granted on October 15, 1974. Also, it was described in a paper in the January 1975 issue of Communications of the Association for Computing Machinery. (I thought I first encountered this method in a paper in the IBM Systems Journal, but my memory must be playing tricks on me.)

A further discussion of the encoding of decimal digits using this technique which was formerly found on this page has been moved here, where it seemed to fit better with the topic of the page.

It may or may not already be obvious, but the table illustrating the decimal to binary code is based on ooo, ppp, and qqq representing the first, second, and third digits respectively (if they are from 0 to 7) and on d, e, and f representing the first, second, and third digits respectively if they are instead either 8 or 9. This same notational convention will be used within the description of the coding that takes 47 bits to ten letters.

Here, we have some binary values left over, so we can go from decimal to binary, but we cannot use this code to represent arbitrary binary data as decimal digits without some modification.

With the lengths that I am choosing, it will be the other way around for letters; we will use all possible binary combinations, and still have a few alphabetic combinations left over.

Coding bits as letters

Because the conversion from 47 bits to 10 letters is very efficient, a direct coding, since it would use a large number of possible codes, would be very complicated. Instead, to derive a coding the description of which would be reasonable in length, I have chosen to describe a method that has a two-layer structure.

Shorter codings: the basis

First, we will represent 14 binary bits as three letters.

aaaa (or bbbb, cccc) will represent a letter from the code:

0000 A   0001 B   0010 C   0011 D   0100 E   0101 F   0110 G   0111 H
1000 I   1001 J   1010 K   1011 L   1100 M   1101 N   1110 O   1111 P

iii (or jjj, kkk) will represent a letter from the code:

000 Q   001 R   010 S   011 T   100 U   101 V   110 W   111 X

and m (or n, o) will represent a letter from the code:

0 Y   1 Z

Then, a coding generating three letters from 14 bits looks like the following:

00aaaabbbbcccc  (4,4,4: 1 way)
010iiibbbbcccc  (3,4,4: 3 ways)
011aaaajjjcccc
100aaaabbbbkkk
1010iiijjjcccc  (3,3,4: 3 ways)
1011iiibbbbkkk
1100aaaajjjkkk
11010iiijjjkkk  (3,3,3: 1 way)
11011mbbbbcccc  (1,4,4: 3 ways)
11100aaaancccc
11101aaaabbbbo
111100mjjjcccc  (1,3,4: 6 ways: 4 used here.)
111101mbbbbkkk
111110iiincccc
111111aaaankkk

This shorter coding of 14 bits as three letters is not quite as efficient as the overall coding of 47 bits to ten letters we will construct using it as a component. 3 times 14 is 42, and so there are five bits left, which don't quite fit in one letter. If the first five bits represent a number from 0 to 25, then they can represent a letter, using the codings shown above, in one of the forms 0aaaa, 10iii, or 1100m. Otherwise, codings which use the combinations of three letters which are not used by the above coding of fourteen bits to three letters are used. These codings don't encode fourteen bits, since only a small fraction of the possible combinations of three letters are left.

Other component codes

Some more codings, using the unused letter combinations, are now constructed for use in producing a coding of 47 bits into ten letters that is manageable to describe, since a direct coding is very long and complicated.

There are still some possible arrangements of three letters left over, although the number is now rather small. 10 bits can still be encoded in those which remain, as follows:

00iiibbbbo  (1,3,4: 6 ways: 2 remaining for use here)
01aaaajjjo  
100mjjjkkk  (1,3,3: 3 ways)
101iiinkkk
110iiijjjo
1110mncccc  (1,1,4: 3 ways: 2 used here)
1111mbbbbo

And 7 bits can still be encoded in the few now remaining:

0aaaano  (1,1,4: 3 ways: 1 remaining for use here)
10mnkkk  (1,1,3: 3 ways: 2 used here)
11mjjjo

The remaining combinations make 5:

iiino  (1,1,3: 3 ways: 1 remaining for use here)

and 3 bits:

mno  (1,1,1: 1 way)

The code for 47 bits

To encode 47 bits as ten letters, we now proceed as follows, using the codings that we've constructed so far.

aaaa, iii, or m represent the first of the 10 letters.

The remaining letters are divided into three groups of three, coded using the method above.

Note that the codings already constructed produce a different number of binary bits, either 14, 10, 7, 5, or 3 bits, from every possible sequence of three letters, and no sequence of three letters is used twice. Similarly, for the first of the ten letters, either 4 bits, 3 bits, or 1 bit is produced from that letter, and none of the 26 letters are used twice; 16 of them produce a 4 bit sequence, 8 of them produce a 3 bit sequence, and 2 of them produce a 1 bit sequence. Thus, given a sequence of 10 letters, it is possible, using those codings, not only to decode them uniquely to a string of from 46 to 10 bits, but also to have additional information those bits do not contain: the exact breakdown of which codings were used. Appending prefix bits indicating the coding to use for the different combinations of codings is what allows all combinations of 47 bits to be encoded.

In one important respect, the coding described here is the reverse of a Chen-Ho encoding; in Chen-Ho encoding, all combinations of 3 digits are represented as 1,000 out of the 1,024 possible combinations of 10 bits. Here, not all possible combinations of ten letters are represented in 47 bits; instead, it is all possible combinations of 47 bits which are represented as 140,737,488,355,328 out of the 141,167,095,653,376 possible combinations of 10 letters, but the code is still described as a binary coding of letters even though it is used as a coding of bits in letters and not as a coding of letters in bits.

For the first group of three, the 14, 10, 7, and 5 bit codings are represented by AAAAAAAAAA, FFFFFF, PPP, and X respectively; for the second, BBBBBBBBBB, GGGGGG, QQQ, and Y; for the third, CCCCCCCCCC, HHHHHH, RRR, and Z. The 3 bit coding is not required. Note that the length of the strings is equal to the length of the coding in bits minus 4, so that the length of the symbols used here are constant.

The table below has been revised from its original form. I would like to thank Mr. Ross Presser for bringing to my attention the omission of a portion of the possible space of binary inputs. This has been corrected, and an inconsistency in the ordering of the alphabetic codings used has also been removed.

47 bits become 10 letters as follows:

0aaaaAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way)
10iiiAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way)
1100mAAAAAAAAAABBBBBBBBBBCCCCCCCCCC (14,14,14, 1 way)
11010aaaaFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways)
11011aaaaAAAAAAAAAAGGGGGGCCCCCCCCCC
11100aaaaAAAAAAAAAABBBBBBBBBBHHHHHH
111010iiiFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways)
111011iiiAAAAAAAAAAGGGGGGCCCCCCCCCC
111100iiiAAAAAAAAAABBBBBBBBBBHHHHHH
11110100mFFFFFFBBBBBBBBBBCCCCCCCCCC (10,14,14, 3 ways)
11110101mAAAAAAAAAAGGGGGGCCCCCCCCCC
11110110mAAAAAAAAAABBBBBBBBBBHHHHHH
11110111aaaaPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways)
11111000aaaaAAAAAAAAAAQQQCCCCCCCCCC
11111001aaaaAAAAAAAAAABBBBBBBBBBRRR
111110100iiiPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways)
111110101iiiAAAAAAAAAAQQQCCCCCCCCCC
111110110iiiAAAAAAAAAABBBBBBBBBBRRR
111110111aaaaFFFFFFGGGGGGCCCCCCCCCC (10,10,14, 3 ways)
111111000aaaaFFFFFFBBBBBBBBBBHHHHHH
111111001aaaaAAAAAAAAAAGGGGGGHHHHHH
1111110100iiiFFFFFFGGGGGGCCCCCCCCCC (10,10,14, 3 ways)
1111110101iiiFFFFFFBBBBBBBBBBHHHHHH
1111110110iiiAAAAAAAAAAGGGGGGHHHHHH
1111110111aaaaXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways)
1111111000aaaaAAAAAAAAAAYCCCCCCCCCC
1111111001aaaaAAAAAAAAAABBBBBBBBBBZ
11111110100mPPPBBBBBBBBBBCCCCCCCCCC (7,14,14, 3 ways)
11111110101mAAAAAAAAAAQQQCCCCCCCCCC
11111110110mAAAAAAAAAABBBBBBBBBBRRR
11111110111iiiXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways)
11111111000iiiAAAAAAAAAAYCCCCCCCCCC
11111111001iiiAAAAAAAAAABBBBBBBBBBZ
111111110100aaaaPPPGGGGGGCCCCCCCCCC (7,10,14, 6 ways)
111111110101aaaaPPPBBBBBBBBBBHHHHHH
111111110110aaaaFFFFFFQQQCCCCCCCCCC
111111110111aaaaAAAAAAAAAAQQQHHHHHH
111111111000aaaaFFFFFFBBBBBBBBBBRRR
111111111001aaaaAAAAAAAAAAGGGGGGRRR
1111111110100mXBBBBBBBBBBCCCCCCCCCC (5,14,14, 3 ways)
1111111110101mAAAAAAAAAAYCCCCCCCCCC
1111111110110mAAAAAAAAAABBBBBBBBBBZ
1111111110111iiiPPPGGGGGGCCCCCCCCCC (7,10,14, 6 ways)
1111111111000iiiPPPBBBBBBBBBBHHHHHH
1111111111001iiiFFFFFFQQQCCCCCCCCCC
1111111111010iiiAAAAAAAAAAQQQHHHHHH
1111111111011iiiFFFFFFBBBBBBBBBBRRR
1111111111100iiiAAAAAAAAAAGGGGGGRRR
1111111111101aaaaFFFFFFGGGGGGHHHHHH (10,10,10, 1 way)
11111111111100iiiFFFFFFGGGGGGHHHHHH (10,10,10, 1 way)
111111111111010mPPPGGGGGGCCCCCCCCCC (7,10,14, 6 ways)
111111111111011mPPPBBBBBBBBBBHHHHHH
111111111111100mFFFFFFQQQCCCCCCCCCC
111111111111101mAAAAAAAAAAQQQHHHHHH
111111111111110mFFFFFFBBBBBBBBBBRRR
111111111111111mAAAAAAAAAAGGGGGGRRR

Here is an example, to help illustrate the above.

The 47-bit string

11100011100011100011100011100011100011100011100

will code as follows:

11100 aaaa (14-bits/3 lt) (14-bits/3 lt) (10 bits )
      0111 00011100011100 01110001110001 1100011100
         H 00aaaabbbbcccc 011aaaajjjcccc 110iiijjjo
                H   B   M       I  X   B      R  WY

to produce HHBMI XBRWY as its representation.

Attempting to produce a direct coding, using the aaaa/iii/m symbols for individual letters, produces a very long list of codes. This nested approach is considerably more manageable.

The description of a direct coding would begin like this:

0000000aaaabbbbccccddddeeeeffffgggghhhhiiiijjjj
00000010iiibbbbccccddddeeeeffffgggghhhhiiiijjjj
00000011aaaajjjccccddddeeeeffffgggghhhhiiiijjjj

although, doubtless, different letters would be used, to avoid confusion between iiii and iii, although an alphabet of thirty letters would be required to achieve the effect that might be desired.

A Semi-Direct Approach

As noted, attempting to proceed directly to a coding of 47 bits to 10 letters would be complicated. Thus, the coding above used a coding of 14 bits to 3 letters as the starting point, with additional codings of fewer bits to 3 letters to allow all possible binary values for the original 47 bits to be covered.

There are 676 possible combinations of two letters, which is enough to cover all 512 possible values of nine binary bits. Could this be used as a starting point? Since ten is divisible by two, but not by three, there would be no need to encode one letter singly, as was done above; the regularity produced by this might offset the complexity produced by using a smaller basic unit.

Nine binary bits can be encoded in two letters as follows:

0aaaabbbb
10iiibbbb
11aaaajjj

Seven binary bits can be encoded in some of the remaining combinations like this:

0iiijjj
10mbbbb
11aaaan

Five binary bits can then be encoded as follows:

0mjjj
1iiin

and, finally, two binary bits can be encoded in the remaining possibility:

mn

We can represent the codings of nine binary bits to two letters as AAAAAAAA, BBBBBBBB, CCCCCCCC, DDDDDDDD and EEEEEEEE; the codings of seven binary bits to two letters as IIIIII, JJJJJJ, KKKKKK, LLLLLL and MMMMMM; the codings of five binary bits to two letters as PPPP, QQQQ, RRRR, SSSS and TTTT; and the codings of two binary bits to two letters as V, W, X, Y and Z.

An overall coding of 47 bits to 10 letters of this type might begin like this:

00AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEE (9,9,9,9,9)
0100IIIIIIBBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEE (7,9,9,9,9, 5 ways)
...
1000AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDMMMMMM
100100PPPPBBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEE (5,9,9,9,9, 5 ways)
...
101000AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDTTTT
101001IIIIIIJJJJJJCCCCCCCCDDDDDDDDEEEEEEEE (7,7,9,9,9, 10 ways)
...
101100IIIIIIBBBBBBBBCCCCCCCCDDDDDDDDMMMMMM
101101AAAAAAAAJJJJJJKKKKKKDDDDDDDDEEEEEEEE
...
101111AAAAAAAAJJJJJJCCCCCCCCDDDDDDDDMMMMMM
110000AAAAAAAABBBBBBBBKKKKKKLLLLLLEEEEEEEE
110001AAAAAAAABBBBBBBBKKKKKKDDDDDDDDMMMMMM
110010AAAAAAAABBBBBBBBCCCCCCCCLLLLLLMMMMMM
11001100PPPPJJJJJJCCCCCCCCDDDDDDDDEEEEEEEE (5,7,9,9,9, 20 ways)
...
11001111PPPPBBBBBBBBCCCCCCCCDDDDDDDDMMMMMM
11010000IIIIIIQQQQCCCCCCCCDDDDDDDDEEEEEEEE
11010001AAAAAAAAQQQQKKKKKKDDDDDDDDEEEEEEEE
...
11010011AAAAAAAAQQQQCCCCCCCCDDDDDDDDMMMMMM
11010100IIIIIIBBBBBBBBRRRRDDDDDDDDEEEEEEEE
11010101AAAAAAAAJJJJJJRRRRDDDDDDDDEEEEEEEE
11010110AAAAAAAABBBBBBBBRRRRLLLLLLEEEEEEEE
11010111AAAAAAAABBBBBBBBRRRRDDDDDDDDMMMMMM
11011000IIIIIIBBBBBBBBCCCCCCCCSSSSEEEEEEEE
...
11011010AAAAAAAABBBBBBBBKKKKKKSSSSEEEEEEEE
11011011AAAAAAAABBBBBBBBCCCCCCCCSSSSMMMMMM
11011100IIIIIIBBBBBBBBCCCCCCCCDDDDDDDDTTTT
...
11011111AAAAAAAABBBBBBBBCCCCCCCCLLLLLLTTTT
11100000IIIIIIJJJJJJKKKKKKDDDDDDDDEEEEEEEE (7,7,7,9,9, 10 ways)
...
11100010IIIIIIJJJJJJCCCCCCCCDDDDDDDDMMMMMM
11100011IIIIIIBBBBBBBBKKKKKKLLLLLLEEEEEEEE
11100100IIIIIIBBBBBBBBKKKKKKDDDDDDDDMMMMMM
11100101IIIIIIBBBBBBBBCCCCCCCCLLLLLLMMMMMM
11100110AAAAAAAAJJJJJJKKKKKKLLLLLLEEEEEEEE
...
11101001AAAAAAAABBBBBBBBKKKKKKLLLLLLMMMMMM
111010100VBBBBBBBBCCCCCCCCDDDDDDDDEEEEEEEE (2,9,9,9,9, 5 ways)
...
111011000AAAAAAAABBBBBBBBCCCCCCCCDDDDDDDDZ
1110110010PPPPQQQQCCCCCCCCDDDDDDDDEEEEEEEE (5,5,9,9,9, 10 ways)
...
1110110101PPPPBBBBBBBBCCCCCCCCDDDDDDDDTTTT
1110110110AAAAAAAAQQQQRRRRDDDDDDDDEEEEEEEE
...
1110111000AAAAAAAAQQQQCCCCCCCCDDDDDDDDTTTT
1110111001AAAAAAAABBBBBBBBRRRRSSSSEEEEEEEE
1110111010AAAAAAAABBBBBBBBRRRRDDDDDDDDTTTT
1110111011AAAAAAAABBBBBBBBCCCCCCCCSSSSTTTT
1110111100PPPPJJJJJJKKKKKKDDDDDDDDEEEEEEEE (5,7,7,9,9, 30 ways)
...
1110111110PPPPJJJJJJCCCCCCCCDDDDDDDDMMMMMM
1110111111PPPPBBBBBBBBKKKKKKLLLLLLEEEEEEEE
...
1111000001PPPPBBBBBBBBCCCCCCCCLLLLLLMMMMMM
1111000010IIIIIIQQQQKKKKKKDDDDDDDDEEEEEEEE
...
1111000100IIIIIIQQQQCCCCCCCCDDDDDDDDMMMMMM
1111000101AAAAAAAAQQQQKKKKKKLLLLLLEEEEEEEE
...
1111000111AAAAAAAAQQQQCCCCCCCCLLLLLLMMMMMM
1111001000IIIIIIJJJJJJRRRRDDDDDDDDEEEEEEEE
1111001001IIIIIIBBBBBBBBRRRRLLLLLLEEEEEEEE
1111001010IIIIIIBBBBBBBBRRRRDDDDDDDDMMMMMM
1111001011AAAAAAAAJJJJJJRRRRLLLLLLEEEEEEEE
1111001100AAAAAAAAJJJJJJRRRRDDDDDDDDMMMMMM
1111001101AAAAAAAABBBBBBBBRRRRLLLLLLMMMMMM
1111001110IIIIIIJJJJJJCCCCCCCCSSSSEEEEEEEE
1111001111IIIIIIBBBBBBBBKKKKKKSSSSEEEEEEEE
1111010000IIIIIIBBBBBBBBCCCCCCCCSSSSMMMMMM
1111010001AAAAAAAAJJJJJJKKKKKKSSSSEEEEEEEE
1111010010AAAAAAAAJJJJJJCCCCCCCCSSSSMMMMMM
1111010011AAAAAAAABBBBBBBBKKKKKKSSSSMMMMMM
1111010100IIIIIIJJJJJJCCCCCCCCDDDDDDDDTTTT
...
1111010110IIIIIIBBBBBBBBCCCCCCCCLLLLLLTTTT
1111010111AAAAAAAAJJJJJJKKKKKKDDDDDDDDTTTT
1111011000AAAAAAAAJJJJJJCCCCCCCCLLLLLLTTTT
1111011001AAAAAAAABBBBBBBBKKKKKKLLLLLLTTTT
1111011010IIIIIIJJJJJJKKKKKKLLLLLLEEEEEEEE (7,7,7,7,9, 5 ways)
...
1111011110AAAAAAAAJJJJJJKKKKKKLLLLLLMMMMMM
11110111110VJJJJJJCCCCCCCCDDDDDDDDEEEEEEEE (2,7,9,9,9, 20 ways)
...
11111000001VBBBBBBBBCCCCCCCCDDDDDDDDMMMMMM
11111000010IIIIIIWCCCCCCCCDDDDDDDDEEEEEEEE
11111000011AAAAAAAAWKKKKKKDDDDDDDDEEEEEEEE
...
11111000101AAAAAAAAWCCCCCCCCDDDDDDDDMMMMMM
11111000110IIIIIIBBBBBBBBXDDDDDDDDEEEEEEEE
11111000111AAAAAAAAJJJJJJXDDDDDDDDEEEEEEEE
11111001000AAAAAAAABBBBBBBBXLLLLLLEEEEEEEE
11111001001AAAAAAAABBBBBBBBXDDDDDDDDMMMMMM
11111001010IIIIIIBBBBBBBBCCCCCCCCYEEEEEEEE
...
11111001100AAAAAAAABBBBBBBBKKKKKKYEEEEEEEE
11111001101AAAAAAAABBBBBBBBCCCCCCCCYMMMMMM
11111001110IIIIIIBBBBBBBBCCCCCCCCDDDDDDDDZ
...
11111010001AAAAAAAABBBBBBBBCCCCCCCCLLLLLLZ
111110100100PPPPQQQQKKKKKKDDDDDDDDEEEEEEEE (5,5,7,9,9, 30 ways)
...
111110100110PPPPQQQQCCCCCCCCDDDDDDDDMMMMMM
111110100111PPPPJJJJJJRRRRDDDDDDDDEEEEEEEE
111110101000PPPPBBBBBBBBRRRRLLLLLLEEEEEEEE
111110101001PPPPBBBBBBBBRRRRDDDDDDDDMMMMMM
111110101010PPPPJJJJJJCCCCCCCCSSSSEEEEEEEE
111110101011PPPPBBBBBBBBKKKKKKSSSSEEEEEEEE
111110101100PPPPBBBBBBBBCCCCCCCCSSSSMMMMMM
111110101101PPPPJJJJJJCCCCCCCCDDDDDDDDTTTT
...
111110101111PPPPBBBBBBBBCCCCCCCCLLLLLLTTTT
111110110000IIIIIIQQQQRRRRDDDDDDDDEEEEEEEE
111110110001AAAAAAAAQQQQRRRRLLLLLLEEEEEEEE
111110110010AAAAAAAAQQQQRRRRDDDDDDDDMMMMMM
111110110011IIIIIIQQQQCCCCCCCCSSSSEEEEEEEE
111110110100AAAAAAAAQQQQKKKKKKSSSSEEEEEEEE
111110110101AAAAAAAAQQQQCCCCCCCCSSSSMMMMMM
111110110110IIIIIIQQQQCCCCCCCCDDDDDDDDTTTT
111110110111AAAAAAAAQQQQKKKKKKDDDDDDDDTTTT
111110111000AAAAAAAAQQQQCCCCCCCCLLLLLLTTTT
111110111001IIIIIIBBBBBBBBRRRRSSSSEEEEEEEE
111110111010AAAAAAAAJJJJJJRRRRSSSSEEEEEEEE
111110111011AAAAAAAABBBBBBBBRRRRSSSSMMMMMM
111110111100IIIIIIBBBBBBBBRRRRDDDDDDDDTTTT
111110111101AAAAAAAAJJJJJJRRRRDDDDDDDDTTTT
111110111110AAAAAAAABBBBBBBBRRRRLLLLLLTTTT
111110111111IIIIIIBBBBBBBBCCCCCCCCSSSSTTTT
...
111111000001AAAAAAAABBBBBBBBKKKKKKSSSSTTTT
111111000010PPPPJJJJJJKKKKKKLLLLLLEEEEEEEE (5,7,7,7,9, 20 ways)
...
111111000101PPPPBBBBBBBBKKKKKKLLLLLLMMMMMM
111111000110IIIIIIQQQQKKKKKKLLLLLLEEEEEEEE
...
111111001000IIIIIIQQQQCCCCCCCCLLLLLLMMMMMM
111111001001AAAAAAAAQQQQKKKKKKLLLLLLMMMMMM
111111001010IIIIIIJJJJJJRRRRLLLLLLEEEEEEEE
111111001011IIIIIIJJJJJJRRRRDDDDDDDDMMMMMM
111111001100IIIIIIBBBBBBBBRRRRLLLLLLMMMMMM
111111001101AAAAAAAAJJJJJJRRRRLLLLLLMMMMMM
111111001110IIIIIIJJJJJJKKKKKKSSSSEEEEEEEE
111111001111IIIIIIJJJJJJCCCCCCCCSSSSMMMMMM
...
111111010001AAAAAAAAJJJJJJKKKKKKSSSSMMMMMM
111111010010IIIIIIJJJJJJKKKKKKDDDDDDDDTTTT
...
111111010110AAAAAAAAJJJJJJKKKKKKLLLLLLTTTT
111111010111IIIIIIJJJJJJKKKKKKLLLLLLMMMMMM (7,7,7,7,7)
1111110110000VQQQQCCCCCCCCDDDDDDDDEEEEEEEE (2,5,9,9,9, 20 ways)
...
1111110110011VBBBBBBBBCCCCCCCCDDDDDDDDTTTT

with the coding condensed by omitting intermediate stages in which the last group of two letters representing a smaller number of bits moves through the coding from left to right, or the first group of two letters representing a larger number of bits moves through the coding from right to left.

Such a coding seems, however, to be even more long and complicated than the first example.

Could a simplification still be derived, however, from replacing the single letter, and the first group of three letters, by two groups of two letters? This even suggests attempting to code a given number of bits directly into a group of five letters. If an additional level of hierarchy is introduced into the scheme, it may be possible to keep the list of possibilities at any one level of the hierarchy within limits.

Although the same capital letter may be used to represent one method of encoding bits into two letters, and one method of encoding bits into three letters, the same letters are used as the starting points of groups, so a possibility for two letters might be the same a possibility for three letters in the first position, but not in the second position.

The codings of two letters to 9, 7, 5, and 2 bits respectively, are represented by AAAAAAAA, IIIIII, PPPP, and V; the codings of three letters to 14, 10, 7, and 5 bits (the possibility of representing three bits in 3 letters can still safely be left as unused, because this now affects only two parts of the coding, and it was possible to leave it unused in three parts before) are represented by BBBBBBBBBB, GGGGGG, QQQ, and Y respectively.

Thus, 23 bits would be encoded into five letters by the scheme denoted as:

AAAAAAAABBBBBBBBBB

One can also represent 21 bits by the scheme:

IIIIIIBBBBBBBBBB

and then one can represent 20 bits by the scheme:

0PPPPBBBBBBBBBB
1AAAAAAAAGGGGGG

Eighteen bits are represented by:

0IIIIIIGGGGGG
10VBBBBBBBBBB
11AAAAAAAAQQQ

Sixteen bits are represented by:

0PPPPGGGGGG 10IIIIIIQQQ 11AAAAAAAAY

and thirteen bits are represented by:

0PPPPQQQ
1IIIIIIY

Let us represent the codings of 23, 21, 20, 18, 16 and 13 bits to five letters by aaaaaaaaaaa and bbbbbbbbbbb, fffffffff and ggggggggg, iiiiiiii and jjjjjjjj, pppppp and qqqqqq, uuuu and vvvv, and x and y, respectively.

Then, one can represent 47 bits by two five-letter groups by the following coding:

0aaaaaaaaaaabbbbbbbbbbb
100fffffffffbbbbbbbbbbb
101aaaaaaaaaaaggggggggg
1100iiiiiiiibbbbbbbbbbb
1101aaaaaaaaaaajjjjjjjj
11100fffffffffggggggggg
111010ppppppbbbbbbbbbbb
111011aaaaaaaaaaaqqqqqq
111100iiiiiiiiggggggggg
111101fffffffffjjjjjjjj
1111100iiiiiiiijjjjjjjj
11111010uuuubbbbbbbbbbb
11111011aaaaaaaaaaavvvv
11111100ppppppggggggggg
11111101fffffffffqqqqqq
111111100ppppppjjjjjjjj
111111101iiiiiiiiqqqqqq
1111111100uuuuggggggggg
1111111101fffffffffvvvv
11111111100xbbbbbbbbbbb
11111111101aaaaaaaaaaay
11111111110uuuujjjjjjjj
11111111111iiiiiiiivvvv

and thus, some simplicity is gained by having an additional level of the hierarchy, and, as a bonus, it becomes possible to convert binary messages of some lengths to output having an odd number of five-letter groups.


[Next] [Up/Previous] [Index]

Next
Table of Contents
Home Page