[Next] [Up] [Previous] [Index]

Fallacies of Cryptography and Compression

Every so often, someone notes that there are 254 possible sequences of seven or fewer bits, and concludes from this that nearly every file can be compressed, by a method resembling the following:

0000 0000
0001 0001
0010 0
0011 1
0100 00
0101 01
0110 10
0111 11
1000 000
1001 001
1010 010
1011 011
1100 100
1101 101
1110 110
1111 111

The table in this example is ordered to show what is going on. The 0 and 1 combinations are not compressed; all other combinations are replaced by the bits following the first 1 bit.

If one were to compress each byte of a file in this fashion, the effect would be that of converting a file from 5-level code to Morse code; and no actual compression would take place, because recording the length of each symbol, or the breaks between symbols, would lengthen the file enough to cancel out any gains. At least if the file were composed of random bits: using real Morse code on a real text file, since the most common letters have the shortest symbols, would provide real compression, but no more compression than could be achieved by a Huffman code.

Some people have even had the bright idea of applying this method of compression repeatedly:

0000 0000  000 000  00 00
0001 0001  001 001  01 01
0010 0     010 0    10 0
0011 1     011 1    11 1
0100 00    100 00
0101 01    101 01
0110 10    110 10
0111 11    111 11
1000 000
1001 001
1010 010
1011 011
1100 100
1101 101
1110 110
1111 111

Thus, 1110 compresses to 110, which compresses to 10, which compresses to 0.

Unfortunately, 0 decompresses to 10, 010, or 0010, 10 decompresses to 110 or 0110, 010 decompresses to 1010...

If one records the particular path that compression took, then one has recorded enough information to cause this scheme to no longer involve any compression.

Here is an illustration of a more pervasive fallacy:

Let us use the following square:

  0123456789
0    A  IOSW
1 T   IT A S
2 EEGT TS M
3 MEE HS  BW
4    TR ETSO
5 IS      HS
6 T E TAI  T
7  IO  S   H
8 WFO E T  T
9   WA H W I

to decode the following cipher message:

3.14 15 92 65 35 89 79 32 38 46
   I  T  W  A  S  T  H  E  B  E

  26 43 38 32 79 50 28 84 19 71
   S  T  B  E  H  I  M  E  S  I
         o  f  t

  69 39 93 75 10 58 20 97 49 44
   T  W  A  S  T  H  E  W  O  R

  59 23 07 81 64 06 28 62 08 99
   S  T  O  F  T  I  M  E  S  I

  86 28 03 48 25 34 21 17 06 79
   T  M  A  S  T  H  E  A  I  H
      w                    g  e

  82 14 80 86 51 32 82 30 66 47
   O  I  W  T  S  E  O  M  I  T
      f     i     d

  09 38 44 60 95 50 58 22 31 72
   W  B  R  T  H  I  H  G  E  O
      a  s        e  a

Hmm, those errors just keep accumulating. Perhaps pi isn't really A Tale of Two Cities in code after all.

What if we try a different cipher square? How about...

3.14 15 92 65 35 89 79 32 38 46
   I  N  T  H  E  S  E  C  O  N

  26 43 38 32 79 50 28 84 19 71
   D  C  O  C  E  U  R  Y  O  F
         e  n  t

  69 39 93 75 10 58 20 97 49 44
   T  H  E  C  H  R  I  S  T  I

  59 23 07 81 64 06 28 62 08 99
   A  N  E  R  A  T  H  E  E  M

  86 28 03 48 25 34 21 17 06 79
   P  H  R  E  O  F  R  O  T  E
      i                    m

No, I guess it isn't Gibbons' Decline and Fall of the Roman Empire either.

Of course, I've only scratched the surface of the possibilities. Perhaps I need to apply a fancier encryption technique to the digits of pi before trying a homophonic substitution. If I used three digits, rather than two, at a time, I would get better results (simply because it would take longer before a sequence of three digits would repeat). Perhaps I need to start somewhere else in the decimal expansion of pi, rather than at the beginning.

This can be considered a compression fallacy as well as an encryption fallacy. Since pi contains so many existing examples of random strings of digits, could we take a random string of digits, and compress it by turning it into a set of pointers to where its component sequences occur in pi?

The answer is no, since if we take a random sequence of five digits, on average one would have to go through the first 50,000 digits of pi to find a match for it.

Thus, the description of where something is hidden in pi is, almost always, as long as the thing itself.


[Next] [Up] [Previous] [Index]

Next
Table of Contents
Home Page