The length of a message is itself an item of information about that message. And it is one form of information about a message that is usually not concealed in the course of conventional cipher schemes.
The most effective way to conceal not only the lengths of messages, but other information that can be used for traffic analysis, such as the addressee to whom each messages is sent, and the number of messages that are being sent, is simply to use a continuous channel which is completely encrypted. Perhaps a small portion of that channel is dedicated to header information which all users on the channel can decrypt to detect if there is a message for themselves; the rest of the channel could be filled with random bits, that do not need to be encrypted, when not in use.
An illustration of this principle is given here, but in the context of the extremely demanding situation of the use of one-time pads for encryption, where one wishes to encrypt no more text than is absolutely necessary. In practice, applying some form of continuous stream cipher encryption at the network layer is not a problem, and discussions about the roles of end-to-end encryption versus link encryption can be found in the literature on network security.
When messages are sent separately, though, can their lengths still be concealed? Because there are twice as many messages of length N+1 as there are of length N, but it is usually not the case that a message of length N+1 is twice as likely as a message of length N, many simple schemes to do this, as we shall see, have serious pitfalls, and can either leak information about plaintexts in the length of the input they provide to the next stage, or create plaintexts with long strings of identical bits.
Also, it has been noted that compressing a message changes the length of a message, in a way that depends on its content. Does that mean that compressing messages before encrypting them is dangerous, because it reveals more about a message than merely its length?
Actually, that is not the case, because knowing the length of a compressed message only does not tell you whether it is a longer message that compressed well, or a shorter message that compressed poorly. But this does underscore one fact that might be overlooked: if compressed messages are being sent using encryption, it is important to ensure that the system used does not leak information about how long the messages were when not compressed. Then, indeed, information about the content of the message is being leaked.
Note that since the length of the message in its final encrypted form is apparent from the message itself, if it is convenient to prefix the message with a field indicating the length of the message, to facilitate decoding the message, this field should not be encrypted. However, it should be authenticated, at least in many applications, and that means one has to use real authentication, instead of relying upon one's encryption to ensure that anything that makes sense when decrypted is authentic.
Again, though, the length of a message is information about a message. Can we conceal this information, and if so, to what extent and how?
One way is to add random padding to messages. Thus, a message might begin with an eight-bit field indicating that from 0 to 255 bits of random padding are added; then, the random padding might immediately follow, or be appended to the end of the message.
This has the limitation that it can only make messages longer. Furthermore, a simple padding scheme like that has the result that if the plaintext message was N bits long, the resulting messages have lengths uniformly distributed between N+8 and N+264 bits in length, so this kind of scheme, by itself, will not conceal the activities of a station that sends out the message "Nothing to report" every night.
Some degree of waste may not be a bad thing. If messages after compression might be as short as one bit in length (let us assume that the null message is transformed to itself) but the encryption method used requires that messages be at least 64 bits in length, then if we transform all messages from 1 to 63 bits in length to a 64 bit message (which is possible, as there are (2^64)-2 of them), we have achieved something useful. We could continue by transforming (nearly?) all 64-bit messages to 65-bit messages, half of the 65-bit messages to 66-bit messages, one quarter of the 66-bit messages to 67-bit messages, and so on, to make the needed room.
How does this work? Let us illustrate what is going on with a small table, showing what happens when we enforce a minimum message length of four bits.
If we begin with messages of four or fewer bits in length, as there are 14 different messages from 1 to 3 bits long, most of the four bit outputs are used to encode them. To avoid complexity, we will not try to make use of the two four-bit messages 0000 and 0001.
0 0010 0000 00000 1 0011 0001 00001 0010 00010 00 0100 0011 00011 01 0101 0100 00100 10 0110 0101 00101 11 0111 0110 00110 0111 00111 000 1000 1000 01000 001 1001 1001 01001 010 1010 1010 01010 011 1011 1011 01011 100 1100 1100 01100 101 1101 1101 01101 110 1110 1110 01110 111 1111 1111 01111
Since the four-bit outputs are used up, we have to use five-bit outputs to represent the four-bit messages. But only half of them are needed for that purpose.
We could stop at this point by just adding one bit of random padding to messages four bits long or more. But if we do not feel we need to use random padding to make the leading bits of our messages more uniform, to get a minimal-length representation of messages, we can say that any five bit output message that begins with a 1 represents itself. Then, five bit messages starting with a zero must have an additional zero prefixed to them.
So only sixteen of the possible 32 five-bit messages become expanded to six-bit messages; and thus, only sixteen of the possible 64 six-bit messages must also become expanded by one bit to become seven-bit messages.
This continues on like this:
n, nn, nnn ----> qqqq nnnn ---------> 0nnnn 0nnnn ------- 1nnnn -------|-> 1nnnn | 00nnnn ---- -> 00nnnn 01nnnn ----|---> 01nnnn 10nnnn ----|---> 10nnnn 11nnnn ----|---> 11nnnn | 000nnnn - ---> 000nnnn 001nnnn -|-----> 001nnnn 010nnnn -|-----> 010nnnn 011nnnn -|-----> 011nnnn 100nnnn -|-----> 100nnnn 101nnnn -|-----> 101nnnn 110nnnn -|-----> 110nnnn 111nnnn -|-----> 111nnnn | -----> 0000nnnn
Instead of going on forever, we could treat the first 128 bits of the plaintext as a 128-bit message, adding a bit to it if necessary, and then leaving the rest of the message out of consideration.
It might be noted also, at this point, that while the plaintext message encrypted with DES in CBC mode must be at least 64 bits long, the ciphertext message is at least 128 bits long, because of the 64-bit initialization vector. If a 64-bit initialization vector is enough to ensure uniqueness, one way to conserve bandwidth would be to encrypt a message first by using DES in CBC mode, and then by a cipher with a 128-bit block, such as AES, in ECB mode.
If messages shorter than 64 bits are common, though, then just using padding in this way is dangerous, since it would be likely that a 64-bit message included a long string of zeroes or ones. To pad with random bits, instead of a 1 followed by zeroes, for example, one would need a length-indication field, in the manner illustrated here for making messages come out to a whole number of bytes.
Specifically, one convention that might be used is this: all messages are padded to at least 70 bits in length. The plaintext for a 70-bit ciphertext message is composed of the following three fields:
and where a ciphertext message is longer than 70 bits, it is composed of six bits of random padding, followed by the original plaintext.
Actually, we can get rid of most of the six bits of random padding eventually by the following scheme:
This doesn't conceal the length of the message, except when that length is 64 bits or less, but what it does do is allow a minimum length to be enforced without making the plaintext input to encryption overly regular. Note that we needed to wind up with a minimum length of 70 bits because 70 is one of the numbers in the form n + (2^n), in this case for n=6:
n n + 2^n 2 6 3 11 4 20 5 37 6 70 7 135 8 264
One can use random padding in fractions of a bit to get more flexibility. For example, 64 is one more than 63, which is 3 times 21. So if having just one impossible combination of 6 bits is not intolerable, one could take 1 to 21 bits of original plaintext, with 20 to 0 bits of random padding, with a 6-bit field containing, at random, either the length of the original message text minus one, or the length of the original message text plus 20, or the length of the original message text plus 41, at random. This imposes a minimum length of 27 bits.
Applying this to the last block of a message could allow one to force a message to be composed of an integer number of 21-bit blocks (not 27-bit blocks) as well, with the same degree of uniformity. Since the amount of plaintext contained in 27 bits varies from 1 to 21 bits, and the amount of plaintext to be enciphered can have any value, the plaintext before the last 27-bit block can only increase in increments of 21 bits. The input plaintext would have to be guaranteed to be at least 16 bits in length, so that the output would be two 21-bit blocks, consisting of an initial 15 bits of plaintext, followed by the ending 27-bit block which must contain at least 1 bit of plaintext.
Either that is assumed to be true, or one could enforce a minimum 20-bit length by first applying the case with a four-bit length field or four random bits of the original scheme to the beginning of the message.
For example:
Initial message: 10110010101 011 Enforcing minimum 20-bit length: 110110110010101 01110 Padding to a multiple of 21 bits: 110110110010101 000100011100101101101
So this is the technique we might use to force a message to be composed of an integer number of 47-bit blocks, if we wanted to convert it to groups of ten letters using the coding outlined here. 47 times 5 is 235, which might be considered acceptably close to 256, although there are still several unused codes, so the final 55 bits would contain from 1 to 47 bits of plaintext. The plaintext would have to have a minimum length of 30 bits.
Whether the eight-bit length field comes just before the last 47 bits of the message, or at the very end of the message, or at the beginning of the last 84 bits of the message, is, of course, subject to one's individual preference.
Can the length of a message be encrypted, instead of merely camouflaging it by adding padding, which seems inherently wasteful?
Of course, one can certainly encode messages to messages of other lengths; data compression is proof of that. So one could encode the messages three bits of length or less by a code such as the following:
000 010 0 101 001 11 1 011 010 0 011 100 00 1 100 00 01 110 101 111 10 01 110 000 11 001 111 10
This kind of code would expand most short messages, and shrink a few long ones. So it would still be wasteful, and it would provide only a limited degree of concealment of the length of a message. Still, the possibility of randomly shrinking the occasional message is attractive.
One simple way to randomize the length of a message would be to encode all or part of it by means of a pair of prefix property codes.
For example:
0 001 10 1 110 000 111 01
This, however, has the bad effect of making the probabilities of the bits non-uniform.
Of course, for ease of encoding, we could make the input code uniform in length:
00 001 01 1 10 000 11 01
but then the idea of making the output code uniform in length suggests itself, as part of a way of producing messages that have lengths that are multiples of some figure:
0 11 10 00 110 01 111 10
Using the principles examined previously under pseudo-Morse encoding, the possible partial combinations of bits that don't complete a symbol could then be encoded as follows:
(null) 00 1 11 11 10
and every message would have an even number of bits.
Since that kind of scheme would alter the lengths of small parts of the message in random directions, however, the net variability in the lengths of outputs produced from inputs of a given length would be small. The main result would be to make nearly all messages longer, by an amount reflecting the inefficiency of the coding. But this still illustrates one of the principles we can use.
At this point, the outlines of one way in which we might process messages for length are becoming clear.
First, we might modify the message so that it is at least 56 bits in length. This could be done as follows:
This simple scheme would not produce the two 56-bit messages 00...000 and 00...001, but that can be presumed to be a small defect.
Second, some small section of the message might be transformed, the length determined by a key, by some method that might shorten it in some cases, without introducing too much non-uniformity in the binary bits of the message. How this might be done will be examined below.
Third, the message is prefixed by a random byte from 0 to 255, and then that number of random bits are added to the message. The extra byte, of course, now means the message is at least 64 bits in length.
This outlines what processing a message to conceal its length could involve.
It would be simple enough to produce a length encoding for whole messages. There are (2^n)-2 messages from 1 to n-1 bits in length, so any encipherment method that scrambles an alphabet of (2^n)-2 symbols could be used.
The simplest way would be to transform everything to n bits by prefixing a single 1, preceded by as many zeroes as needed, then using an n-bit block cipher, and then re-encrypting if the inadmissible all zero or 00...001 values are output.
So if after ensuring the message is at least 56 bits long, the message is from 56 to 119 bits in length, we could use a 64-bit block cipher to encrypt the length of everything in the message but the first 55 bits in this way.
This would not involve introducing too much nonuniformity into the bits themselves. But what if the message is longer?
One way to proceed would be that if the message were from 120 to 183 bits in length, we would leave the first 119 bits of the message alone. Again, the last part, from 1 to 63 bits, would be enciphered to something from 1 to 63 bits.
However, this method is seriously flawed. Because half of the possible blocks between 1 and 63 bits become 63 bits long, very few bits are saved compared to simply padding the message to an integer multiple of 64 bits in length. Not only that, by using a simple known coding scheme to produce a 64-bit input and convert a 64-bit output from a block cipher, the visible message length leaks information about the values of bits, essentially turning double encryption into single encryption.
What sort of leak is this?
However many bits are left at the end of the message, they are padded to 64 bits by being preceded by 00...001. This padded result is encrypted by a block cipher, which could be DES, for example.
The output from the DES encipherment is then transformed back to a string from 1 to 63 bits in length by having the leading 00...001 removed from it. (If that output was the all-zero string, or all zeroes followed by only a single 1, it is re-encrypted, as noted.)
After this takes place, the message is enciphered by the "real" cipher in use, such as Rijndael.
But if the message is 120 bits long, then we know the output of the DES encryption must have been of the form:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000001x
If the message is 121 bits long, then we know the output of the DES encryption, before converted to a variable-length string, must have been of the form:
00000000 00000000 00000000 00000000 00000000 00000000 00000000 000001xx
Of course, these cases will be very rare; the distribution of the lengths of messages is going to have a sawtooth-shape, another reason why this method has very little advantage over simply transforming messages to a multiple of 64 bits in length.
But even if the message is 182 bits in length, we still know the output of the DES encryption looked like this:
01xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx xxxxxxxx
and in the most common case, a 183 bit output, we know that the first bit of the DES output was a 1.
This could be dealt with by encrypting the length of the message by this technique after it was encrypted by the main encryption technique used, and then applying only an unimportant encryption to the last 64 bits of the message after encrypting its length; only that encryption would be potentially bypassed by the leak of information.
We could also use the first 56 bits of the message to produce a number from 0 to 63 to indicate a number of bits not to be included in the length encryption. Then the lengths of messages would not indicate anything very much about message content, because each length could be caused in different ways. Since, however, half the time the last 1 to 63 bits of the message would be transformed to 63 bits, the length would strongly indicate which number from 0 to 63 had been derived from the first 56 bits, so a different leak of information, at the other end of the message, is now created.
Padding the message with 0 to 255 random bits, of course, would seem to nicely plug all these leaks of information, but they are so pervasive that attempting to encrypt message length by the technique described above would seem to be a disaster waiting to happen.
An even worse problem with this method of length encryption is that, because we have been hoping to occasionally reduce the length of encrypted messages, we did not add random padding in order to increase the size of our variable-length input to the DES encryption from 1 to 63 bits to 64 bits. Thus, the two inputs 000...00010 and 000...00011 to the DES encryption will occur on average once out of every 126 messages each. The strategies above does not do much to address this problem, and it is the reason that random padding is needed even when a message is converted to a multiple of 8 bits in length.
However, if instead of using a 64-bit block cipher, we just used a smaller substitution to encrypt the last few bits of a message, and we chose at random to encipher strings from 1 to 3 bits up to strings from 1 to 7 bits, for example, the danger would definitely be reduced, and it is even possible some small benefit might be obtained.
The main problem with only encrypting the length by means of random padding is that if the padding happens to be short, and the message is short, it becomes certain that the original message is short. Thus, increasing all messages to some minimum length is perhaps the most useful way in which to conceal message lengths.
Above, we very briefly considered changing the length of a message all through its length by using a pair of prefix-property codings. That would expand the message everywhere, creating much additional redundancy in its bits. So that did seem like it should be discarded. But simply adding from 0 to 255 bits of padding to a message can seem like it provides only a limited amount of length concealment. Is there a valid technique that provides concealment proportional to the message length?
One tolerable amount of wastage throughout the length of a message might be the wastage in not compressing the message. What if one compressed only half the message, and left the other half uncompressed?
If the length of the message is altered, though, it isn't obvious from the length of the ciphertext how many symbols are in the plaintext. So if we indicate in a field how much of the message is compressed, we also have to indicate the length of that field. And since the length of the message is approximately known, the contents of the field reflecting the size of the message can be partially predicted.
This can be avoided by putting the uncompressed part of the message first, and using a special character to terminate it.
For example, a text message might be in 7-bit ASCII, with 1111111 used instead of the carriage return as a paragraph mark, and the symbol 00 indicating the transition from uncompressed text to compressed text. (Or, since there are multiple paragraphs, but one transition, the symbols could be reversed, so that even the uncompressed text is very slightly compressed. Better yet, use 00 to replace the most frequent character, the space!)
But this could be very dangerous. Let us again take the case of a station which sends out "Nothing to report" each evening.
Now, the distribution of the lengths of messages of this station would be such that the possible lengths are, assuming the compression is a simple Huffman code:
7 bits for each letter of "Nothing to report" 7 bits for each letter of "Nothing to repor" plus the length of the Huffman symbol for "t" plus 2 bits overhead 7 bits for each letter of "Nothing to repo" plus the length of the Huffman symbols for "rt" plus 2 bits overhead and so on
and the amount of information leaked about the message is very large; basically, the approximate frequency of every symbol in the message! Instead of the length, now the entire text of the message becomes very easy to reconstruct.
Again, also adding up to 255 bits of random padding might well plug this leak, but this illustrates how dangerous length randomization of discrete messages can be.
Adding up to 255 bits of random padding, with a uniform distribution of lengths, if a truly enormous number of identical messages were sent, however, could be averaged out. While what I will now describe does not constitute something I would recommend, it being more of the nature of a tour de force, the general principles which it illustrates for coping with this leak may have more practical applications elsewhere.
Let us take the message, after compression, but before any kind of encryption is applied. The message is to be restricted to be more than 80 bits in length.
We shall begin by taking a keyed hash of all but the first 64 bits of the message, and XORing that hash to the first 64 bits of the message.
Then we shall encrypt the remainder of the message after the first 64 bits by whatever scheme suits us, and that scheme is to be varied through the use of an initialization vector.
The first 64 bits of the message identify the message, and are constant for any given message. We shall use a method to derive, from these 64 apparently random bits, a coding from 16 bits to numbers from 0 to 255 such that the distribution of those numbers is not uniform; instead, for each particular message, that distribution is unique. Note that this is why the hash must be keyed, so that an adversary cannot precompute the distributions applying to various possible messages.
We then determine, using that coding, applied to the next 16 bits of the encrypted message, following the first 64 bits, a number from 0 to 255, and then append that many random bits to the end of the message.
Finally, we encrypt the message as a whole, including the first 64 bits, again varied with an initialization vector.
The intent is that each message is now concealed with a number of random bits belonging to an unknown distribution, which is persistent for each given message, so the steps in the length of the message due to the varying length of its symbols after compression are not recoverable. Since, however, the same distribution is applied when the message is mostly compressed and when the message is mostly not compressed, this system is still not truly immune to attack. In order to have the amount of random padding depending on where the transition is made, it would be necessary to add the random padding at the plaintext stage, and it would mean that compression would not be allowed to begin during the first 80 bits of the message (and also that small variations in symbol length, such as using 00 to represent space in what is otherwise 7-bit ASCII, could not be allowed in the first part of the message, since the compression transition has to be decoded before the random padding length code is interpreted).
Where the compressed form of the message involves a static single-character Huffman code, another possibility suggests itself. Why not use a stream cipher to generate a series of bits, and represent each character in its uncompressed form for a 0 bit, and in its compressed form for a 1 bit?
This certainly would scramble certain regularities in the plaintext, but it would not vary the length very much. Varying the probability of a 1 bit continuously would essentially require generating a multi-bit stream cipher output value for each character.
But one can combine the two methods as follows: generate two bits for each character from a stream cipher. Initially, only the output value 11 would correspond to a compressed character, but then an uncompressed character having a marker value would switch over to the case where only the output value 00 would correspond to an uncompressed character.
This provides half the amount of variability in length that the original scheme did, but it doesn't leave very much in the way of an exploitable distribution of message lengths behind.
Also, if the only compression is a static coding, applied to individual symbols, the message can be subjected to a transposition cipher prior to this length-varying operation, further reducing the danger that the distribution of lengths of different encipherments of the same message would disclose important information about it.
Given what we have seen above, it would seem that encrypting the length of discrete messages is anything but discreet. However, at the risk of flaunting an attempt to flout the laws of mathematics, it should be noted that securely concealing message lengths is not an impossibility.
The pigeonhole principle, that there are 2^N messages N bits long, is the pitfall of some of the schemes outlined above. Padding a message to a minimum length, particularly if one is using a cipher immune to known and chosen plaintext attacks, is possible. If the lengths of the messages being sent, or at least the lengths of the messages being sent whose lengths one will attempt to conceal, is bounded, that is all one really needs to do.
The simplest way of doing this is, of course, to pad the message with a string of zeroes bounded by a single 1.
If one is not fully confident of the immunity of one's cipher to known plaintext attacks, or at least one does not wish to tempt fate, the next method, as we have seen in previous sections and above on this page, is to have a fixed-length field indicating the number of bits of random padding that are present, followed by that many bits of random padding.
If, however, we are padding messages that are usually about 200 or so bits long to fill a 2,048 bit standard message block, as we might be doing in conjunction with public-key encryption using methods such as RSA, then this method still leaves the first few bits of the field indicating the amount of random padding almost always equal to 1. Of course, known plaintext in that form is hardly likely to be a weakness of something that as thoroughly mixes the entire block as modular exponentiation. Still, since a 2,048 bit message block to conceal length might be used with a weaker cipher, and for completeness, how can we do better than having random padding determined by a length field?
The answer is to use a Huffman code based on the actual distribution of the lengths of messages being sent.
Even a very crude approximation, to illustrate the principle, is an improvement on a simple field giving the number of padding bits. In this case, for ease of understanding, the number of message bits, rather than the number of padding bits, will be encoded, and everything in the standard block after the message bits will be random padding.
Length header code Length Range of lengths covered 000nnnnn N+1 1 ... 32 001nnnnnn N+33 33 ... 96 01nnnnnnnn N+97 97 ... 352 10nnnnnnnnnn N+353 353 ... 1376 110nnnnnnnnnnn N+1377 1377 ... 3424 111nnnnnnnnnnnn N+3424 3425 ... 7520
Here, this code pads all messages, with lengths from 1 bit to 7520 bits, to fit in a standard block which is 7535 bits long. It does so based on the assumption that the median length of a message being enciphered is 352.5 bits.
Perhaps this coding would better illustrate the principle:
Length header code Length Range of lengths covered 00000nnnnn N+1 1 ... 32 00001nnnnn N+33 33 ... 64 0001nnnnn N+65 65 ... 96 001nnnnnn N+97 97 ... 160 01nnnnnnnn N+161 161 ... 416 10nnnnnnnnn N+417 417 ... 928 110nnnnnnnnnn N+929 929 ... 1440 111nnnnnnnnnnn N+1441 1441 ... 2464
This coding uses a bit more variation in probability for shorter message lengths, and uses a smaller standard block. Here, any message with a length from 1 bit to 2464 bits is padded to fit in a standard block which is 2477 bits long. Although the standard block is shorter, the assumed median message length is now 416.5 bits.
Assuming that the distribution of message lengths remains constant, but one is using RSA, and one might wish to vary the size of the modulus used depending on the security one wishes to have, and so the size of the maximum length message one handles in this fashion also varies, one could either simply change the number of bits used to express the length of the message numerically in the last prefix, or one could construct a family of codes.
Thus, the first code in the series, producing a 1,067 bit standard block from messages up to 1,056 bits in length, would be:
Length header code Length Range of lengths covered 00000nnnnn N+1 1 ... 32 00001nnnnn N+33 33 ... 64 0001nnnnn N+65 65 ... 96 001nnnnnn N+97 97 ... 160 01nnnnnnn N+161 161 ... 288 10nnnnnnnn N+289 289 ... 544 11nnnnnnnnn N+545 545 ... 1056
the second code in the series would be the following:
Length header code Length Range of lengths covered 00000nnnnn N+1 1 ... 32 00001nnnnn N+33 33 ... 64 0001nnnnn N+65 65 ... 96 001nnnnnn N+97 97 ... 160 01nnnnnnn N+161 161 ... 288 10nnnnnnnn N+289 289 ... 544 110nnnnnnnnn N+545 545 ... 1056 111nnnnnnnnnn N+1057 1057 ... 2080
producing a 2,093 bit standard block from messages up to 2,080 bits in length, and the third code in the series would be:
Length header code Length Range of lengths covered 00000nnnnn N+1 1 ... 32 00001nnnnn N+33 33 ... 64 0001nnnnn N+65 65 ... 96 001nnnnnn N+97 97 ... 160 01nnnnnnn N+161 161 ... 288 10nnnnnnnn N+289 289 ... 544 110nnnnnnnnn N+545 545 ... 1056 1110nnnnnnnnnn N+1057 1057 ... 2080 1111nnnnnnnnnnn N+2081 2081 ... 4128
in each case, appending a 0 to the last prefix, and following the new prefix by one additional bit of length coding, so that in each case, the code would be (starting with N=10) one that encodes messages of up to 2^N+32 bits in length in a standard block 2^N+2*N-9 bits in length; in this, the last case illustrated, messages of up to 4,128 bits in length are encoded in a standard block which is 4,143 bits long.
The basic estimate of the median length of a message remains constant at 288.5 bits, only the encoding of longer messages is varied.
In each case, if a message longer than the maximum length to be enciphered in the initial block is sent, the simplest way of handling it is to pad it with the number of random bits that represent the length prefix code of the longest messages encoded within the initial block alone, so that the length of the message in that case can be determined by the length of the message being sent. The amount of random padding can be tapered down to one bit using the method outlined earlier on this page, of course, as well.
If one is thinking of a general-use E-mail encryption program, of the general type illustrated by PGP and RIPEM, however, it might be desired not just to provide users with a choice of the size of the modulus to be used, but also with a choice of the median length for the messages to be sent, since some users may tend to send longer messages than others. If this were treated as a prearranged secret key element, then it would be important that the length of the moduli (one bit larger than the size of the standard block, of course) would not indicate the coding chosen. It is possible to construct codings with similar standard block sizes, but a small number of extra padding bits would be required to make the lengths coincident.
Thus, for compatibility with the coding above that provides a 288.5 bit median block length, one could begin with the following code:
Length header code Length Range of lengths covered 00000nnnn N+1 1 ... 16 00001nnnn N+17 17 ... 32 0001nnnnn N+33 33 ... 64 001nnnnn N+65 65 ... 96 01nnnnnn N+97 97 ... 160 10nnnnnnn N+161 161 ... 288 110nnnnnnnn N+289 289 ... 544 111nnnnnnnnn N+545 545 ... 1056
which also operates on plaintext messages from 1 to 1,056 bits in length, but which uses a 12-bit prefix instead of an 11-bit prefix for the longest messages. Here, the median message length is 160.5 bits (after compression).
Similarly, for a longer median message length, we could combine two levels instead of splitting up a level:
Length header code Length Range of lengths covered 00000nnnnn N+1 1 ... 32 00001nnnnnn N+33 33 ... 96 0001nnnnnn N+97 97 ... 160 001nnnnnnn N+161 161 ... 288 01nnnnnnnn N+289 289 ... 544 1nnnnnnnnn N+545 545 ... 1056
changing the median length to 544.5 bits. Here, the standard block starts as 1,066 bits in length.
Thus, if we choose to start with a standard block of 1,068 bits, with either one or two additional bits of random padding as required, our choice of median message length can remain secret, instead of being disclosed in the length of our moduli.