[Next] [Up] [Previous] [Index]

The idea behind Quadibloc 2002D is to have a cipher with the same kind of indirection as was used in Quadibloc 2002, but without the use of unbalanced Feistel rounds. An unbalanced Feistel round which is target-heavy potentially makes differential cryptanalysis easier, because the source is effectively repeated in the f-function output. An unbalanced Feistel round which is source-heavy slows diffusion. These problems are not necessarily all that serious in ciphers which are as elaborate as those in the Quadibloc series, but it would still be desirable to avoid them.

Using components from other Quadibloc ciphers, it is easy enough to design a combiner, following the same principle as the one in Quadibloc 2000, but operating on a 64-bit value instead of a 32-bit value: This combiner takes a 128-bit input. How might a 128-bit value be derived from the other 64-bit half of the 128-bit block? One obvious way would be to use the intermediate results from two conventional Quadibloc rounds, as shown at right:

And then, the remaining question is, what f-function do we use to modify the 128-bit value obtained from the left half before using it to alter the right half?

Since it is desirable to use something that is different from the operations used with the left and right halves of the block, it seems to me that a reasonable choice would be to use the operations from Quadibloc 2002A; a diffusion phase, a first key phase, a nonlinearity phase, a second key phase, and an extra diffusion phase. S5 and S6 were used instead of S1 and S2 in the diagram for the conventional Quadibloc rounds for the left half in order to allow the reciprocal S-box derived from S1 used with Quadibloc 2002A to be used here, and S7 was used instead of S8 for the key-dependent S-box so that the two key-dependent S-boxes used in the diffusion phase could continue to be called S8 and S9. Note that the 128-bit keys used in the diagram, intended to show this operation as used as an f-function in the first round, are labelled LK3 and LK4. LK1 and LK2 are intended to be used with this same operation, but applied to the block as a whole, prior to the first regular round of Quadibloc 2002D. In this way, the original plaintext block will be divided between the left and right halves of the Feistel structure of the main part of the cipher, thereby, it is hoped, complicating analysis somewhat. A similar one-round version of Quadibloc 2002A will, of course, also be applied to the block on exit from the cipher.

After each regular round, except the last, the halves of the block are to be swapped. As the rounds of this cipher are complicated and slow, eight regular rounds, which should be enough to be adequate, are the normal number of rounds for Quadibloc 2002D. Sixteen rounds would, however, be ideal.

### The Rounds in Detail

On entry and exit from the cipher, and as an f-function within its regular rounds, an operation equivalent to one-round Quadibloc 2002A is performed on a 128-bit input giving a 128-bit output.

This operation consists of a diffusion phase, a first key phase, a nonlinearity phase, a second key phase, and another diffusion phase.

The diffusion phase proceeds as follows:

The block is divided into pairs of bytes. In each pair, first the byte on the left has the element of S8 the byte on the right points to added to it modulo 256, and then the byte on the right has the element of S8 the byte on the left points to added to it modulo 256. In the absence of S8, this would essentially be a pseudo-Hadamard transform, as used in SAFER.

The bytes are next considered as belonging to groups of four, and the middle two bytes of each such group are swapped.

Then, considered as pairs of bytes again, first the byte on the right has the element of S8 the byte on the left points to subtracted from it modulo 256, and then the byte on the left has the element of S8 the byte on the right points to subtracted from it modulo 256. In the absence of S9, this would essentially be the inverse of the pseudo-Hadamard transform.

Since at this point, we have four groups of four bytes, each thoroughly mixed within itself, a byte transpose is performed, transposing the sixteen bytes of the block from the order

``` 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
```

to the order

``` 1  5  9 13  2  6 10 14  3  7 11 15  4  8 12 16
```

and thus placing one byte from each group of four into each new group of four.

Then the use of S8 in a pseudo-Hadamard transform and the use of S9 in an inverse pseudo-Hadamard transform is repeated.

Each key phase consists of modifying the block by XORing it with a 128-bit subkey.

The nonlinearity phase consists of replacing each byte of the block with its substitute in the S-box S1, where that S-box is created by taking the original S-box S1, as defined in the page entitled Euler's Constant and the Quadibloc S-boxes; the derived S-box is listed in the page describing Quadibloc 2002A.

The cipher consists of one such manipulation, using the first two long subkeys, followed by eight or sixteen regular rounds, each one using two more long subkeys and six 32-bit subkeys, with the halves of the block swapped after each regular round but the last, ending in another of the manipulations described above, using the last two long subkeys, LK19 and LK20 in the eight-round version, and LK35 and LK36 in the sixteen-round version.

In a regular round, the first operation that is performed is the encipherment of the left half of the block by means of two Feistel rounds resembling those of standard Quadibloc. Specifically, in the case of this cipher:

The left half of the block is divided into a left subblock of 32 bits and a right subblock of 32 bits. In the first of the two rounds, the left subblock is the input to the f-function, the first three short subkeys for the round are used, which are K1 through K3 for the first round, K7 through K9 for the second round, and so on, and the output of the f-function is applied to the right subblock by means of an XOR, modifying it. In the second of the two rounds, the right subblock is the input to the f-function, the second short subkeys for the round are used, which are K4 through K6 for the first round, increased by six for each successive round, and the output of the f-function is applied to the left subblock by means of an XOR, modifying it.

The f-function proceeds as follows:

The first 32-bit subkey used with the f-function is XORed with the input.

The four bytes of the result are replaced with their substitutes in S-boxes S5, S5, S5, and S6, respectively.

The bits of the result are transposed from the order:

``` 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
```

to the order

``` 1  2 27 28 21 22 15 16  9 10  3  4 29 30 23 24
17 18 11 12  5  6 31 32 25 26 19 20 13 14  7  8
```

The value at this point is the first intermediate result of the f-function.

The second 32-bit subkey used with the f-function is XORed with the result.

The four bytes of the result are replaced with their substitutes in S-boxes S5, S6, S6, and S6, respectively.

The bits of the result are again subjected to the transposition previously described.

The value at this point is the second intermediate result of the f-function.

The third 32-bit subkey used with the f-function is XORed with the result.

The four bytes of the result are replaced with their substitutes in the key-dependent S-box S7.

The result is now the output from the f-function.

The intermediate results of the f-functions in the first and second Feistel rounds applied to the left half of the block form a 128-bit value by being taken in the following order, from left to right:

• The first intermediate result of the f-function in the first round;
• The second intermediate result of the f-function in the second round;
• The first intermediate result of the f-function in the second round;
• The second intermediate result of the f-function in the first round.

This 128-bit value is used as the input to the one-round Quadibloc 2002A operation described above, using the two long keys appropriate to the current round, LK3 and LK4 for the first regular round, increasing by two for each successive round. The output is then used to modify the right half of the block by being used as input to the combiner.

The combiner operates as follows:

The input to the combiner can be considered as consisting of eight sixteen-bit parts, numbered from 1 to 8 from left to right.

The combiner acts on the left half of the block. This 64-bit half of the block can also be considered as being divided into sixteen-bit parts, numbered from 1 to 4 from left to right.

The combiner operation consists of these steps:

• A copy of parts 1 and 2 of the half block is made, and used as the input to an f-function, which proceeds as follows:
• A copy of the copy of part 1 is XORed with part 1 of the combiner input.
• The left byte of the result is used as an index into key-dependent S-box S10, and the right byte of the result is used as an index into key-dependent S-box S11.
• The results found in the two key-dependent S-boxes are added together by modulo-65,536 addition.
• The result of the addition is XORed to the copy of part 2 of the half block, modifying that copy.
• A copy of the copy of part 2 is XORed with part 2 of the combiner input.
• The left byte of the result is used as an index into key-dependent S-box S10, and the right byte of the result is used as an index into key-dependent S-box S11.
• The results found in the two key-dependent S-boxes are added together by modulo-65,536 addition.
• The result of the addition is XORed to the copy of part 1 of the half block, modifying that copy.
• A copy of the copy of part 1 is XORed with part 3 of the combiner input.
• The left byte of the result is used as an index into key-dependent S-box S10, and the right byte of the result is used as an index into key-dependent S-box S11.
• The results found in the two key-dependent S-boxes are added together by modulo-65,536 addition.
• The result of the addition is XORed to the copy of part 2 of the half block, modifying that copy.
• A copy of the copy of part 2 is XORed with part 4 of the combiner input.
• The left byte of the result is used as an index into key-dependent S-box S10, and the right byte of the result is used as an index into key-dependent S-box S11.
• The results found in the two key-dependent S-boxes are added together by modulo-65,536 addition.
• The result of the addition is XORed to the copy of part 1 of the half block, modifying that copy.
• The current value of the copy of parts 1 and 2 of the half block are the output from the f-function within the combiner.
• The combiner output is then applied to parts 3 and 4 of the half block, modifying them, using two small Feistel rounds similar to the four small Feistel rounds used within the f-function, as follows:

• The modified copy of part 1 of the half block is XORed with a copy of part 3 of the half block.
• The left byte of the result is used as an index into key-dependent S-box S10, and the right byte of the result is used as an index into key-dependent S-box S11.
• The results found in the two key-dependent S-boxes are added together by modulo-65,536 addition.
• The result of the addition is XORed to part 4 of the half block, permanently modifying it in the direct path between plaintext and ciphertext.
• The modified copy of part 2 of the half block is XORed with a copy of part 4 of the half block.
• The left byte of the result is used as an index into key-dependent S-box S10, and the right byte of the result is used as an index into key-dependent S-box S11.
• The results found in the two key-dependent S-boxes are added together by modulo-65,536 addition.
• The result of the addition is XORed to part 3 of the half block, permanently modifying it in the direct path between plaintext and ciphertext.
• Then a second Feistel round is performed, this time with parts 3 and 4 of the half block (instead of parts 1 and 2) serving as the inputs to the f-function, parts 5 through 8 (instead of parts 1 through 4) of the combiner input serving as the subkeys for the four Feistel rounds which form the f-function within the combiner, and parts 1 and 2 of the half block being modified (instead of parts 3 and 4). Substituting these parts of the half block and input in the procedure above, and repeating that procedure, will correctly give the second half of the combiner operation. The diagram may also be consulted to resolve any ambiguity.

### The Key Schedule

The key material used by this cipher consists of:

• Forty-eight 32-bit subkeys, K1 through K48 (in the eight-round version; ninety-six 32-bit subkeys, K1 through K96 in the sixteen-round version);
• Twenty 128-bit subkeys, LK1 through LK20 (in the eight-round version; thirty-six 128-bit subkeys, LK1 through LK36 in the sixteen-round version);
• Three key-dependent S-boxes, S7, S8, and S9, containing a permutation of the values 0 to 255;
• Two key-dependent S-boxes, S10 and S11, each containing 256 arbitrary 16-bit values.

The key must be a multiple of four bytes in length, and it must be at least eight bytes long.

Subkey material is to be generated in the following sequence:

• The 32-bit subkeys, K1 through K48 or K1 through K96.
• Key-dependent S-box S7.
• The 128-bit subkeys, LK1 through LK20 or LK1 through LK36.
• Key-dependent S-box S8.
• Key-dependent S-box S10.
• Key-dependent S-box S9.
• Key-dependent S-box S11.

Key-dependent S-boxes S7, S8, and S9 are produced by the method for generating permutations; key-dependent S-boxes S10 and S11 are merely filled directly with subkey material from beginning to end.

The subkey material is prepared from the key in the usual fashion, as initially established for Quadibloc XI, and as modified for the case of long keys for Quadibloc 2002:

### Initialization

Three strings of bytes of different length are produced from the key.

The first string consists of the key, followed by one byte containing the one's complement of the XOR of all the bytes of the key.

The second string consists of the one's complements of the bytes of the key in reverse order, with three bytes appended containing the following three quantities:

• The sum, modulo 255, of the bytes of the key, incremented by one by normal addition. (Thus, this produces a number from 1 to 255.)
• The XOR of all the bytes at odd numbered positions in the key, where the first byte in the key is considered to be byte 1, and odd.
• The one's complement of the XOR of all the bytes at even numbered positions in the key.

The third string consists of alternating bytes, taken from the bytes of the key in reverse order, and then from the bytes of the one's complement of the key, and then that string is followed by the one's complements of the first four bytes of the key.

Thus, if the key is:

``` 128  64  32  16   8   4   2   1   1   2   3   4   5   6   7   8
```

then the strings generated from it are as follows:

```First string:
128  64  32  16   8   4   2   1
1   2   3   4   5   6   7   8
8

Second string:
247 248 249 250 251 252 253 254
254 253 251 247 239 223 191 127
37 170  93

Third string:
8 127   7 191   6 223   5 239
4 247   3 251   2 253   1 254
1 254   2 253   4 252   8 251
16 250  32 249  64 248 128 247
127 191 223 239
```

Given that the length of the key is 4n, the lengths of the three strings are 4n+1, 4n+3, and 8n+4, and hence all three are relatively prime, since both 4n+1 and 4n+3 are odd, and 8n+4 is two times 4n+2.

Two buffers, each containing room for 256 bytes, are filled by generating bytes from the first and second strings by placing them in a nonlinear shift register.

The form of that shift register is shown in the following illustration, showing its precise form for the first string. Five bytes are read from the string at each step. For the first string, they are, as shown in the diagram, the eighth-last, fifth-last, third-last, and second-last bytes and the last byte. For the second string, they are the eighth-last, seventh-last, fourth-last, and second-last bytes, and the last byte. For the third string, they are the twelfth-last, tenth-last, seventh-last, and fourth-last bytes, and the last byte.

Each time the shift register produces a byte, it does so as follows:

• The second byte read is used to select an entry in S-box S3, and the value of this entry is XORed to the first byte. Then the first byte is used to select an entry in S-box S3, and the value of this entry is XORed to the second byte.
• The fourth byte read is used to select an entry in S-box S3, and the value of this entry is XORed to the third byte. Then the third byte is used to select an entry in S-box S3, and the value of this entry is XORed to the fourth byte.
• The first and second bytes, as modified, are added together using modulo-256 addition to form a first result.
• The third and fourth bytes, as modified, are added together using modulo 256 addition to form a second result.
• The second result is used to select an entry in S-box S3, and the value of this entry is XORed to the first result. Then the first result is used to select an entry in S-box S3, and the value of this entry is XORed to the second result.
• The two results as modified are added together using modulo 256 addition to form a third result.
• The fifth byte read, which is always the last byte in the shift register, is XORed with the third result. The resulting value is output as the byte produced by operating the shift register.
• The values in the shift register are modified by removing the last byte, advancing the bytes in the shift register to the next later position, and appending the output result as the new first byte of the shift register contents.

Both buffers contain 256 bytes. The first buffer, called buffer A, is filled with 256 successive bytes generated from the second string by means of the nonlinear shift register filled with the second string. The second buffer, called buffer B, is filled with 256 successive bytes generated from the first string by means of the nonlinear shift register filled with the first string.

### Subkey Byte Generation

Once the setup is complete, subkey material is generated one byte at a time, the first byte generated being the leftmost byte of subkey K1, and so on.

A subkey byte is generated as follows:

• A byte is generated from the first string by the nonlinear shift register operation.
• The byte at the position in buffer A indicated by this value is taken, and called P.
• A byte is generated from the third string by the nonlinear shift register operation. In the case of the third string, the shift register operation involves reading the following five bytes: the twelfth, tenth, seventh, and fourth last bytes, and the last byte. The value of the byte thus produced is placed in buffer A, replacing the value taken.
• A byte is generated from the second string by the nonlinear shift register operation.
• The byte at the position in buffer B indicated by this value is taken, and called Q.
• A byte is generated from the third string by the nonlinear shift register operation. Its value is placed in buffer B, replacing the value taken.
• The subkey byte generated is the XOR of P and Q and an additional byte generated from the third string by the nonlinear shift register operation. (The use of an additional byte from the third string ensures that all the bytes of the key particpate directly in the key schedule; otherwise, some could be skipped over in a sense by the selection of bytes to use from the buffers.)

The following diagram attempts to illustrate the complete process of subkey byte generation: Note that this procedure, since it exercises the two strings used to select bytes, rather than the string used to generate values, results in a small change in the key resulting in large changes in the subkeys from the very beginning.

### Permutation Generation

To generate bijective S-boxes, which in the case of this cipher are the S-boxes S7, S8, and S9, the following procedure is used:

• 256 bytes are generated following the same procedure as for subkey byte generation, and these bytes are placed in a 256-byte buffer called buffer C.
• A 256-byte buffer called buffer D is filled with the numbers from 0 to 255 in order.
• For every i from 0 to 255, if element i of buffer C (hereinafter called C(i)) is not equal to i, swap elements i and C(i) of buffer D.
• For every i from 0 to 255, if B(i) is not equal to i, swap elements i and B(i) of buffer D.
• For every i from 0 to 255, if A(i) is not equal to i, swap elements i and A(i) of buffer D.

The resulting contents of buffer D are used as the key-dependent bijective S-box intended to be produced. Note that this is a procedure, introduced for Quadibloc X, is more straightforwards than the other two basic procedures used previously to produce S8 in other ciphers in this series.

This procedure, although it uses buffers A and B, leaves them undisturbed; thus, byte generation may continue after one S-box is produced.

### Long Keys

The subkey generation procedure given above works well for keys from 64 bits to 1,024 bits in length. When the length of the key approaches 2,048 bits in length, however, the first two shift registers are no longer thoroughly mixed by filling the buffers initially.

Thus, the following modified subkey generation procedure is given for keys which are more than 1,024 bits long. A key must still be a multiple of 32 bits in length.

• (1) Divide the key into blocks of 512 bits until the remaining part of the key is from 512 to 992 bits in length.
• (2) Use the first block of the key as the key for the normal Quadibloc 2002 key schedule.
• (3) Continue to generate enough additional subkey bytes to XOR these with the next block into which the key was divided.
• (4) Use the next block, after the XOR, to carry out the Quadibloc 2002 key schedule except:
• The subkey material generated is XORed with the previously generated values of the subkeys;
• (This has been changed for Quadibloc 2002D) For every second block, generate the 128-bit keys first, where the 32-bit keys were initially generated, and generate the 32-bit keys later, where the 128-bit keys were initially generated.
• (This has been changed for Quadibloc 2002D) Generate permutations for S7, S8 and S9 in the normal sequence of generation, and then modify the S7, S8 and S9 to be used in the cipher as follows: go through the newly-generated S7, S8 or S9 and the previously calculated S7, S8 or S9 from positions 0 to 255; in a new array, place the value in the new permutation in the position indicated by the value in the original permutation. Once this is finished, the new array is the replacement permutation.
• (5) If there are any additional blocks remaining in the key, go back to step 3.

This method ensures that the key schedule is a function of the entire key.

### Alternative Subkey Byte Generation Procedure

While the method for subkey byte generation introduced for Quadibloc XI will normally produce a key schedule of high quality, there is one potential for difficulties which remains in it. The basis of subkey byte generation is three nonlinear shift registers. They are designed so that they have an incompressible state space; that is, since a nonlinear function of the block is XORed with a byte of the block, stepping the shift register is invertible, so each shift register state has exactly one predecessor as well as exactly one successor. This prevents a large proportion of states from eventually degenerating into a short cycle, but it does not prevent the existence of short cycles; there is always the risk that the initial state of the shift register might be part of a short cycle.

In this subkey byte generation method, two of the shift registers are used for selecting elements from 256-byte buffers, and only one shift register, the longest, produces the actual values which are placed in those buffers, and which are XORed together to produce the output subkey bytes.

Since producing a key schedule requires only a limited number of bytes, even if that number is in the thousands, short cycles are not a major risk. However, if bytes from the other shift registers also entered into the direct production of output subkey bytes, the robustness of the subkey byte generation process would be improved. In consequence, the alternate subkey byte generation procedure illustrated below is proposed: Here, there are four buffers. In addition to the original buffers A and B, there are two additional buffers, C and D, which precede them. Initially, the 256 elements of buffer C, followed by the 256 elements of buffer A, are filled with bytes generated from the second string by its associated nonlinear shift register operation, and the 256 elements of buffer D, followed by the 256 elements of buffer B, are filled with bytes generated from the first string by its associated nonlinear shift register operation. These operations, and the initial values of the three strings, remain the same as in the original key schedule.

The positions of the buffers A through D in the diagram above are as follows:

```C A
D B
```

The generation of subkey bytes by this improved procedure proceeds as follows:

• Three bytes are generated from the first string by its associated nonlinear shift register operation, and they will be called U, V, and W in order.
• Three bytes are generated from the second string by its associated nonlinear shift register operation, and they will be called F, G, and H in order.
• The byte at the position in buffer A indicated by the value U is taken, and called P.
• The byte at the position in buffer C indicated by the value V is taken, and it is XORed with the value H, and the result is placed in buffer A, at the position indicated by the value U, replacing the value taken from the buffer A.
• The byte at the position in buffer B indicated by the value F is taken, and called Q.
• The byte at the position in buffer D indicated by the value G is taken, and it is XORed with the value W, and the result is placed in buffer B, at the position indicated by the value F, replacing the value taken from the buffer B.
• A byte is generated from the third string by its associated nonlinear shift register operation. The value of the byte thus produced is placed in buffer C, replacing the value taken from that buffer, by being placed at the location indicated by the value V.
• An additional byte is generated from the third string by the nonlinear shift register operation. Its value is placed in buffer D, replacing the value taken from that buffer, by being placed at the location indicated by the value G.
• The subkey byte generated is the XOR of P and Q and an additional byte generated from the third string by the nonlinear shift register operation.

Thus, basically, in this improved subkey byte generation procedure, all three nonlinear shift registers are called upon to generate three bytes. For the first two shift registers, two of these bytes control two stages of shuffling of the bytes recieved from the third shift register, and a third is applied between those two stages of shuffling to the bytes shuffled by the other one of those two shift registers. The three bytes from the third shift register are the starting inputs to the two sets of shuffling stages, and a byte which is directly XORed to the XOR of the final outputs from the two sets of shuffling stages.

### Decipherment

The Quadibloc 2002A rounds at the beginning and end of the cipher are inverted in the manner set forth for Quadibloc 2002A, by reversing the order of the subkeys and interchanging key-dependent S-boxes S8 and S9. Of course, no changes are made to the f-function in the regular round.

The basic principle of leaving f-functions unchanged, and performing Feistel rounds in reverse order, is used to invert the Quadibloc rounds operating on the left half of the block, and the combiner operating on the right half of the block, to form the inverse of a regular round.