[Next] [Up] [Previous] [Index]

# Questions of S-Box Design

As previously noted, the basic principle of differential cryptanalysis is that the cipher being attacked has a characteristic if there exists a constant X such that given many pairs of plaintexts A, B, such that B = A xor X, if a certain statement is true about the key, E(B,k) = E(A,k) xor Y for some constant Y will be true with a probability somewhat above that given by random chance.

### Differentials

A 1991 paper by Xuejia Lai and James Massey presents a somewhat modified definition of differential cryptanalysis which specifies a case in which it may be more easily analyzed, and in which it can be used to obtain more information about the key.

In their formulation, a cipher has an N-round differential if there exist constants X, Y such that given many pairs of plaintexts A, B, such that B = A xor X, for N-rounds of the cipher, EN(B,k) = EN(A,k) xor Y will be true with a probability somewhat above that given by random chance if the key k is randomly chosen.

Remember, in the case of a characteristic, this would only be true if the key k had a certain specific property for which we would be searching: here, a difference of X in the input almost always implies a difference of Y in the output.

A cipher with more than N rounds can then be attacked by means of this differential, because then inputs differing by X will lead to inputs differing by Y into the remaining rounds of the cipher; as long as those remaining rounds have a characteristic for Y as the input, an attack is possible. If the number of rounds of the block cipher is N+1 (or, in the case of most Feistel ciphers, N+2 as well) it is definitely possible, given that the cipher is built upon the iteration of cryptographically weak individual rounds, to extract information about the last subkey(s).

While the probabilities of differentials are calculated based on the simplifying assumption that subkeys are independent, the usefulness of determining the last subkey is, of course, enhanced by the fact that they are not; for example, in DES, the 48 bits of the final subkey are 48 bits of the original 56-bit key. In the case of independent subkeys, though, once the subkey of the last round, round N+1, is known, then if in addition to an N-round differential, there exists an (N-1)-round differential (which would seem likely, as such things are easier to obtain the fewer rounds one must deal with), then one can easily use it to find the subkeys for round N since one is now able to undo round N+1 knowing its subkey, and so on.

If it were not for this fact, a trivial way to make ciphers much more resistant to differential cryptanalysis would be to use a one-way hash as part of the key schedule before calculating the subkeys for the last round or two (and, of course, the first round or two). Thus, splitting a differential characteristic into a key-dependent part and a key-independent part, when possible, provides a means of peeling away the rounds of an iterative block cipher one by one.

### Markov ciphers

This paper also introduced the concept of Markov ciphers.

A Markov cipher does not need to have independent round subkeys. It does have to be composed of a repeated round function, and it has to have the property that, if the subkeys were both independent and random, the probability of a differential would not be affected by the value of the plaintexts involved.

DES, for example, is a Markov cipher, because for any two 64-bit input plaintexts, one can find corresponding subkeys for the first two rounds that will lead to those two input plaintexts not only being enciphered into the same 64-bit output, but also going through the same S-box entries in doing so, and furthermore, for each pair of subkeys used with one plaintext, there is one and exactly one pair of corresponding subkeys for the other plaintext.

Incidentally, Eli Biham applied a transformation to LUCIFER in order to change when the subkey was applied in the round without actually changing the result of the cipher, in order to apply differential cryptanalysis to it. This transformation was just the transformation needed to turn it into a Markov cipher.

Here, differentials are defined in terms of the XOR operation; the paper did note that a cipher could be a Markov cipher in terms of a different operation instead. I would have expected that the generalization of differential cryptanalysis to other domains, such as that of arithmetic addition instead of XOR would have been mentioned at least in passing at its very inception, a paper by Helger Lipmaa and Shiho Moriai credits this paper with originating this generalization; perhaps this is not so surprising after all, as in practice, IDEA, a block cipher invented by that paper's authors, may well be the only one in common use against which the possibility of such attacks had to be considered in the design, since most other block ciphers rely upon the XOR operation enough that it is clearly the most effective basis for any differential attack against them.

### The Wide-Trail Design Strategy

The block cipher Rijndael which has been chosen as the AES was designed on the basis of something called the Wide-Trail design strategy. As large S-boxes increase the cost of implementing a block cipher in hardware, this principle of cipher design aims at obtaining strength through other means.

One of the principles of this design strategy is to ensure that the steps of the cipher that don't provide nonlinearity are effective in providing diffusion. The other main principle is to ensure that the S-boxes which are used are of the best possible quality.

Note that Rijndael still uses an S-box with 256 entries, each eight bits long, which is much larger than the S-boxes with 64 entries, each four bits long, used in DES. The larger an S-box is, the easier it is to prepare one with a useful degree of resistance to linear and differential cryptanalysis.

### What Makes a Good S-Box?

Although S-boxes can be used in different ways within an encipherment round, whether they are used directly in the plaintext-to-ciphertext path or in a Feistel round structure does not significantly change the properties an S-box is required to have.

For example, against linear cryptanalysis, if an S-box is equivalent to a linear relation such as f(x)=5x+3, then if used in a Feistel round, the relationship f(y,x)=(y+5x+3,x) remains a linear relationship that is approximated. Against differential cryptanalysis, if x' = x XOR 00001000 implies f(x') = x' XOR 10010101 much of the time, then in a Feistel structure, x' = x XOR 00001000 will now imply that f(y,x) = f(y,x') XOR 1001010100000000.

A recent paper by Jean Seberry and others noted that some simple algorithms to construct S-boxes with good properties for a single round lead to S-boxes which, when iterated, have bad properties. Thus, it isn't enough that S(x) lack these particular types of structure; S(S(x)) must do so as well, and so on.

If S(x) is 5x+3, then S(x+a) is 5x+5a+3, so an S-box where S(x+a) can be S(x) plus anything, depending on x, would be good against linear cryptanalysis as well as differential cryptanalysis; this is what led to the search for ways to make an S-box with a "flat XOR profile", which is precisely what, if adhered to too rigidly, makes higher powers of that same S-box unsafe. Again, one way to keep the chance of problems within bounds is simply to use a large and random S-box, since then its higher powers are also random. But it is possible to design an S-box to have a nearly flat XOR profile and also resist this danger, where an especially-designed S-box is desired in order to have security with the smallest possible S-box size.

### A Concrete Example

In DES, the first 4-bit substitution in the first S-box is:

``` 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
-----------------------------------------------
14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
```

Let us try performing an XOR of different values with the inputs, and determine what it does to the outputs:

```   n          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
n xor 1      1  0  3  2  5  4  7  6  9  8 11 10 13 12 15 14
-----------------------------------------------
f(n xor 1)   4 14  1 13 15  2  8 11 10  3 12  6  9  5  7  0
f(n)        14  4 13  1  2 15 11  8  3 10  6 12  5  9  0  7
-----------------------------------------------
xor         10 10 12 12 13 13  3  3  9  9 10 10  4  4  7  7
```

Just looking at one case shows us one thing immediately; because any XOR swaps pairs of values, the maximum number of different differences is only half the number of elements in the S-box.

### Bent Functions

Although a "flat XOR profile" may not be ideal for cryptography in practice, it still is important to understand how one may be achieved, as a starting point. A perfectly flat XOR profile is the defining property of a bent function.

In the binary domain, the basic definition of a bent function is as follows:

• The function gives a single-bit value as a result, from a multi-bit argument.
• This value is zero for half the possible arguments, and one for half the possible arguments.
• If one varies i over all possible values of the argument of the function, then for any j not equal to zero, f(i) and f(i xor j) are the same exactly half the time, and different exactly half the time.

A function returning a multi-bit result is bent when each of its bits is a bent function of the type described above, and in addition each of those functions is orthogonal to each other; that is, varying i over all possible values, where m and n are the numbers of any two bits in the result, and m and n are not equal, f(i,m) and f(i,n) are the same exactly half the time, and different exactly half the time.

Bent functions are not trivial to construct for all orders, although simple methods have been given to construct functions that approach being bent. And, of course, almost all large random S-boxes approximate this behavior as well.

It is easy to see that it is impossible to construct a bent function, according to that definition, with 2, 4, or 8 possible arguments. Another definition of bent functions, however, does not appear to have that limitation, one that is expressed in terms of the Walsh transform.

The binary Walsh functions are constructed using a recursion that is essentially the same one used to produce Hadamard matrices.

``` __ __        __ __ __ __        __ __ __ __ __ __ __ __
| 1  1|      | 1  1  1  1|      | 1  1  1  1  1  1  1  1|
__           __    __           __    __    __    __
| 1|__       | 1|__| 1|__       | 1|__| 1|__| 1|__| 1|__
__ __              __ __       __ __
| 1  1|__ __       | 1  1|__ __| 1  1|__ __
__       __        __       __ __       __
| 1|__ __| 1|      | 1|__ __| 1  1|__ __| 1|
__ __ __ __
| 1  1  1  1|__ __ __ __
__    __       __    __
| 1|__| 1|__ __| 1|__| 1|
__ __             __ __
| 1  1|__ __ __ __| 1  1|
__       __    __ __
| 1|__ __| 1|__| 1  1|__
```

That recursion, of course, is the one that repeatedly applies the pattern:

```  -----------------------
|   same   |   same   |
-----------------------
|   same   | opposite |
-----------------------
```

on larger and larger scales to an existing pattern.

Another way to construct the binary Walsh functions is like this:

``` __ __ __ __ __ __ __ __
| 1  1  1  1  1  1  1  1|

__ __ __ __
| 1  1  1  1|__ __ __ __

__ __       __ __
| 1  1|__ __| 1  1|__ __
__ __             __ __
| 1  1|__ __ __ __| 1  1|

__    __    __    __
| 1|__| 1|__| 1|__| 1|__
__    __       __    __
| 1|__| 1|__ __| 1|__| 1|
```
[Next] [Up] [Previous] [Index]