# Introduction

**Update:** Slides from the related presentation at DTU, can now be found here.

Those who have read Howard M. Heys excellent introduction to Linear and Differential cryptanalysis or even worked with linear hulls (e.g. read the Rijndael book) might be tempted to believe that the absence of linear approximations with a non-zero correlation implies that the cipher is immune to linear cryptanalysis. However the existence of linear hulls with correlation zero also allows an attacker to distinguish the cipher from a random permutation with high probability. In this entry we will explore the work by Andrey Bogdanov and Vincent Rijmen on so-called “Zero-Correlation Linear Cryptanalysis” and apply the technique to a toy cipher.

This post is based on the orignal paper by Bogdanov and Rijmen. I highly recommend reading the paper for those who find this entry of interest. The original attack is mostly of theoretical interest (due to high data complexity), but has important implications for designing ciphers secure against linear cryptanalysis, e.i. Challenging the notation that Wide-trail and counting active S-Boxes is sufficient for claiming resistance against linear cryptanalsysis.

# Beyond Heys : An Introduction to Key-Dependence

I assume the reader has some familiarity with linear cryptanalysis (e.g. has read Heys and is familiar with the concept of linear trails). This section is meant as a brief overview to provide motivation for the attack and the discussion below is therefore highly simplified.

A block-cipher is a function \(E : \mathbb{F}_2^{k} \times \mathbb{F}_2^{n} \to \mathbb{F}_2^{n} \). A linear approximation is a pair \( (\alpha, \beta) \in \mathbb{F}_2^{n} \times \mathbb{F}_2^{n} \). The dot-product in \( \mathbb{F}_2^{n} \) allows us to define the “parity” functions \( f(x) = \alpha^{t} x \) and \( g(x) = \beta^{t} x \) of \( \alpha \) and \( \beta \) respectively. If \( \alpha \neq 0 \) and \( \beta \neq 0 \) then we say that the approximation is non-trivial. We often call \( \alpha \) the input mask and \( \beta \) the output mask.

Essentially linear cryptanalysis considers the correlation \( C(f(x), g(E_K(x)) ) \) over the plaintext space \( x \in \mathbb{F}_2^{n} \), between parity functions \(f\) and \(g\) for approximations \( ( \alpha, \beta ) \) for the cipher \( E_K \). The hope is that the correlation for the “target cipher” and the correlation for a “random permutation”, will deviate significantly.

In a classical attack, we then let the “target cipher” be R rounds of the cipher under attack and assume that R+2 rounds of the cipher acts as a random permutation (at-least with regards to the approximation we have chosen). We then attack R+1 rounds of the cipher, by guessing (parts of) the last round key and (partially) decrypting the last round. If the guess was correct we obtain plaintext/ciphertext pairs from the “target cipher” (R rounds), if we guessed wrong we essentially encrypted one more round and obtained plaintext/ciphertext pairs from the random permutation. If we can differentiate between the correlation for some approximation over the random permutation and “target cipher” we can distinguish the two cases, from which we obtain a list of candidates for (parts of) the last round-key. This allows us to move from distinguishing to key-recovery.

Of course a block-cipher does not define a single permutation, but a family of permutations indexed by the key \( K \in \mathbb{F}^{k} \). So it makes little sense to talk about a single correlation for block-cipher, one assumption which is often implicit in early work is the so-called key-equivalence hypothesis, which states that the correlation for a particular approximation remains constant over the key-space (or does not change much). But this needs to be justified and for the most part it is not the case. In fact for randomly sampled permutations over a sufficiently large domain we expect the correlation of any non-trivial approximation to follow a normal distribution (see Probability of distibutions of Correlation and Differentials in Block Ciphers).

In other words we obtain a distribution of the correlation over the key-space and so we are not distinguishing between correlations directly, but distributions of correlations.

It turns out that even evaluating the exact correlation for a single instantiation of the cipher under a given key is infeasible for all but the smallest block-sizes, since it would require us to enumerate all possible inputs to the cipher. However we can calculate the correlation between parities over the cipher as the sum of correlation (contributions) for individual trails \( T \): \[ C(\alpha^{t}, \beta^{t}) = \sum_{T \ = \ \alpha || \ldots || \beta} C(T) \] We call the set of trails starting in \( \alpha \) and ending in \( \beta \), the “linear hull” between \( \alpha \) and \( \beta \). In modern linear cryptanalysis we often estimate the correlation for a single key by attempting to discover the most significant terms in the summation above (“dominant trails”); each of which can be evaluated efficiently against all practical ciphers for a particular key. We obtain an estimate for the distribution by sampling keys at random and computing the estimated correlation.

**Example:**
Below I have generated an estimate for the correlation distribution over the key-space of PRESENT
for the approximation (0x8000000000000000, 0x8000000000000000) over 22 rounds,
based on 1000000 keys and the hull of all trails with Hamming weight \( \leq 4 \):

Hence, if we sampled a correlation of \(2^{-30.50}\) we midge judge that we are sampling from the PRESENT distribution. Whereas a correlation of \( \approx 0 \) is likely sampled from the random permutation distribution. Observe that for many keys, discerning which distribution the correlation is sampled becomes hard and our distinguisher becomes in-effective. Therefore key-recovery may fail to find the correct key or return a large number of false positives.

# Zero-Correlation Hulls

With this introduction in mind, the idea of zero-correlation linear attacks is simple:

Suppose we can find a non-trivial approximation \( ( \alpha, \beta ) \) where all trails in the hull have a correlation contribution of 0 for any key. Then the summation above is 0 and the “target cipher” / “right key” distribution is constant zero. Linear cryptanalysis without key-dependence!

In this case the distribution (like seen above) for the target cipher is a point distribution at 0, yet we still expect the random permutations to follow a normal distribution. Hence, unlike before, if we sample exactly zero we judge that we are interacting with the target cipher. If we sample any other correlation we judge that the samples originate from the random permutation.

# Zero-Correlation Hulls for Feistel Networks

The paper above (and subsequent work by Bogdanov et al.), details multiple zero-correlation hulls for concrete ciphers (AES), as well as general constructions (e.g. Feistel). The Feistel attacks are particularly sweet, since it only depends on the F-function being invertible. In other words, the F-function could be sampled uniformly at random from the set of all permutations for each round key (of any length); and the cipher would still be vulnerable.

In the remainder of this post, we always omit the final swap from the Feistel network. We shall consider a class of hulls with correlation zero described for a 5-round balanced Feistel. Let \( a \in \mathbb{F}^{n/2}, a \neq 0 \) and consider the approximation: \[ (\alpha = 0 || a, \beta = 0 || a) \] Consider a linear trail \( (u_0, u_1, u_2, u_3, u_4, u_5) \) with \(u_0 = \alpha\), \(u_5 = \beta\), as illustrated below:

Let us assume that the trail has non-zero correlation. The approximation over the first F-function (orange), must be \( (0, 0) \), since the input mask is \( 0 \) and F is a permutation, hence \( u_1 = (a, 0) \). The approximation over the second F-function (yellow), must be \( (b, a) \) for some non-zero b, since the output mask is \( a \neq 0 \). The arguments for \( u_3, u_4 \) are completely analogous to the two above. We observe that the approximation over the middle (red) F-function becomes \( (0, b)\)

Hence we have an approximation with a zero mask and a non-zero mask over the middle F-function (red). The correlation is therefore that between a constant (0) and a balanced function over a permutation, which implies that the correlation is zero.

We conclude that every trail in the hull has correlation contribution 0. Therefore, the approximation must have correlation zero, \( C(\alpha^{t}, \beta^{t}) = 0 \). We use such approximations to construct a distinguisher between a 5-round Feistel network and a “random” permutation; for which we do not generally expect the approximation to have correlation zero.

With this distinguisher we can then mount an attack on 6 or more rounds.

# Evaluating The Correlation Without Full Codebook

We are required to evaluate the correlation accurately to check if it is non-zero. With the naive method, this would require the full codebook, which obviously makes the attack rather uninteresting: why would we need the key if we already have a lookup table for all inputs?

However as noted by Rijmen and Bogdanov we can evaluate the exact correlation using \( 2^{n-1} \) chosen plaintexts, when the approximation is over a permutation: Let \( \alpha, \beta \) be a non-trivial approximation, let \( P \) be a permutation.

Consider the following 4 disjunct sets of input/outputs \( (x, y) \) to \( P \):

\[ T_{00} = \{ (x, y) | \alpha^{t} x = 0 \text{ and } \beta^{t} y = 0 \} \]

\[ T_{01} = \{ (x, y) | \alpha^{t} x = 0 \text{ and } \beta^{t} y = 1 \} \]

\[ T_{10} = \{ (x, y) | \alpha^{t} x = 1 \text{ and } \beta^{t} y = 0 \} \]

\[ T_{11} = \{ (x, y) | \alpha^{t} x = 1 \text{ and } \beta^{t} y = 1 \} \]

Since any non-trivial parity \( \alpha^{t} \) takes values \( 1 \) and \( 0 \) equally often, we conclude:

\[ |T_{00}| + |T_{01}| = |T_{10}| + |T_{11}| = 2^{n - 1} \]

Furthermore since \( dom(P) = codom(P) \), the same applies to \( \beta^{t} \) on the outputs:

\[ |T_{01}| + |T_{11}| = |T_{00}| + |T_{10}| = 2^{n - 1} \]

From which we can derive:

\[ |T_{00}| = |T_{11}| \]

\[ C(\alpha^t x, \beta^t y) = \frac{ |T_{00}| + |T_{11}| }{ 2^{n - 1} } - 1 = \frac{ 2 \cdot |T_{00}| }{ 2^{n - 1} } - 1 = \frac{ 2 \cdot |T_{11}| }{ 2^{n - 1} } - 1 \]

Therefore obtaining the size of \( T_{11} \) is
sufficient for determining the **exact** correlation.
Observe that this makes the attack a chosen plaintext attack,
since we must choose all the inputs \( x \) such that
\( \alpha^{t} x = 1 \) and obtain the corresponding outputs.

# The ZeroCipher

Our target will be a 6-round balanced 32-bit Feistel-cipher designed for the occasion, dubbed the “ZeroCipher”. The F-function of the ZeroCipher consists of key addition (XOR), the parallel application of two instances of the AES-Sbox, followed by a bit permutation (rotation) to obtain some level of diffusion. A single round is illustrated below:

The final swap is omitted (so that encryption / decryption is symmetric), and all round keys are independent. Here is C-code for encryption/decryption procedure:

```
uint16_t F(uint16_t k, uint16_t v) {
uint16_t t1;
uint16_t t2;
t1 = k ^ v;
t2 = sbox[t1 & 0xff];
t2 |= sbox[t1 >> 8] << 8;
return (t2 << 4) | (t2 >> 12);
}
uint32_t cipher(uint16_t k[ROUNDS], uint32_t v) {
uint16_t v1 = v >> F_SIZE;
uint16_t v2 = v & F_MASK;
for (size_t n = 0; n < ROUNDS; n++) {
v1 ^= F(k[n], v2);
v1 ^= v2;
v2 ^= v1;
v1 ^= v2;
}
v1 ^= v2;
v2 ^= v1;
v1 ^= v2;
return ((uint32_t) v1) << F_SIZE | ((uint32_t) v2);
}
```

# The Attack

1) Let \( a = \verb!0xffff! \). So our approximation becomes \( \alpha = \beta = (0^{16} \ || \ a) \).

2) Request the encryption \(c\) of all \( 2^{31} \) plaintexts \(p\) where \( \langle \alpha, p \rangle = 1 \).

3) Attempt decryption of the last round for every possible round-key \( k \) and calculate the correlation between \( \langle \alpha, p \rangle \) and \( \langle \beta, c’ \rangle \) for the partially decrypted ciphertexts \( c’ \).

4) If the round key was correct, we obtain the ciphertexts for a 5-round version of the cipher, where the hull has correlation zero. If we calculate the correlation to be zero (e.i. \( \langle \beta, c’ \rangle \) is balanced over \( c’ \)), we output the round-key \( k \) as a possible candidate.

Note that we do not need to store the plaintexts after obtaining the ciphertexts.

Overall the attack is very simple, since there is no complex trail-search algorithm:

```
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>
#include "cipher.c"
#include "oracle.c"
#define IN_PARITY (1) // zero also works
uint32_t parity(uint32_t v) {
v ^= v >> 16;
v ^= v >> 8;
v ^= v >> 4;
v ^= v >> 2;
v ^= v >> 1;
return v & 1;
}
uint32_t* collect(uint32_t alpha, size_t data) {
uint32_t pt = 0;
uint32_t* ct = malloc(sizeof(uint32_t) * data);
for (size_t n = 0; n < data; pt++) {
// filter plaintext parity
if (parity(alpha & pt) != IN_PARITY)
continue;
// request encryption
ct[n++] = oracle(pt);
if ((n & 0xffffff) == 0)
printf("<attack>: pairs 0x%08zx\n", n);
}
return ct;
}
int main(int argc, char *argv[]) {
setvbuf(stdout, NULL, _IONBF, 0);
// to facilitate the array-job (please ignore)
if (argc != 3)
return -1;
long nodes = atol(argv[1]);
long index = atol(argv[2]);
assert(index < nodes);
uint32_t keys = 1 << F_SIZE;
uint32_t keys_per_node = (keys / nodes) + 1;
uint32_t key_start = keys_per_node * index;
uint32_t key_end = key_start + keys_per_node;
key_end = key_end > 0xffff ? 0xffff : key_end;
printf("<attack>: work slice: 0x%04x - 0x%04x \n", key_start, key_end);
// calculate data-complexity
size_t data = 1L << (BLOCK_SIZE - 1);
printf("<attack>: need to collect 0x%08zx pairs\n", data);
// linear approximation (pick all bits)
uint32_t alpha = F_MASK;
uint32_t beta = F_MASK;
printf("<attack>: alpha = %04x\n", alpha);
printf("<attack>: beta = %04x\n", beta);
// collect ciphertexts (online phase)
uint32_t* ct = collect(alpha, data);
// trial decryption and correlation est.
printf("<attack>: begin key enumeration\n");
#pragma omp parallel for
for (uint32_t n = key_start; n < key_end; n++) {
size_t hits = 0;
uint16_t key = n & F_MASK;
// calculate correlation of hull
for (size_t i = 0; i < data; i++)
if (parity(crnd(key, ct[i]) & beta) == IN_PARITY)
hits++;
// check for corr = 0
if (hits == (data / 2))
printf("<attack>: possible key, %04x\n", key);
}
}
```

The attack requires 8GB of RAM and took \( \approx \) 12 hours on 12 nodes with 8 cores.

It recovers 3 key candidates (from 65536 possible round keys):

```
hpclogin1(s162820) $ grep possible ZeroCorrelationAttack.*
ZeroCorrelationAttack.o5376635-0:<attack>: possible key, 03ba
ZeroCorrelationAttack.o5376635-4:<attack>: possible key, 57da
ZeroCorrelationAttack.o5376635-9:<attack>: possible key, c749
~/zero-corr
```

Where the last round-key is the correct one found in `oracle.c`

:

```
#include <stdint.h>
#include "cipher.c"
uint16_t oracle_key[ROUNDS] = {
0xf3f4,
0x45bc,
0x0048,
0xaf4f,
0x4ba0,
0xc749
};
uint32_t oracle(uint32_t v) {
return cipher(oracle_key, v);
}
```

# Summary

We saw that the existence of linear hulls with correlation zero can be used to create distinguishers in a manner very similar to impossible differentials. We explored one such class of hulls for Feistel networks and applied the attack to a 32-bit toy cipher.

This post only covers the absolute basics of zero-correlation linear attacks, in particular later work details how data complexity can be reduced by using multiple zero-correlation approximations and employing a more complex distinguisher based on the expected mean of the sum of sampled correlations for multiple zero-correlation hulls.