Equivalence Algorithms
The hierarchy of equivalences
Given two vectorial Boolean functions [math]\displaystyle{ F,G : F_2^n \rightarrow F_2^m }[/math] there are various ways to define equivalence between [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math]. We will study algorithms for determining linear, affine, extended affine (EA) and CCZ equivalence between vectorial Boolean functions. Each of these notions is a special case of the next one, which is why they form a hierarchy.
Linear Equivalence
Given two vectorial Boolean functions [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math], we want to determine whether there exist linear permutations [math]\displaystyle{ A_1 }[/math] and [math]\displaystyle{ A_2 }[/math] such that [math]\displaystyle{ F = A_2 \circ G \circ A_1 }[/math].
The to and from algorithm
This algorithm was presented at EUROCRYPT 2003 [1]. It is mainly intended for the case where the functions are permutations, and we will start by assuming that [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math] are permutations.
The idea of the algorithm is to use information gathered about [math]\displaystyle{ A_1 }[/math] to deduce information about [math]\displaystyle{ A_2 }[/math], and the other way around. To see how this works, suppose we know some value of [math]\displaystyle{ A_1 }[/math], say [math]\displaystyle{ A_1(x) = y }[/math]. We also know the value of [math]\displaystyle{ G }[/math] at [math]\displaystyle{ y }[/math], say [math]\displaystyle{ G(y) = z }[/math]. Then [math]\displaystyle{ F(x) = A_2 \circ G \circ A_1(x) = A_2 \circ G(y) = A_2(z) }[/math], so [math]\displaystyle{ A_2(z) }[/math] must be equal to [math]\displaystyle{ F(x) }[/math].
For the other direction, suppose we know some value of [math]\displaystyle{ A_2 }[/math], say [math]\displaystyle{ A_2(x) = y }[/math]. We know the value of [math]\displaystyle{ F^{-1} }[/math] at [math]\displaystyle{ y }[/math], say [math]\displaystyle{ F^{-1}(y) = z }[/math]. Then [math]\displaystyle{ A_2 \circ G \circ A_1(z) = F(z) = y }[/math]. Since [math]\displaystyle{ A_2(x) = y }[/math] and [math]\displaystyle{ A_2 }[/math] is a permutation, we need [math]\displaystyle{ G \circ A_1(z) = x }[/math], which means that [math]\displaystyle{ A_1(z) = G^{-1}(x) }[/math].
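The two deduction rules can be illustrated with a minimal Python sketch. It assumes that elements of [math]\displaystyle{ F_2^n }[/math] are encoded as the integers [math]\displaystyle{ 0,\dots,2^n-1 }[/math] and that functions are given as lookup tables; the function names are hypothetical.

```python
def invert(table):
    """Inverse of a permutation given as a lookup table."""
    inv = [0] * len(table)
    for x, y in enumerate(table):
        inv[y] = x
    return inv


def deduce_A2_from_A1(F, G, x, y):
    """From A_1(x) = y, return (z, A_2(z)) with z = G(y) and A_2(z) = F(x)."""
    return G[y], F[x]


def deduce_A1_from_A2(F_inv, G_inv, x, y):
    """From A_2(x) = y, return (z, A_1(z)) with z = F^-1(y) and A_1(z) = G^-1(x)."""
    return F_inv[y], G_inv[x]


# Toy example: F = A2 o G o A1 for two linear permutations of F_2^3.
G = [0, 1, 3, 6, 7, 4, 5, 2]
A1 = [0, 1, 3, 2, 7, 6, 4, 5]   # linear: e0 -> 1, e1 -> 3, e2 -> 7
A2 = [0, 2, 1, 3, 4, 6, 5, 7]   # linear: swaps the two lowest bits
F = [A2[G[A1[x]]] for x in range(8)]
F_inv, G_inv = invert(F), invert(G)
print(deduce_A2_from_A1(F, G, 2, A1[2]))          # (6, 5): indeed A2[6] == 5
print(deduce_A1_from_A2(F_inv, G_inv, 3, A2[3]))  # (3, 2): indeed A1[3] == 2
```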
We have now shown how to deduce information from a known value of either [math]\displaystyle{ A_1 }[/math] or [math]\displaystyle{ A_2 }[/math]. Now suppose we know the values of [math]\displaystyle{ A_1 }[/math] at a set of points. Since [math]\displaystyle{ A_1 }[/math] is linear, we also know its value at every linear combination of these points, so we can assume that we know the values of [math]\displaystyle{ A_1 }[/math] at [math]\displaystyle{ k }[/math] linearly independent points, which means that we know the value of [math]\displaystyle{ A_1 }[/math] at the [math]\displaystyle{ 2^k }[/math] points of their span. If we now learn the value of [math]\displaystyle{ A_1 }[/math] at a point which is not in this span, we can deduce the value of [math]\displaystyle{ A_1 }[/math] at [math]\displaystyle{ 2^k }[/math] new points, namely the sums of the new point with the points of the old span. We can then use all of these new points to deduce values of [math]\displaystyle{ A_2 }[/math] as explained above. We now describe the complete algorithm.
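The bookkeeping for a partially known linear map can be sketched as below, assuming the map is stored as a Python dict from inputs to outputs that is closed under XOR (so it always contains 0 mapped to 0). The helper name `add_point` is hypothetical; it also rejects values that would make the map non-injective, since the maps we look for are permutations.

```python
def add_point(known, x, y):
    """Record A(x) = y for a linear permutation A.

    `known` maps inputs to outputs and is closed under XOR (it contains
    0 -> 0).  If x lies outside the current span of size 2^k, the 2^k points
    x ^ p (p in the span) become known as well.  Returns the list of newly
    determined inputs, or None if the new value contradicts what is known.
    """
    if x in known:
        return [] if known[x] == y else None
    if y in set(known.values()):      # A would not be injective
        return None
    new = {x ^ p: y ^ q for p, q in known.items()}
    known.update(new)
    return list(new)


known = {0: 0}
print(add_point(known, 1, 3))   # [1]
print(add_point(known, 2, 5))   # [2, 3]  (A(3) = 3 ^ 5 = 6 by linearity)
print(add_point(known, 3, 7))   # None: contradicts A(3) = 6
```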
Given two permutations [math]\displaystyle{ F,G : F_2^n \rightarrow F_2^n }[/math], we construct the linear permutations [math]\displaystyle{ A_1,A_2 }[/math]. The algorithm is a backtracking algorithm: whenever we discover a contradiction, we backtrack to the last guess. We first guess two values of [math]\displaystyle{ A_1 }[/math]. From these two values of [math]\displaystyle{ A_1 }[/math] we deduce two values of [math]\displaystyle{ A_2 }[/math], and by linearity a third one (their sum). Using this third value we deduce a value of [math]\displaystyle{ A_1 }[/math]; if the new input is not in the span of the inputs where [math]\displaystyle{ A_1 }[/math] is already known, linearity doubles the number of known values of [math]\displaystyle{ A_1 }[/math], which we use to deduce new values of [math]\displaystyle{ A_2 }[/math], and so on. If we run out of new values before both maps are fully determined, we have to make additional guesses. If we ever deduce a value of [math]\displaystyle{ A_1 }[/math] or [math]\displaystyle{ A_2 }[/math] at a point where it has already been set to something else, we must backtrack to the last guess.
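The complete backtracking procedure can be sketched in Python as follows. This is an illustrative reimplementation under the same assumptions as above (integer encoding, lookup tables, hypothetical helper names), not the code from [1]. Instead of starting with exactly two guesses, it uses the free deduction [math]\displaystyle{ A_2(G(0)) = F(0) }[/math] and then guesses [math]\displaystyle{ A_1 }[/math] at the smallest unknown input whenever propagation stalls.

```python
def invert(table):
    inv = [0] * len(table)
    for x, y in enumerate(table):
        inv[y] = x
    return inv


def add_point(known, x, y):
    """Record A(x) = y in `known` (closed under XOR, contains 0 -> 0).
    Returns the newly determined inputs, or None on a contradiction."""
    if x in known:
        return [] if known[x] == y else None
    if y in set(known.values()):      # A would not be injective
        return None
    new = {x ^ p: y ^ q for p, q in known.items()}
    known.update(new)
    return list(new)


def to_and_from_linear(F, G, n):
    """Search for linear permutations A1, A2 of F_2^n with F = A2 o G o A1.
    F and G are permutations given as lookup tables.  Returns (A1, A2) as
    lookup tables, or None if the functions are not linearly equivalent."""
    size = 1 << n
    F_inv, G_inv = invert(F), invert(G)

    def propagate(A1, A2, todo1, todo2):
        # todo1 / todo2: inputs whose A1 / A2 value is known but not yet used.
        while todo1 or todo2:
            if todo1:
                x = todo1.pop()          # A1(x) known => A2(G(A1(x))) = F(x)
                new = add_point(A2, G[A1[x]], F[x])
                if new is None:
                    return False
                todo2.extend(new)
            else:
                x = todo2.pop()          # A2(x) known => A1(F^-1(A2(x))) = G^-1(x)
                new = add_point(A1, F_inv[A2[x]], G_inv[x])
                if new is None:
                    return False
                todo1.extend(new)
        return True

    def search(A1, A2, todo1, todo2):
        if not propagate(A1, A2, todo1, todo2):
            return None                  # contradiction: backtrack
        if len(A1) == size:              # A1 fully known, hence so is A2
            return [A1[x] for x in range(size)], [A2[x] for x in range(size)]
        x = min(v for v in range(size) if v not in A1)
        used = set(A1.values())
        for y in range(size):            # guess A1(x) = y
            if y not in used:
                B1, B2 = dict(A1), dict(A2)
                result = search(B1, B2, add_point(B1, x, y), [])
                if result is not None:
                    return result
        return None

    # A1(0) = A2(0) = 0; processing x = 0 yields A2(G(0)) = F(0) for free.
    return search({0: 0}, {0: 0}, [0], [])


G = [0, 1, 3, 6, 7, 4, 5, 2]
A1 = [0, 1, 3, 2, 7, 6, 4, 5]
A2 = [0, 2, 1, 3, 4, 6, 5, 7]
F = [A2[G[A1[x]]] for x in range(8)]
result = to_and_from_linear(F, G, 3)
print(result is not None)                          # True
print(to_and_from_linear(list(range(8)), G, 3))    # None: G is not linear
```

The pair returned for `F` need not be `(A1, A2)` itself, since a function can have nontrivial linear self-equivalences; any returned pair satisfies [math]\displaystyle{ F = A_2 \circ G \circ A_1 }[/math].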
Runtime
It is hard to estimate the runtime of this algorithm, since it is hard to know how many guesses we have to make. Initially we have to make two guesses (or just one if the S-boxes, i.e. the functions [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math], do not map 0 to 0, since then [math]\displaystyle{ A_2(G(0)) = F(0) }[/math] already gives a nonzero value of [math]\displaystyle{ A_2 }[/math]) to get the algorithm started. Assuming we do not have to make any more guesses, the algorithm runs in time [math]\displaystyle{ O(n^3 2^{2n}) }[/math] ([math]\displaystyle{ O(n^3 2^n) }[/math] if the S-boxes do not map 0 to 0). This assumption seems to hold for random functions, but there are bad cases, for example when the functions differ in very few points. In general, it seems hard to prove a good runtime guarantee for this algorithm.
Affine Equivalence
Given vectorial Boolean functions [math]\displaystyle{ F,G : F_2^n \rightarrow F_2^m }[/math], find affine permutations [math]\displaystyle{ A_1,A_2 }[/math] such that [math]\displaystyle{ F = A_2 \circ G \circ A_1 }[/math]. We can also write this as [math]\displaystyle{ F \circ A_1^{-1} = A_2 \circ G }[/math]. If [math]\displaystyle{ A_1(x) = L_1(x) + a_1 }[/math] and [math]\displaystyle{ A_2(x) = L_2(x) + a_2 }[/math], then with [math]\displaystyle{ c_1 = L_1^{-1}(a_1) }[/math] and [math]\displaystyle{ c_2 = L_2^{-1}(a_2) }[/math] we get [math]\displaystyle{ F(x+c_1) = L_2(G(L_1(x)) + c_2) }[/math], so [math]\displaystyle{ F(x+c_1) }[/math] is linearly equivalent to [math]\displaystyle{ G(x) + c_2 }[/math]. So we can guess the constants [math]\displaystyle{ c_1,c_2 }[/math] and check whether [math]\displaystyle{ F(x+c_1) }[/math] is linearly equivalent to [math]\displaystyle{ G(x)+c_2 }[/math] using any linear equivalence algorithm. This adds a multiplicative factor of [math]\displaystyle{ 2^{n+m} }[/math] ([math]\displaystyle{ 2^{2n} }[/math] when [math]\displaystyle{ m = n }[/math]) to the runtime, but gives us an affine equivalence algorithm.
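The reduction from affine to linear equivalence can be sketched as below. To keep the example self-contained, the linear equivalence test is a brute-force stand-in (not the to-and-from algorithm): for permutations, each invertible [math]\displaystyle{ L_1 }[/math] forces [math]\displaystyle{ L_2 = F \circ L_1^{-1} \circ G^{-1} }[/math], and we only check whether that map is linear. This is feasible only for tiny [math]\displaystyle{ n }[/math]; all helper names are hypothetical.

```python
from itertools import product


def invert(table):
    inv = [0] * len(table)
    for x, y in enumerate(table):
        inv[y] = x
    return inv


def linear_table(cols, n):
    """Lookup table of the linear map of F_2^n sending e_i to cols[i]."""
    table = [0] * (1 << n)
    for x in range(1, 1 << n):
        low = x & -x                     # lowest set bit of x
        table[x] = table[x ^ low] ^ cols[low.bit_length() - 1]
    return table


def is_linear(table):
    size = len(table)
    return all(table[x ^ y] == table[x] ^ table[y] for x in range(size) for y in range(size))


def linear_equivalent(F, G, n):
    """Brute-force linear equivalence test for permutations (tiny n only)."""
    size = 1 << n
    G_inv = invert(G)
    for cols in product(range(size), repeat=n):
        L1 = linear_table(cols, n)
        if len(set(L1)) != size:         # L1 is not invertible
            continue
        L1_inv = invert(L1)
        if is_linear([F[L1_inv[G_inv[y]]] for y in range(size)]):  # forced L2
            return True
    return False


def affine_equivalent(F, G, n):
    """Guess c1, c2 and test whether F(x + c1) is linearly equivalent to G(x) + c2."""
    size = 1 << n
    for c1 in range(size):
        F_shift = [F[x ^ c1] for x in range(size)]
        for c2 in range(size):
            if linear_equivalent(F_shift, [G[x] ^ c2 for x in range(size)], n):
                return True
    return False


G = [0, 1, 3, 6, 7, 4, 5, 2]
L1, L2 = linear_table([1, 3, 7], 3), linear_table([2, 1, 4], 3)
F = [L2[G[L1[x] ^ 5]] ^ 3 for x in range(8)]    # A1(x) = L1(x) + 5, A2(y) = L2(y) + 3
print(affine_equivalent(F, G, 3))                # True
print(affine_equivalent(list(range(8)), G, 3))   # False: G is not affine
```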
The to and from algorithm (Affine)
We can adapt the to-and-from algorithm to the affine case and add only a multiplicative factor of [math]\displaystyle{ 2^n }[/math] to the runtime. Instead of comparing [math]\displaystyle{ F(x+a_1) }[/math] to [math]\displaystyle{ G(x)+a_2 }[/math] for every possible pair [math]\displaystyle{ a_1,a_2 }[/math], we compute a representative function for [math]\displaystyle{ F(x+a) }[/math] for every [math]\displaystyle{ a }[/math], and a representative function for [math]\displaystyle{ G(x) + a }[/math] for every [math]\displaystyle{ a }[/math]. We then check whether a representative from the first list equals a representative from the second list.
The representative of a function is its lexicographically smallest linearly equivalent function. To see why this works, assume [math]\displaystyle{ F,G }[/math] are affine equivalent with [math]\displaystyle{ F = A_2 \circ G \circ A_1 }[/math], where [math]\displaystyle{ A_1 = L_1 + a_1 }[/math] and [math]\displaystyle{ A_2 = L_2 + a_2 }[/math]. Then, for suitable constants [math]\displaystyle{ c_1 }[/math] and [math]\displaystyle{ c_2 }[/math], the function [math]\displaystyle{ F(x+c_1) }[/math] is linearly equivalent to [math]\displaystyle{ G(x)+c_2 }[/math]. Let [math]\displaystyle{ F' }[/math] be the minimal linear representative of [math]\displaystyle{ F(x+c_1) }[/math]. Since [math]\displaystyle{ G(x)+c_2 }[/math] is linearly equivalent to [math]\displaystyle{ F(x+c_1) }[/math], it is also linearly equivalent to [math]\displaystyle{ F' }[/math], so the minimal linear representative of [math]\displaystyle{ G(x)+c_2 }[/math] is at most [math]\displaystyle{ F' }[/math]. By the symmetric argument, [math]\displaystyle{ F' }[/math] is at most the representative of [math]\displaystyle{ G(x)+c_2 }[/math], so the two representatives are the same function.
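The representative-based comparison can be sketched in Python. As an assumption for the sake of a short, verifiable example, the minimal linear representative is computed by brute force over all pairs of invertible linear maps instead of with the to-and-from procedure, which is only practical for tiny [math]\displaystyle{ n }[/math]. The helper names are hypothetical.

```python
from itertools import product


def linear_table(cols, n):
    """Lookup table of the linear map of F_2^n sending e_i to cols[i]."""
    table = [0] * (1 << n)
    for x in range(1, 1 << n):
        low = x & -x
        table[x] = table[x ^ low] ^ cols[low.bit_length() - 1]
    return table


def general_linear_group(n):
    """All invertible linear maps of F_2^n as lookup tables (168 for n = 3)."""
    size = 1 << n
    tables = (linear_table(cols, n) for cols in product(range(size), repeat=n))
    return [t for t in tables if len(set(t)) == size]


def min_linear_representative(F, gl):
    """Lexicographically smallest table of L2 o F o L1 over L1, L2 in gl
    (brute force; stands in for the to-and-from representative search)."""
    best = None
    for L1 in gl:
        F_L1 = [F[y] for y in L1]                 # table of F o L1
        for L2 in gl:
            candidate = tuple(map(L2.__getitem__, F_L1))
            if best is None or candidate < best:
                best = candidate
    return best


def affine_equivalent_by_representatives(F, G, n):
    """Compare the representatives of F(x + a) with those of G(x) + a."""
    size = 1 << n
    gl = general_linear_group(n)
    reps_F = {min_linear_representative([F[x ^ a] for x in range(size)], gl) for a in range(size)}
    reps_G = {min_linear_representative([G[x] ^ a for x in range(size)], gl) for a in range(size)}
    return not reps_F.isdisjoint(reps_G)


G = [0, 1, 3, 6, 7, 4, 5, 2]
L1, L2 = linear_table([1, 3, 7], 3), linear_table([2, 1, 4], 3)
F = [L2[G[L1[x] ^ 5]] ^ 3 for x in range(8)]
print(affine_equivalent_by_representatives(F, G, 3))   # True
```

Only [math]\displaystyle{ 2 \cdot 2^n }[/math] representatives are computed, and the comparison is a set intersection, which is where the saving over testing all [math]\displaystyle{ 2^{2n} }[/math] constant pairs comes from.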
To compute the minimal representative of a function [math]\displaystyle{ F }[/math], we proceed as follows. We want to construct [math]\displaystyle{ F' }[/math], the smallest permutation which is linearly equivalent to [math]\displaystyle{ F }[/math]. We start by guessing the value of [math]\displaystyle{ A_1 }[/math] at the smallest element of [math]\displaystyle{ F_2^n }[/math]; say [math]\displaystyle{ A_1(x) = y }[/math]. We then go back and forth between [math]\displaystyle{ A_1 }[/math] and [math]\displaystyle{ A_2 }[/math] as before; the only difference is that we always pick the lowest possible value of [math]\displaystyle{ A_1 }[/math] to deduce a value of [math]\displaystyle{ A_2 }[/math], and vice versa. Also, whenever we need the value of [math]\displaystyle{ F' }[/math] at a point where it is still undefined, we simply set it to the lowest available value. Among the candidates obtained from the different guesses, the lexicographically smallest one is the representative.
Rank Algorithm
This algorithm [2] also applies to functions that are not permutations, but it is only efficient for functions of high algebraic degree.
Rank table
The algorithm is based on the rank table of a vectorial Boolean function, which we now introduce. Consider a Boolean function [math]\displaystyle{ F: F_2^n \rightarrow F_2 }[/math]. We view this object algebraically through its ANF (algebraic normal form) [math]\displaystyle{ F = \sum_{u \in F_2^n }\alpha_ux^u }[/math], that is, as a vector in the space spanned by all monomials [math]\displaystyle{ x^u = x_1^{u_1}...x_n^{u_n} }[/math]. Let [math]\displaystyle{ F_{\geq d} }[/math] be the polynomial containing the monomials of [math]\displaystyle{ F }[/math] of degree at least [math]\displaystyle{ d }[/math]. Now, given a vectorial Boolean function [math]\displaystyle{ F=(F_1,...,F_m) }[/math], we define the symbolic rank of [math]\displaystyle{ F }[/math] as the rank of the set of vectors [math]\displaystyle{ \{F_i\} }[/math] (where we view each [math]\displaystyle{ F_i }[/math] as the vector of its ANF coefficients). We denote it by [math]\displaystyle{ SR(F) }[/math].
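The symbolic rank can be computed directly from truth tables, as in the following minimal sketch. It assumes a vectorial function is a lookup table whose entries encode [math]\displaystyle{ (F_1,...,F_m) }[/math] as bits, computes each ANF with the Möbius transform, stores each [math]\displaystyle{ (F_i)_{\geq d} }[/math] as a bit mask over monomials, and takes the rank over [math]\displaystyle{ F_2 }[/math]. Function names are hypothetical.

```python
def anf(truth_table):
    """Moebius transform: ANF coefficients alpha_u (u = 0 .. 2^n - 1) of a
    Boolean function given by its truth table."""
    coeffs = list(truth_table)
    step = 1
    while step < len(coeffs):
        for x in range(len(coeffs)):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
        step <<= 1
    return coeffs


def gf2_rank(vectors):
    """Rank over F_2 of integers viewed as bit vectors."""
    pivots = {}                                   # leading bit -> basis vector
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def symbolic_rank(F, m, d=0):
    """SR(F_{>=d}) for F: F_2^n -> F_2^m given as a lookup table.  Each
    coordinate F_i becomes a bit mask over the monomials x^u with wt(u) >= d."""
    vectors = []
    for i in range(m):
        coeffs = anf([(y >> i) & 1 for y in F])
        vectors.append(sum(1 << u for u, c in enumerate(coeffs)
                           if c and bin(u).count("1") >= d))
    return gf2_rank(vectors)


print(anf([0, 0, 0, 1]))                  # [0, 0, 0, 1]: the monomial x1*x2
print(symbolic_rank([0, 1, 2, 3], 2))     # 2: coordinates x1 and x2
print(symbolic_rank([0, 1, 2, 3], 2, 2))  # 0: no monomial of degree >= 2
```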
We compose functions symbolically as [math]\displaystyle{ F \circ A_1 = \sum_u \alpha_u \cdot (M_u \circ A_1 ) }[/math], where [math]\displaystyle{ M_u = x^u }[/math] is the monomial with exponent [math]\displaystyle{ u }[/math]. We have [math]\displaystyle{ \deg(F \circ A_1) \leq \deg(F) }[/math]. We can also compose [math]\displaystyle{ A_2 \circ F }[/math], where we replace [math]\displaystyle{ x_i }[/math] in [math]\displaystyle{ A_2 }[/math] by [math]\displaystyle{ F_i }[/math]. Let [math]\displaystyle{ A : F_2^{n-1} \rightarrow F_2^n }[/math] be an injective affine transformation with [math]\displaystyle{ A(x) = L(x) + a }[/math]. The range of [math]\displaystyle{ A }[/math] is an [math]\displaystyle{ (n-1) }[/math]-dimensional affine subspace, so the orthogonal complement of its direction [math]\displaystyle{ L(F_2^{n-1}) }[/math] is 1-dimensional, spanned by a single nonzero vector [math]\displaystyle{ h }[/math]. We call [math]\displaystyle{ h }[/math] the half-space mask (HSM), since it partitions the space into the two halves [math]\displaystyle{ \{x : h \cdot x = 0\} }[/math] and [math]\displaystyle{ \{x : h \cdot x = 1\} }[/math]. The value [math]\displaystyle{ h \cdot a }[/math] is the half-space free coefficient (HSC). Given an HSM [math]\displaystyle{ h }[/math] and an HSC [math]\displaystyle{ c }[/math], there is a canonical affine transformation [math]\displaystyle{ C_{|_{h,c}} : F_2^{n-1} \rightarrow F_2^n }[/math] whose range is the half-space [math]\displaystyle{ \{x : h \cdot x = c\} }[/math].
We can now define the rank table of [math]\displaystyle{ F }[/math] with respect to some constant [math]\displaystyle{ d }[/math]. For every nonzero [math]\displaystyle{ h \in F_2^n }[/math] we calculate [math]\displaystyle{ u = SR((F \circ C_{|_{h,0}})_{\geq d} ) }[/math] and [math]\displaystyle{ v = SR((F \circ C_{|_{h,1}})_{\geq d}) }[/math]. The rank table entry for [math]\displaystyle{ h }[/math] is then [math]\displaystyle{ (\max(u,v),\min(u,v)) }[/math]. For a specific tuple [math]\displaystyle{ (u,v) }[/math], the rank group is the set of all [math]\displaystyle{ h }[/math] whose rank table entry is [math]\displaystyle{ (u,v) }[/math]. The rank histogram maps each [math]\displaystyle{ (u,v) }[/math] to the size of its rank group. Finally, we need the rank histogram of an element with respect to a given rank group. Fix an element [math]\displaystyle{ h \in F_2^n }[/math]. The rank histogram of [math]\displaystyle{ h }[/math] with respect to the rank group [math]\displaystyle{ (u,v) }[/math] is defined as follows. Add [math]\displaystyle{ h }[/math] to all elements of the rank group [math]\displaystyle{ (u,v) }[/math] to get a set [math]\displaystyle{ U \subset F_2^n }[/math]. For each element [math]\displaystyle{ h' }[/math] of [math]\displaystyle{ U }[/math], look at the rank group containing [math]\displaystyle{ h' }[/math], say [math]\displaystyle{ (u',v') }[/math]. The multiset containing the tuples [math]\displaystyle{ (u',v') }[/math] for all elements [math]\displaystyle{ h' \in U }[/math] is the rank histogram of [math]\displaystyle{ h }[/math] with respect to [math]\displaystyle{ (u,v) }[/math].
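A rank table can be sketched in Python as below. The text does not specify which canonical map [math]\displaystyle{ C_{|_{h,c}} }[/math] is used, so the `restrict` helper makes a hypothetical choice: it inserts, at the lowest set bit of [math]\displaystyle{ h }[/math], the bit that makes [math]\displaystyle{ h \cdot x = c }[/math]. Any other affine parametrization of the same half-space gives the same symbolic ranks. The usage line checks that the rank histogram is unchanged under an affine equivalence, using the PRESENT S-box and arbitrarily chosen affine maps.

```python
from collections import Counter


def anf(truth_table):
    coeffs = list(truth_table)
    step = 1
    while step < len(coeffs):
        for x in range(len(coeffs)):
            if x & step:
                coeffs[x] ^= coeffs[x ^ step]
        step <<= 1
    return coeffs


def gf2_rank(vectors):
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def symbolic_rank(F, m, d):
    """SR(F_{>=d}): rank of the coordinates' ANF masks restricted to wt(u) >= d."""
    return gf2_rank([sum(1 << u for u, c in enumerate(anf([(y >> i) & 1 for y in F]))
                         if c and bin(u).count("1") >= d) for i in range(m)])


def restrict(F, n, h, c):
    """Table of F o C_{h,c} for a hypothetical canonical C_{h,c}: F_2^(n-1) -> F_2^n
    whose range is the half-space {x : h . x = c}."""
    p = (h & -h).bit_length() - 1                 # lowest set bit of h
    table = []
    for y in range(1 << (n - 1)):
        x = (y & ((1 << p) - 1)) | ((y >> p) << (p + 1))   # bit p left at 0
        b = (bin(h & x).count("1") & 1) ^ c               # fix h . x = c
        table.append(F[x | (b << p)])
    return table


def rank_table(F, n, m, d):
    """Map each nonzero mask h to (max(u, v), min(u, v))."""
    table = {}
    for h in range(1, 1 << n):
        u = symbolic_rank(restrict(F, n, h, 0), m, d)
        v = symbolic_rank(restrict(F, n, h, 1), m, d)
        table[h] = (max(u, v), min(u, v))
    return table


def rot(x):
    """Rotate a 4-bit value left by one position (a linear bijection)."""
    return ((x << 1) | (x >> 3)) & 0xF


S = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]   # PRESENT S-box
T = [S[rot(x) ^ 5] ^ 9 for x in range(16)]                          # affine equivalent to S
print(Counter(rank_table(S, 4, 4, 2).values()) == Counter(rank_table(T, 4, 4, 2).values()))  # True
```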
The histogram of the rank group [math]\displaystyle{ (u,v) }[/math] with respect to [math]\displaystyle{ (u',v') }[/math] is the multiset of the rank histograms of the elements [math]\displaystyle{ h }[/math] of the group [math]\displaystyle{ (u,v) }[/math] with respect to the group [math]\displaystyle{ (u',v') }[/math].
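The two histogram notions can be sketched on top of a rank table stored as a dict from masks to entries. When [math]\displaystyle{ h }[/math] itself belongs to the group, the sum [math]\displaystyle{ h + h = 0 }[/math] is not a valid mask; this sketch simply skips it, which is an assumption on our part. A group histogram is represented as a `Counter` of frozen mask histograms; the function names are hypothetical.

```python
from collections import Counter


def rank_groups(table):
    """Rank groups: rank table entry (u, v) -> list of masks h with that entry."""
    groups = {}
    for h, entry in table.items():
        groups.setdefault(entry, []).append(h)
    return groups


def mask_histogram(table, groups, h, group):
    """Rank histogram of h w.r.t. `group`: entries of h + h' for h' in the group.
    h + h' = 0 is not a valid mask and is skipped (an assumption of this sketch)."""
    return Counter(table[h ^ g] for g in groups[group] if h ^ g)


def group_histogram(table, groups, group, other):
    """Histogram of `group` w.r.t. `other`: the multiset of mask histograms."""
    return Counter(frozenset(mask_histogram(table, groups, h, other).items())
                   for h in groups[group])


table = {1: (2, 1), 2: (2, 1), 3: (1, 1)}         # toy rank table for n = 2
groups = rank_groups(table)
print(mask_histogram(table, groups, 3, (2, 1)))   # Counter({(2, 1): 2})
```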
Algorithm
The main idea of the algorithm is that if [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math] are affine equivalent, then [math]\displaystyle{ SR(F_{\geq d}) = SR(G_{\geq d}) }[/math], and the same holds for their restrictions to corresponding half-spaces. If [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math] are affine equivalent with [math]\displaystyle{ F = A_2 \circ G \circ A_1 }[/math], we start by trying to reconstruct [math]\displaystyle{ A_1 }[/math]. Since [math]\displaystyle{ A_1 }[/math] maps half-spaces to half-spaces, it induces a bijection between the half-space masks of [math]\displaystyle{ F }[/math] and those of [math]\displaystyle{ G }[/math] which preserves the rank table entries, so every element of the rank group [math]\displaystyle{ (u,v) }[/math] of [math]\displaystyle{ F }[/math] must be matched to an element of the rank group [math]\displaystyle{ (u,v) }[/math] of [math]\displaystyle{ G }[/math]. So if we find an entry [math]\displaystyle{ (u,v) }[/math] whose rank group contains only one element, we know one point of this matching, and hence information about [math]\displaystyle{ A_1 }[/math]. However, the rank groups can be large, so doing only this may still result in a lot of guesses. For some group [math]\displaystyle{ (u,v) }[/math] we therefore pick another group [math]\displaystyle{ (u',v') }[/math] and calculate the histogram of the rank group [math]\displaystyle{ (u,v) }[/math] with respect to [math]\displaystyle{ (u',v') }[/math]. If this multiset contains an element that occurs only once, then the element [math]\displaystyle{ h }[/math] of [math]\displaystyle{ (u,v) }[/math] corresponding to it must be matched to the element of [math]\displaystyle{ G }[/math] with the same rank histogram (due to the linearity of [math]\displaystyle{ A_1 }[/math]). We can do this for any pair of groups, and whenever an element of the multiset has low multiplicity, we obtain a lot of information about the possible matchings for [math]\displaystyle{ A_1 }[/math].
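The matching step can be sketched as follows. This is a simplified illustration, not Dinur's implementation: given rank tables of [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math] (dicts from masks to entries), the hypothetical `forced_matches` collects mask pairs coming from singleton rank groups and from histograms, with respect to some group, that occur exactly once in both functions. It returns `None` when the rank histograms already differ. The toy tables are consistent with the linear mask bijection 1 ↦ 2, 2 ↦ 3, 3 ↦ 1.

```python
from collections import Counter


def rank_groups(table):
    groups = {}
    for h, entry in table.items():
        groups.setdefault(entry, []).append(h)
    return groups


def mask_histogram(table, groups, h, group):
    """Frozen rank histogram of h w.r.t. `group` (h + h' = 0 is skipped)."""
    return frozenset(Counter(table[h ^ g] for g in groups[group] if h ^ g).items())


def forced_matches(table_F, table_G):
    """Mask pairs (h_F, h_G) that any mask bijection preserving the rank
    tables must match, or None if the rank histograms differ."""
    gF, gG = rank_groups(table_F), rank_groups(table_G)
    if {k: len(v) for k, v in gF.items()} != {k: len(v) for k, v in gG.items()}:
        return None                               # not affine equivalent
    matches = set()
    for group in gF:
        if len(gF[group]) == 1:                   # singleton rank group
            matches.add((gF[group][0], gG[group][0]))
        for other in gF:
            sig_F = {h: mask_histogram(table_F, gF, h, other) for h in gF[group]}
            sig_G = {h: mask_histogram(table_G, gG, h, other) for h in gG[group]}
            count_F, count_G = Counter(sig_F.values()), Counter(sig_G.values())
            for h_F, s in sig_F.items():
                if count_F[s] == 1 and count_G[s] == 1:   # unique on both sides
                    matches.add((h_F, next(h for h, t in sig_G.items() if t == s)))
    return matches


table_F = {1: (2, 1), 2: (2, 1), 3: (1, 1)}
table_G = {1: (1, 1), 2: (2, 1), 3: (2, 1)}
print(forced_matches(table_F, table_G))           # {(3, 1)}
```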
Runtime
The runtime of the algorithm is estimated to be [math]\displaystyle{ O(n^32^n) }[/math] for a random permutation. This estimate is based on assumptions about the distribution of the rank tables of random functions.
EA equivalence
Given two vectorial Boolean functions [math]\displaystyle{ F,G : F_2^n \rightarrow F_2^m }[/math], find two affine permutations [math]\displaystyle{ A_1,A_2 }[/math] and an affine transformation [math]\displaystyle{ A_3 : F_2^n \rightarrow F_2^m }[/math] such that [math]\displaystyle{ F = A_2 \circ G \circ A_1 + A_3 }[/math].
Jacobian algorithm
This algorithm can decide EA equivalence for quadratic functions only. [3]
The Jacobian
Given a vectorial Boolean function [math]\displaystyle{ F }[/math] and any element [math]\displaystyle{ a \in F_2^n }[/math], the derivative of [math]\displaystyle{ F }[/math] in direction [math]\displaystyle{ a }[/math] is defined by [math]\displaystyle{ D_a F(x) = F(x+a) + F(x) }[/math]. The Jacobian of a vectorial Boolean function [math]\displaystyle{ F(x) = (F_1(x),...,F_m(x)) }[/math] is defined as [math]\displaystyle{ J F(x) = \begin{pmatrix}D_{e_1}F_1(x) & D_{e_2}F_1(x) & \cdots & D_{e_n}F_1(x) \\ D_{e_1}F_2(x) & D_{e_2}F_2(x) & \cdots & D_{e_n}F_2(x) \\ \vdots & \vdots & \ddots & \vdots \\ D_{e_1}F_m(x) & D_{e_2}F_m(x) & \cdots & D_{e_n}F_m(x) \end{pmatrix} }[/math] where [math]\displaystyle{ e_i }[/math] is the i-th basis vector of [math]\displaystyle{ F_2^n }[/math]. When [math]\displaystyle{ F }[/math] is quadratic, every entry [math]\displaystyle{ D_{e_j}F_i(x) }[/math] is an affine function of [math]\displaystyle{ x }[/math]; we denote by [math]\displaystyle{ J_{lin}F(x) }[/math] the linear part of the Jacobian, obtained by removing the constant term of every entry.
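For a quadratic function, the linear part of an entry is the entry minus its value at [math]\displaystyle{ x = 0 }[/math], so column [math]\displaystyle{ j }[/math] of [math]\displaystyle{ J_{lin}F(x) }[/math] equals [math]\displaystyle{ F(x+e_j)+F(x)+F(e_j)+F(0) }[/math]. The sketch below uses this identity, which holds only for quadratic [math]\displaystyle{ F }[/math], to compute the matrix from a lookup table (columns stored as [math]\displaystyle{ m }[/math]-bit integers). The example is the Gold function [math]\displaystyle{ x \mapsto x^3 }[/math] over [math]\displaystyle{ GF(2^3) }[/math] with modulus [math]\displaystyle{ X^3+X+1 }[/math]; the helper names are hypothetical.

```python
def jacobian_lin(F, n, x):
    """Columns of J_lin F(x) for a quadratic F given as a lookup table.
    Column j is D_{e_j}F(x) + D_{e_j}F(0), i.e. the entries of column j
    with their constant terms removed."""
    return [F[x ^ (1 << j)] ^ F[x] ^ F[1 << j] ^ F[0] for j in range(n)]


def gf2_rank(vectors):
    """Rank over F_2 of integers viewed as bit vectors."""
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


# x -> x^3 over GF(2^3) with modulus X^3 + X + 1 (a quadratic APN function).
CUBE = [0, 1, 3, 4, 5, 6, 7, 2]
print([gf2_rank(jacobian_lin(CUBE, 3, x)) for x in range(8)])   # [0, 2, 2, 2, 2, 2, 2, 2]
```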
The algorithm
The algorithm is based on two facts. First, if [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math] are EA-equivalent quadratic functions with [math]\displaystyle{ F = A_2 \circ G \circ A_1 + A_3 }[/math], we can assume that [math]\displaystyle{ A_1 }[/math] and [math]\displaystyle{ A_3 }[/math] are linear: for quadratic [math]\displaystyle{ G }[/math], a constant in [math]\displaystyle{ A_1 }[/math] only changes [math]\displaystyle{ G \circ A_1 }[/math] by an affine function, which can be absorbed into [math]\displaystyle{ A_3 }[/math], and the constant of [math]\displaystyle{ A_3 }[/math] can be absorbed into [math]\displaystyle{ A_2 }[/math]. So only [math]\displaystyle{ A_2(x) = L_2(x) + a_2 }[/math] has a constant term. Second, [math]\displaystyle{ J_{lin}F(x) = L_2\cdot J_{lin}G(A_1(x)) \cdot A_1 }[/math]. This allows us to search for pairs [math]\displaystyle{ (L_2,A_1) }[/math] first and deduce the remaining parts later.
To find possible pairs [math]\displaystyle{ (L_2,A_1) }[/math], the algorithm does the following. We first try to find [math]\displaystyle{ A_1 }[/math]. Since [math]\displaystyle{ L_2 }[/math] and [math]\displaystyle{ A_1 }[/math] are invertible, the rank of the matrix [math]\displaystyle{ J_{lin}F(x) }[/math] equals the rank of [math]\displaystyle{ J_{lin}G(A_1(x)) }[/math]. This means that [math]\displaystyle{ A_1 }[/math] can only map [math]\displaystyle{ x }[/math] to elements [math]\displaystyle{ y }[/math] for which [math]\displaystyle{ J_{lin}G(y) }[/math] has the same rank as [math]\displaystyle{ J_{lin}F(x) }[/math]. So we compute the ranks of [math]\displaystyle{ J_{lin}F(x) }[/math] and [math]\displaystyle{ J_{lin}G(x) }[/math] for all [math]\displaystyle{ x }[/math] and look at the least common rank value, say [math]\displaystyle{ k }[/math]. Let [math]\displaystyle{ S_F }[/math] be the set of inputs [math]\displaystyle{ x }[/math] such that [math]\displaystyle{ J_{lin} F(x) }[/math] has rank [math]\displaystyle{ k }[/math], and [math]\displaystyle{ S_G }[/math] the set of [math]\displaystyle{ x }[/math] such that [math]\displaystyle{ J_{lin}G(x) }[/math] has rank [math]\displaystyle{ k }[/math]. By the previous observation, [math]\displaystyle{ A_1 }[/math] has to map elements of [math]\displaystyle{ S_F }[/math] to elements of [math]\displaystyle{ S_G }[/math]. If the sets [math]\displaystyle{ S_F }[/math] and [math]\displaystyle{ S_G }[/math] are small (they must have the same size, otherwise [math]\displaystyle{ F }[/math] and [math]\displaystyle{ G }[/math] are not EA-equivalent), the number of guesses will not be too large. To start with, we guess the value of [math]\displaystyle{ A_1 }[/math] at some elements of [math]\displaystyle{ S_F }[/math].
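Computing the rank classes and the sets [math]\displaystyle{ S_F }[/math], [math]\displaystyle{ S_G }[/math] can be sketched as below. As an assumption of this sketch, the input [math]\displaystyle{ x = 0 }[/math] is skipped: [math]\displaystyle{ J_{lin}F(0) }[/math] is always the zero matrix and [math]\displaystyle{ A_1(0) = 0 }[/math] is already known for linear [math]\displaystyle{ A_1 }[/math], so it would otherwise always form a useless singleton class. In the example, [math]\displaystyle{ G }[/math] is [math]\displaystyle{ F }[/math] composed with the linear map swapping bits 0 and 2, so the unique rank-0 input 4 of [math]\displaystyle{ F }[/math] must correspond to input 1 of [math]\displaystyle{ G }[/math]. Names are hypothetical.

```python
def jacobian_lin(F, n, x):
    """Columns of J_lin F(x) for quadratic F (see the identity above)."""
    return [F[x ^ (1 << j)] ^ F[x] ^ F[1 << j] ^ F[0] for j in range(n)]


def gf2_rank(vectors):
    pivots = {}
    for v in vectors:
        while v:
            top = v.bit_length() - 1
            if top not in pivots:
                pivots[top] = v
                break
            v ^= pivots[top]
    return len(pivots)


def rank_classes(F, n):
    """R(F)[j] = nonzero inputs x with rank J_lin F(x) = j."""
    classes = {}
    for x in range(1, 1 << n):          # x = 0 skipped: A_1(0) = 0 is known
        classes.setdefault(gf2_rank(jacobian_lin(F, n, x)), []).append(x)
    return classes


def least_common_rank_sets(F, G, n):
    """Return (k, S_F, S_G) for the least common rank k, or None if the
    class sizes differ (then F and G are not EA-equivalent)."""
    RF, RG = rank_classes(F, n), rank_classes(G, n)
    if {j: len(v) for j, v in RF.items()} != {j: len(v) for j, v in RG.items()}:
        return None
    k = min(RF, key=lambda j: len(RF[j]))
    return k, RF[k], RG[k]


F = [((x & 1) & (x >> 1)) | (x & 6) for x in range(8)]               # (x0*x1, x1, x2)
G = [F[((x & 1) << 2) | (x & 2) | (x >> 2)] for x in range(8)]       # F o (swap bits 0, 2)
print(least_common_rank_sets(F, G, 3))                               # (0, [4], [1])
```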
Having guessed some values of [math]\displaystyle{ A_1 }[/math], we deduce values of [math]\displaystyle{ L_2 }[/math]. Suppose we have guessed that [math]\displaystyle{ A_1u = w }[/math]. Multiplying the identity [math]\displaystyle{ J_{lin}F(u) = L_2 \cdot J_{lin}G(w) \cdot A_1 }[/math] on the left by [math]\displaystyle{ L_2^{-1} }[/math], we see that the pair [math]\displaystyle{ (X,Y) = (L_2^{-1},A_1) }[/math] is a solution to the linear system of equations
[math]\displaystyle{ X\cdot J_{lin}F(u) - J_{lin}G(w) \cdot Y = 0 }[/math]
[math]\displaystyle{ Y \cdot u = w }[/math]
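Building and solving this system over [math]\displaystyle{ F_2 }[/math] can be sketched as follows. The sketch is based on the identity [math]\displaystyle{ J_{lin}F(u) = L_2 \cdot J_{lin}G(A_1 u) \cdot A_1 }[/math], so for each guess [math]\displaystyle{ A_1 u = w }[/math] the unknowns [math]\displaystyle{ X = L_2^{-1} }[/math] (an [math]\displaystyle{ m \times m }[/math] matrix) and [math]\displaystyle{ Y = A_1 }[/math] (an [math]\displaystyle{ n \times n }[/math] matrix) satisfy [math]\displaystyle{ X \cdot J_{lin}F(u) = J_{lin}G(w) \cdot Y }[/math] and [math]\displaystyle{ Y \cdot u = w }[/math]. The bit layout of the unknowns and all function names are our own assumptions; the solver is plain Gauss-Jordan elimination on integers used as bit vectors.

```python
def jacobian_lin(F, n, x):
    return [F[x ^ (1 << j)] ^ F[x] ^ F[1 << j] ^ F[0] for j in range(n)]


def build_system(F, G, n, m, guesses):
    """Rows of X.J_lin F(u) + J_lin G(w).Y = 0 and Y.u = w for each guess (u, w).
    X[r][k] is bit r*m + k, Y[k][c] is bit m*m + k*n + c, bit m*m + n*n is the RHS."""
    nvars = m * m + n * n
    rows = []
    for u, w in guesses:
        JF, JG = jacobian_lin(F, n, u), jacobian_lin(G, n, w)   # lists of columns
        for r in range(m):
            for c in range(n):
                row = 0
                for k in range(m):                    # (X . JF)[r][c]
                    if JF[c] >> k & 1:
                        row |= 1 << (r * m + k)
                for k in range(n):                    # (JG . Y)[r][c]
                    if JG[k] >> r & 1:
                        row |= 1 << (m * m + k * n + c)
                rows.append(row)
        for r in range(n):                            # (Y . u)[r] = w[r]
            row = (w >> r & 1) << nvars
            for c in range(n):
                if u >> c & 1:
                    row |= 1 << (m * m + r * n + c)
            rows.append(row)
    return rows, nvars


def solve_gf2(rows, nvars):
    """Gauss-Jordan elimination over F_2.  Returns (particular solution,
    nullspace basis) as bit masks, or None if the system is inconsistent."""
    mask = (1 << nvars) - 1
    pivots = []                                       # (pivot bit, reduced row)
    for r in rows:
        for p, pr in pivots:
            if r >> p & 1:
                r ^= pr
        if (r & mask) == 0:
            if r:                                     # the equation 0 = 1
                return None
            continue
        p = (r & mask).bit_length() - 1
        pivots = [(q, qr ^ r) if qr >> p & 1 else (q, qr) for q, qr in pivots]
        pivots.append((p, r))
    particular = sum(1 << p for p, r in pivots if r >> nvars & 1)
    pivot_bits = {p for p, _ in pivots}
    basis = []
    for f in range(nvars):
        if f not in pivot_bits:                       # free variable f set to 1
            basis.append((1 << f) | sum(1 << p for p, r in pivots if r >> f & 1))
    return particular, basis


def encode(X_cols, Y_cols, m, n):
    """Pack matrices given by their columns into the bit layout above."""
    v = sum(1 << (r * m + k) for k in range(m) for r in range(m) if X_cols[k] >> r & 1)
    return v | sum(1 << (m * m + k * n + c) for c in range(n) for k in range(n)
                   if Y_cols[c] >> k & 1)


def swap02(x):
    return ((x & 1) << 2) | (x & 2) | (x >> 2)        # plays the role of A1


def swap01(x):
    return ((x & 1) << 1) | ((x >> 1) & 1) | (x & 4)  # plays the role of L2 (= L2^-1)


CUBE = [0, 1, 3, 4, 5, 6, 7, 2]                       # quadratic APN, n = m = 3
F = [swap01(CUBE[swap02(x)]) for x in range(8)]
guesses = [(1 << j, swap02(1 << j)) for j in range(3)]   # correct guesses
particular, basis = solve_gf2(*build_system(F, CUBE, 3, 3, guesses))
expected = encode([swap01(1 << k) for k in range(3)], [swap02(1 << c) for c in range(3)], 3, 3)
print(particular == expected, len(basis))             # True 0
```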
We guess enough values of [math]\displaystyle{ A_1 }[/math] that this system has a unique solution (each guess adds more equations). Having found a pair [math]\displaystyle{ (L_2,A_1) }[/math] in this way, we can deduce [math]\displaystyle{ A_3 }[/math] and [math]\displaystyle{ a_2 }[/math] with basic linear algebra. Let us now describe the algorithm in more detail.
1. Compute the rank table of [math]\displaystyle{ F }[/math], [math]\displaystyle{ R(F)[j] = \{x \in F_2^n | rank(J_{lin}F(x)) = j \} }[/math]. Do the same for [math]\displaystyle{ G }[/math].
2. Let [math]\displaystyle{ s }[/math] be the number of guesses of [math]\displaystyle{ A_1 }[/math] we are going to make. Let [math]\displaystyle{ i }[/math] be the rank [math]\displaystyle{ j }[/math] that minimizes [math]\displaystyle{ |R(F)[j]| }[/math] (among the nonempty classes). Pick [math]\displaystyle{ s }[/math] elements of [math]\displaystyle{ R(F)[i] }[/math]. We then guess all possible values of [math]\displaystyle{ A_1 }[/math] at these points, but we only have to consider values inside [math]\displaystyle{ R(G)[i] }[/math]. Say we made the guesses [math]\displaystyle{ A_1u_1 = w_1,...,A_1u_s = w_s }[/math].
3. Try to solve the system of equations [math]\displaystyle{ X \cdot J_{lin}F(u_i) - J_{lin}G(w_i)\cdot Y = 0,\: Y \cdot u_i = w_i }[/math] for [math]\displaystyle{ i = 1,...,s }[/math]. If this system has too many solutions, make another guess (temporarily increasing the value of [math]\displaystyle{ s }[/math]).
4. When the system of equations does not have too many solutions, find all solutions [math]\displaystyle{ (L_2,A_1) }[/math]. Then deduce the remaining parts [math]\displaystyle{ A_3 }[/math] and [math]\displaystyle{ a_2 }[/math]. If this is possible, we are done; otherwise, go back to step 2 and make another guess.
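The last part of step 4 can be sketched as follows. Once candidate linear maps [math]\displaystyle{ L_2 }[/math] and [math]\displaystyle{ A_1 }[/math] are known, the residual [math]\displaystyle{ H(x) = F(x) + L_2(G(A_1(x))) }[/math] must equal [math]\displaystyle{ a_2 + A_3(x) }[/math], so [math]\displaystyle{ a_2 = H(0) }[/math], the columns of [math]\displaystyle{ A_3 }[/math] are [math]\displaystyle{ H(e_j) + H(0) }[/math], and the candidate is rejected if [math]\displaystyle{ H }[/math] is not affine. The function name and the toy parameters are hypothetical.

```python
def deduce_constant_parts(F, G, n, L2, A1):
    """Given lookup tables of the linear maps L2 and A1, return (a2, A3_columns)
    such that F(x) = L2(G(A1(x))) + a2 + A3(x), or None if the residual
    H(x) = F(x) + L2(G(A1(x))) is not affine."""
    size = 1 << n
    H = [F[x] ^ L2[G[A1[x]]] for x in range(size)]
    a2 = H[0]
    cols = [H[1 << j] ^ a2 for j in range(n)]          # A3(e_j)
    for x in range(size):                               # check H(x) = a2 + A3(x)
        lin = 0
        for j in range(n):
            if x >> j & 1:
                lin ^= cols[j]
        if H[x] != a2 ^ lin:
            return None
    return a2, cols


CUBE = [0, 1, 3, 4, 5, 6, 7, 2]
A1 = [((x & 1) << 2) | (x & 2) | (x >> 2) for x in range(8)]        # swaps bits 0 and 2
L2 = [((x & 1) << 1) | ((x >> 1) & 1) | (x & 4) for x in range(8)]  # swaps bits 0 and 1
F = [L2[CUBE[A1[x]]] ^ 5 ^ (x & 3) for x in range(8)]               # a2 = 5, A3(x) = x & 3
print(deduce_constant_parts(F, CUBE, 3, L2, A1))                    # (5, [1, 2, 0])
print(deduce_constant_parts(F, CUBE, 3, list(range(8)), A1))        # None: wrong L2
```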
Runtime
The runtime of this algorithm depends on the rank table, which is related to the differential uniformity of the function. Let [math]\displaystyle{ R = \min_j |R(F)[j]| }[/math]. Then in the worst case we have to make around [math]\displaystyle{ R^s }[/math] guesses. For each guess we solve a linear system in [math]\displaystyle{ n^2+m^2 }[/math] unknowns, which can be done in time around [math]\displaystyle{ (n^2+m^2)^\omega }[/math], where [math]\displaystyle{ \omega }[/math] is the matrix multiplication exponent. In total we get a time of [math]\displaystyle{ O(\max(n,m)^\omega 2^n + R^s(n^2+m^2)^\omega) }[/math], where the first term is for computing the rank table. Note that when [math]\displaystyle{ F }[/math] is a quadratic APN function, all nonzero inputs [math]\displaystyle{ x }[/math] give matrices [math]\displaystyle{ J_{lin}F(x) }[/math] of the same rank [math]\displaystyle{ n-1 }[/math], so [math]\displaystyle{ R }[/math] is essentially [math]\displaystyle{ 2^n }[/math]. This is the worst case for this algorithm.
References
[1] Biryukov, Alex, et al. "A toolbox for cryptanalysis: Linear and affine equivalence algorithms." Advances in Cryptology – EUROCRYPT 2003: International Conference on the Theory and Applications of Cryptographic Techniques, Warsaw, Poland, May 4–8, 2003 Proceedings 22. Springer Berlin Heidelberg, 2003.
[2] Dinur, Itai. "An improved affine equivalence algorithm for random permutations." Annual International Conference on the Theory and Applications of Cryptographic Techniques. Cham: Springer International Publishing, 2018.
[3] Canteaut, Anne, Alain Couvreur, and Léo Perrin. "Recovering or testing extended-affine equivalence." IEEE Transactions on Information Theory 68.9 (2022): 6187-6206.