Recovering Secret Keys from Weak Side Channel Traces of Differing Lengths

This presentation describes a method for recovering secret keys from weak side-channel traces of differing lengths by searching over recoding guesses. It covers the background, a typical randomised exponentiation algorithm, a metric for measuring the fitness of recoding guesses, the decision tree for choosing key bits, results, and conclusions.


Presentation Transcript


  1. Recovering Secret Keys from Weak Side Channel Traces of Differing Lengths
  Colin D. Walter, Comodo CA, Bradford, UK
  Colin.Walter@comodo.com

  2. Outline
  • Background & history
  • A typical randomised exponentiation algorithm
  • A metric to measure the fitness of recoding guesses
  • The decision tree for choosing bits
  • Results
  • Conclusion

  3. Background
  Several standard software counter-measures to side-channel leakage from exponentiation:
  • Blind the exponent by adding a random multiple of the group order.
  • Pick an algorithm whose operation pattern is independent of the secret exponent, e.g.
    • Square-and-always-multiply
    • Montgomery Powering Ladder
  • Pick an algorithm whose operation pattern is randomised:
    • Liardet-Smart
    • Ha-Moon
    • Oswald-Aigner
    • Mist

  4. Disadvantages
  None of these is a panacea:
  • The fixed-pattern algorithms tend to be less efficient, and averaging may still reveal the exponent.
  • Processing the ~512 known bits of a 1024-bit RSA key may leak enough to reveal 32-bit blinding (Fouque et al., CHES 2006).
  • Perfect knowledge of the pattern of squares and multiplications in a randomised exponentiation algorithm usually reveals the key if the key is re-used.

  5. Problems for an Attacker
  • There is always a lot of noise in the measurements.
  • Averaging to determine correct bits is essential.
  • For randomised exponentiation algorithms, square & multiply operations cannot be aligned directly with key bits.
  • Incorrect bit deductions will always occur.
  • The locations of likely errors must be identified for the attack to be computationally feasible.

  6. Example: Liardet-Smart
  Recode the binary representation of the key D from right to left:
  • Add in the borrow of 0 or +1 to give a new D.
  • Choose base 2 and digit 0 if D is even.
  • Randomly choose base m = 2^i (i ≤ R) if D is odd.
  • The digit d is the residue of D mod m with least absolute value.
  • The borrow is 1 for negative digits, otherwise 0.
  Exponentiation M^D in ECC:
  • Pre-compute a table of the required odd powers M^d.
  • Read the recoded digits left to right.
  • Perform i squarings and a multiplication for m = 2^i and d ≠ 0.
  Traces may have different lengths: the i-th operation is associated with different key bits in different traces. A sketch of this recoding appears below.
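To make the recoding concrete, here is a minimal Python sketch of the right-to-left Liardet-Smart recoding just described, together with the left-to-right double/add operation string it induces. The function names and the representation of a recoding as (base, digit) pairs are illustrative choices, not the paper's notation.

```python
import random

def liardet_smart_recode(D, R=3, rng=random):
    """Right-to-left Liardet-Smart recoding of a positive key D (a sketch).

    Returns a list of (base, digit) pairs, least significant digit first.
    The random base is m = 2**i with 1 <= i <= R.  The borrow on the slide
    is implicit here: taking the least-absolute-value residue d and then
    computing (D - d) / m carries the borrow into the rest of the key.
    """
    digits = []
    while D > 0:
        if D % 2 == 0:
            digits.append((2, 0))          # even: base 2, digit 0
            D //= 2
        else:
            i = rng.randint(1, R)          # random base m = 2**i
            m = 2 ** i
            d = D % m
            if d > m // 2:
                d -= m                     # residue of least absolute value
            digits.append((m, d))
            D = (D - d) // m
    return digits

def operations(digits):
    """Left-to-right D/A string: i doublings per base 2**i, an add per non-zero digit."""
    ops = []
    for m, d in reversed(digits):          # most significant digit first
        ops.append("D" * (m.bit_length() - 1))
        if d != 0:
            ops.append("A")
    return "".join(ops)

if __name__ == "__main__":
    rng = random.Random(1)
    for _ in range(5):
        rec = liardet_smart_recode(13, R=3, rng=rng)
        # Sanity check: Horner evaluation of the digits recovers the key.
        value = 0
        for m, d in reversed(rec):
            value = value * m + d
        assert value == 13
        print(rec, "->", operations(rec))
```

Running this for the same key with different random base choices produces the differing-length recodings and operation strings shown on the next slide.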

  7. Liardet-Smart 2
  Here are some recodings of …13₁₀ (i.e. binary …1101) with the operators aligned (D = double, A = add):
    1₂  1₂  0₂  1₂   →  D A D A D D A
    1₂ –1₄  0₂  1₂   →  D A D D A D D A
    3₈  0₂  1₂       →  D D D A D D A
    1₂  1₂  1₄       →  D A D A D D A
    1₂ –1₄  1₄       →  D A D D A D D A
    3₈  1₄           →  D D D A D D A
    1₂  0₂ –3₈       →  D A D D D D A
  The average operator yields almost no information: data from the top bit is spread over three or four columns.

  8. Liardet-Smart 3
  Here are the same recodings with the doubles aligned:
    1₂  1₂  0₂  1₂   →  DA DA D  DA
    1₂ –1₄  0₂  1₂   →  DA D  DA D  DA
    3₈  0₂  1₂       →  D  D  DA D  DA
    1₂  1₂  1₄       →  DA DA D  DA
    1₂ –1₄  1₄       →  DA D  DA D  DA
    3₈  1₄           →  D  D  DA D  DA
    1₂  0₂ –3₈       →  DA D  D  D  DA
  • There is a lot of information-theoretic content: the column contents reveal the key bits.
  • This reveals the key if the leakage is strong.

  9. History
  Karlof & Wagner (CHES 2003):
  • Uses a Hidden Markov Model.
  • Applies Viterbi's algorithm to find the best-fit key.
  • Assumes traces can be parsed precisely into the symbols D and DA.
  • So it cannot deal with weak leakage.
  Green, Noad & Smart (CHES 2005):
  • Repairs the parsing assumption for each trace, selecting the most likely string over the symbols D and A.
  • Treats traces serially, one by one.
  • Convergence is unlikely with weak leakage – it can't get started.

  10. A Way Forward
  Aim:
  • Treat all the information about a given key bit at the same time in order to average out the noise.
  Problem:
  • Viterbi's algorithm becomes computationally infeasible if the Markov model treats the traces in parallel.
  So:
  • Re-structure and simplify the algorithm.
  • Don't process the whole trace to get the best fit; only process the next few bits and choose the best fit.
  • Don't evaluate all key-prefix choices; only select the best few to consider.

  11. The Metric
  [Figure: the double/add pattern of a recoding aligned against a trace's per-operation probabilities of matching "A" or "D"; the metric μ = d₁ + d₂ + d₃ + d₄ + ... sums the per-operation mismatch contributions dᵢ.]
  • Define the distance between a trace t and a recoding r by
      μ(t,r) = Σᵢ (1 – pr(tᵢ = rᵢ))
    where pr(tᵢ = rᵢ) is the probability that the i-th operation in t is the same as the i-th operation in r (cf. Hamming weight).

  12. The Metric
  [Figure: a tree of candidate bit strings for the same key choice, with the recoding of minimum μ selected.]
  • Define the distance between a trace t and a key choice D by
      μ(t,D) = min { μ(t,r) | r is a recoding of D }
    This selects the best-match recoding of D.

  13. The Metric
  • Define the distance between a trace t and a recoding r by
      μ(t,r) = Σᵢ (1 – pr(tᵢ = rᵢ))
    where pr(tᵢ = rᵢ) is the probability that the i-th operation in t is the same as the i-th operation in r. This should be small for the correct interpretation of the trace.
  • Define the distance between a trace t and a key choice D by
      μ(t,D) = min { μ(t,r) | r is a recoding of D }
    This selects the best-match recoding of D.
  • Define the distance between a trace set T and a key D by
      μ(T,D) = Σ_{t∈T} μ(t,D)
    The best-fit key minimises this distance. A sketch of these three distances in code follows.
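As a rough illustration, the three distances could be computed as below, reusing the hypothetical operations() helper from the slide 6 sketch. Here trace_probs represents a trace as, per observed operation, a dict of probabilities over the symbols "D" and "A" derived from the side channel, and recodings_of is a hypothetical enumerator of candidate recodings of a key (in practice only the R best recodings per trace are kept, as slide 16 describes). Both names are assumptions of this sketch, not the paper's notation.

```python
def trace_distance(trace_probs, ops):
    """mu(t, r) = sum_i (1 - Pr[t_i = r_i]), or infinity if the lengths
    differ (a recoding of the wrong length cannot explain the trace)."""
    if len(trace_probs) != len(ops):
        return float("inf")
    return sum(1.0 - p[op] for p, op in zip(trace_probs, ops))

def key_distance(trace_probs, key, recodings_of):
    """mu(t, D) = min over recodings r of D of mu(t, r): the best match."""
    return min(trace_distance(trace_probs, operations(r))
               for r in recodings_of(key))

def set_distance(traces, key, recodings_of):
    """mu(T, D) = sum over t in T of mu(t, D); the best-fit key minimises this."""
    return sum(key_distance(t, key, recodings_of) for t in traces)
```

For example, a trace whose side channel says "probably a double" with 70% confidence at some point would contribute an entry {"D": 0.7, "A": 0.3} to trace_probs, matching the leakage strength assumed on slide 19.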

  14. The Markov Model
  For one trace and weak leakage, the standard Markov model and Viterbi algorithm yield an almost useless guess at the key. But the Markov model for T traces and a recoding automaton with state set S is too big:
  • There are at least |S|^T states for each processed key bit/digit.
  • There are at least n·|S|^T states for a key of length n.
  • This is computationally infeasible even for |S| = 2 and T > 100 (already 2¹⁰⁰ states per bit).
  We need to be able to choose large T when the leakage is weak.

  15. Re-organisation
  Instead of the Markov-model structure:
  • Build a binary tree representing all possible choices for the n key bits.
  • Descend the tree and, for each node, determine the best recoding for each trace, independently of trace length.
  • This defines a value of the metric μ for each node of the tree.
  • Eventually... pick the best leaf, selecting recodings whose length equals the trace length.

  16. Simplifications
  To keep the complexity down:
  • Prune the nodes at a given depth in the binary tree: keep only the B nodes with the best values of the metric.
  • Prune the recoding choices within each node: keep only the R best values for each trace.
  • Instead of looking at the leaves to decide the best branch, look at only λ more nodes beyond the last decided bit.
  • NB: the cost of a recoding decision can spread over several more input symbols, so λ should not be too small.
  A sketch of the pruned search appears below.
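The B-pruning of the first bullet might be sketched as the beam search below. extend_metric is a hypothetical helper that scores a key-bit prefix by summing, over all traces, the metric of the best partial recoding consistent with that prefix; the R-pruning of recodings inside it and the λ-node look-ahead for committing to a bit are elided for brevity.

```python
import heapq

def beam_search_key(traces, n_bits, B, extend_metric):
    """Descend the binary tree of key bits, keeping only the B best
    prefixes at each depth (smaller metric = better fit).  A fuller
    version would commit to bit k only after exploring lambda further
    levels, as the slide above explains."""
    beam = [((), 0.0)]                      # (key-prefix bits, metric value)
    for _ in range(n_bits):
        candidates = [(prefix + (bit,), extend_metric(prefix + (bit,), traces))
                      for prefix, _ in beam
                      for bit in (0, 1)]
        beam = heapq.nsmallest(B, candidates, key=lambda c: c[1])
    return min(beam, key=lambda c: c[1])[0] # best-fit key bits
```

This caps the work at 2B metric evaluations per key bit, in contrast to the |S|^T states per bit of the full Markov model on slide 14.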

  17. Benefits
  • Traces become aligned correctly (or almost correctly) with key bits/digits by selecting the best-fit recoding.
  • Summing the metric values for the best recodings of each trace provides the averaging that reduces noise and enables the best key bit to be selected.
  • Locations of incorrect bits can be determined by looking at the difference in the metric between the 0- and 1-branches of a node in the tree. A small difference means a lack of certainty about the decision.
  • Key-bit positions can be ordered according to this probability of correctness.

  18. Results
  Does it all work?
  • For square-and-multiply it reduces to the optimal averaging which Kocher used – this works!
  • For Oswald-Aigner and Ha-Moon it works with much lower leakage than prior algorithms.
  • For Liardet-Smart, the random variation in the base has always made this recoding more difficult to handle – here it just requires a few more traces or stronger leakage.

  19. Some Figures
  • Take the Ha-Moon recoding, with digit set {–1, 0, +1}.
  • Assume a 70% chance of deciding correctly between a square and a multiplication from the side-channel trace, but no ability to distinguish the multiplications for –1 and +1.
  • Take a typical 192-bit ECC key and only 10 traces.
  • In 88% of cases there are no errors in the 168 bits we are most certain of.
  • There are fewer than 9 errors (on average) in the remaining 24 bits.
  • It is computationally feasible to correct these errors (under 10⁶ cases to check).

  20. Conclusion
  • Traces from randomised exponentiation algorithms can be aligned effectively to pool the leakage associated with individual key bits.
  • This is possible even with very weak leakage.
  • Locations of possible bit errors are identified with ease.
  • It is computationally feasible to correct all the errors in a high proportion of cases.
  • Designers of cryptographic chips should assume that the information content of side-channel traces can be combined in a computationally effective way to reveal the secret key if it is re-used sufficiently often.
