Side Channel Leakage from Implementations of Modular Multiplication

Colin D. Walter, ISG, Royal Holloway, University of London, www.isg.rhul.ac.uk, colin.walter@rhul.ac.uk

Overview
• Active glitch attacks: CRT and the importance of checking the result.
• Square-and-multiply: its weakness under power analysis.
• m-ary exponentiation and its strength.
• Timing attacks caused by variations in modular multiplication.
• Strengthening Montgomery modular multiplication.
• Summary, recommendations and bibliography.

Power Analysis
• Classical cryptanalysis:
  • Mathematical
  • Chosen input, observable output
  • Only treats functional I/O
• Marc Joye’s lecture: “side channel” leakage
  • Power: SPA, DPA
  • Electromagnetic: SEMA, DEMA
  • Sound?: mains supply & clock harmonics
  • Timing

Power & EM Analysis
• Causes of power/EM variation:
  • Gate switching
  • Hamming weight
  • Bus activity, multipliers
  • A measure of change of state
  • Averaging removes dependence on the previous state
• Main side channel sources:
  • Power: through supply/ground lines
  • EMR: from components such as buses, multipliers, etc.
• Protection: capacitors for power, Faraday cage for EMR
• Tamper resistance is costly
• Better attack equipment for more cash
• Centres: Louvain-la-Neuve, Cambridge, IBM Watson et al.

Faraday Cage
[Figure: Faraday cage to hide EMR & detect intrusion (2 wires & 1 wire)]

Glitch Attacks
• Active vs passive attacks
• Fault induction, glitching
• Means: charge, light, EMR, clock, voltage, temperature, used to stress the circuit outside its design tolerances.
• Precision of fault injection:
  • Time
  • Place
  • Result
  • Permanence

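Chinese Remainder Theorem (CRT)
• Notation: $N = PQ$; $C \equiv M^E \bmod N$; $M \equiv C^D \bmod N$.
• Aim: to compute $M$ from $C$ more efficiently by knowing the factorisation of $N$.
• Chinese Remainder Theorem: for pairwise coprime moduli $N_1, N_2, \ldots, N_n$ there is a unique solution $M$ modulo $N = \mathrm{lcm}(N_1, \ldots, N_n) = N_1 N_2 \cdots N_n$ to the simultaneous congruences $M \equiv C_1 \bmod N_1$, $M \equiv C_2 \bmod N_2$, ..., $M \equiv C_n \bmod N_n$, namely $M \equiv C_1A_1 + C_2A_2 + \cdots + C_nA_n \pmod N$, where $A_i \equiv 0 \bmod N_j$ for $j \neq i$ and $A_i \equiv 1 \bmod N_i$.
• Compute $M_P \equiv C^{D_P} \bmod P$ and $M_Q \equiv C^{D_Q} \bmod Q$, where $D_P = D \bmod \varphi(P)$ and $D_Q = D \bmod \varphi(Q)$. Then $M_P \equiv C^D \bmod P$ and $M_Q \equiv C^D \bmod Q$, and $M = \{(M_Q - M_P)A \bmod Q\}P + M_P$, where $A \equiv P^{-1} \bmod Q$.
• Efficiency: Time: the exponentiation is 4 times faster. Space: the required registers are half the size.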
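A minimal Python sketch of the CRT computation above, assuming $P$ and $Q$ are distinct primes (so $\varphi(P) = P - 1$) and Python 3.8+ for the modular inverse `pow(p, -1, q)`. The function name `rsa_crt_decrypt` and the textbook toy key $N = 61 \times 53$, $E = 17$, $D = 2753$ are illustrative choices, not part of the slides.

```python
def rsa_crt_decrypt(c, d, p, q):
    """Return C^D mod N, N = P*Q, via two half-size exponentiations and the
    recombination M = {(M_Q - M_P) * A mod Q} * P + M_P with A = P^(-1) mod Q."""
    d_p = d % (p - 1)             # D_P = D mod phi(P), P prime
    d_q = d % (q - 1)             # D_Q = D mod phi(Q), Q prime
    m_p = pow(c, d_p, p)          # M_P = C^(D_P) mod P
    m_q = pow(c, d_q, q)          # M_Q = C^(D_Q) mod Q
    a = pow(p, -1, q)             # A = P^(-1) mod Q (Python 3.8+)
    return ((m_q - m_p) * a % q) * p + m_p


# Toy RSA key (far too small for real use): N = 61 * 53 = 3233, E = 17, D = 2753.
print(rsa_crt_decrypt(pow(65, 17, 3233), 2753, 61, 53))   # prints 65
```

Each `pow` works on numbers half the size of $N$, which is where the speed-up and the halved register size come from.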

Extended Euclidean Algorithm
• The EEA is used primarily to solve $x = a^{-1} \bmod p$, i.e. to invert elements in $\mathbb{F}_p$. (There are other solutions, such as $x = a^{p-2} \bmod p$.)
• We seek $x, y$ such that $ax + py = 1$ or, more generally, $ax + by = c \times \gcd(a,b)$.
• The Euclidean algorithm already gives $\gcd(a,b)$.

Extended Euclidean Algorithm
Input: integers $a > b > 0$ and $c > 0$.
Output: $\gcd(a,b)$ and $x, y$ such that $ax + by = c \times \gcd(a,b)$.
    If b = 1 then output 1, 0, c and stop.
    r ← a ; r' ← b ; x ← c ; x' ← 0 ; y ← 0 ; y' ← c ;
    While r' > 0 {
        q ← r div r' ;
        r'' ← r − q·r' ; r ← r' ; r' ← r'' ;
        x'' ← x − q·x' ; x ← x' ; x' ← x'' ;
        y'' ← y − q·y' ; y ← y' ; y' ← y'' ;
    }
    Output r, x, y
There is a loop invariant: $ax + by = cr$, $ax' + by' = cr'$ and $\gcd(r, r') = \gcd(a,b)$.
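The pseudocode translates almost line for line into Python. In this sketch (the function name `extended_euclid` is ours) the variables `r1`, `x1`, `y1` stand for $r'$, $x'$, $y'$, and tuple assignment replaces the temporaries $r''$, $x''$, $y''$.

```python
def extended_euclid(a, b, c=1):
    """Return (g, x, y) with g = gcd(a, b) and a*x + b*y == c*g, for a > b > 0, c > 0.
    Keeps the slide's invariant: a*x + b*y == c*r and a*x1 + b*y1 == c*r1."""
    if b == 1:
        return 1, 0, c
    r, r1 = a, b
    x, x1 = c, 0
    y, y1 = 0, c
    while r1 > 0:
        q = r // r1                      # q <- r div r'
        r, r1 = r1, r - q * r1           # (r, r') <- (r', r - q r')
        x, x1 = x1, x - q * x1           # same update for x
        y, y1 = y1, y - q * y1           # same update for y
    return r, x, y


# Inverting 3 in F_7: 7*x + 3*y = 1, so y mod 7 is 3^(-1) mod 7.
g, x, y = extended_euclid(7, 3)
print(g, y % 7)   # prints 1 5
```

To invert $a$ modulo $p$, call it with the larger value $p$ first, as here, and read the inverse from the coefficient of $a$.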

Glitch Attacks on CRT
• Compute $M_P \equiv C^{D_P} \bmod P$ and $M_Q \equiv C^{D_Q} \bmod Q$. Then use the CRT to obtain $M \equiv C^D \bmod N$ as above.
• Assume an inserted glitch causes an error: $M_P \to M_P'$ and hence $M \to M'$.
• The attacker computes $(M')^E - C \bmod N$ (all values are public).
• Reducing mod $P$ gives the non-zero value $(M_P')^E - M_P^E$, but reducing mod $Q$ correctly gives zero.
• So $(M')^E - C \bmod N$ is divisible by $Q$ but not by $P$.
• Thus $Q = \gcd((M')^E - C, N)$, which factorises $N$ and reveals the secret key.
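The attack can be reproduced on a toy key. In this sketch the glitch is modelled, as an assumption, by adding a hypothetical `fault` value to $M_P$; everything the attacker uses afterwards ($M'$, $C$, $E$, $N$) is public. Function names are illustrative.

```python
import math


def crt_decrypt(c, d, p, q, fault=0):
    """CRT decryption; the hypothetical `fault` is added to M_P to model a glitch."""
    m_p = (pow(c, d % (p - 1), p) + fault) % p   # possibly corrupted M_P
    m_q = pow(c, d % (q - 1), q)                 # correct M_Q
    a = pow(p, -1, q)                            # A = P^(-1) mod Q (Python 3.8+)
    return ((m_q - m_p) * a % q) * p + m_p


def factor_from_fault(m_faulty, c, e, n):
    """Attacker's step: gcd(M'^E - C mod N, N), using only public values."""
    return math.gcd((pow(m_faulty, e, n) - c) % n, n)


c = pow(65, 17, 3233)                            # toy key N = 61 * 53, E = 17, D = 2753
m_bad = crt_decrypt(c, 2753, 61, 53, fault=1)
print(factor_from_fault(m_bad, c, 17, 3233))     # prints 53, the factor Q
```

With no fault the difference is 0 and the gcd is $N$ itself, which tells the attacker nothing; a single faulty output is enough to factor $N$.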

Shamir’s Fix
• Message masking and exponent blinding make no difference: glitching CRT still works.
• A correctness check must be made before output.
• Counter-measure (Shamir): for random $R$ (32 bits, say):
  • Compute $M_{PR} \equiv C^D \bmod PR$ and $M_{QR} \equiv C^D \bmod QR$;
  • Check $M_{PR} \equiv M_{QR} \bmod R$;
  • If so, output $M \equiv (M_P Q A + M_Q P B) \bmod N$, where $M_P = M_{PR} \bmod P$, $M_Q = M_{QR} \bmod Q$, and $A$, $B$ satisfy $QA \equiv 1 \bmod P$, $PB \equiv 1 \bmod Q$.
• It is still possible (but harder) to insert a glitch into one of the summand computations in the CRT recombination.
• As CRT speeds up computation 4-fold, one wants to be able to use it safely.
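A sketch of the check on the toy key, with a small $R = 101$ standing in for the 32-bit random value and a hypothetical `fault` argument modelling a glitch in the computation modulo $PR$. For simplicity it uses the full exponent $D$ in both half-computations, as the slide does; the function name is ours.

```python
def shamir_crt(c, d, p, q, r, fault=0):
    """Checked CRT: compute mod P*R and Q*R, compare mod R, then recombine.
    `fault` is a hypothetical glitch added to the mod-P*R result."""
    m_pr = (pow(c, d, p * r) + fault) % (p * r)   # M_PR = C^D mod PR
    m_qr = pow(c, d, q * r)                       # M_QR = C^D mod QR
    if m_pr % r != m_qr % r:                      # both should equal C^D mod R
        raise ArithmeticError("fault detected: no output released")
    a = pow(q, -1, p)                             # Q*A = 1 mod P (Python 3.8+)
    b = pow(p, -1, q)                             # P*B = 1 mod Q
    m_p, m_q = m_pr % p, m_qr % q
    return (m_p * q * a + m_q * p * b) % (p * q)


print(shamir_crt(pow(65, 17, 3233), 2753, 61, 53, 101))   # prints 65
```

A glitch that changes $M_{PR}$ by an amount not divisible by $R$ is caught before any output leaves the device, so the gcd attack has nothing to work with.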

Passive Attacks
• Assume correct computations.
• SPA/SEMA: instantaneous current/EMR reveals secret data.
• DPA/DEMA: differences between current values for two different trace sets reveal secret bits.
• Traces are usually averaged over many executions for which the secret data is known to be constant. Averaging $n$ traces reduces the standard deviation of the random noise by a factor of $\sqrt{n}$ (its variance by a factor of $n$).
• DES: compare points where a selected key bit is used in (i) the implementation being attacked and (ii) an implementation with a known key.
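The $\sqrt{n}$ effect of averaging can be checked with a short simulation. It assumes, purely for illustration, a constant signal plus independent Gaussian noise of standard deviation 1 at every sample point; real traces have more structured noise. The function name is hypothetical.

```python
import random
import statistics


def residual_noise(n_traces, sigma=1.0, n_points=2000, seed=1):
    """Average n_traces noisy readings at each of n_points sample points and
    return the standard deviation of the averages. The signal is constant,
    so all remaining spread is noise."""
    rng = random.Random(seed)
    signal = 0.0
    averaged = [
        sum(signal + rng.gauss(0.0, sigma) for _ in range(n_traces)) / n_traces
        for _ in range(n_points)
    ]
    return statistics.pstdev(averaged)


for n in (1, 4, 16, 100):
    print(n, round(residual_noise(n), 3))   # close to 1 / sqrt(n)
```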

3DES Power Variation
• Switching gates, wire capacitance, etc. cause power variation: the 16 rounds are clearly visible.
[Figure: 3DES power trace, from Kocher, Jaffe & Jun, “Differential Power Analysis”]

3DES Power Variation
[Figure: power variation within 7 clock cycles; the difference at cycle 6 arises because a jump occurs in one trace but not the other. From Kocher, Jaffe & Jun, “Differential Power Analysis”]

3DES Power Variation
[Figure: DPA traces for a correct key guess and two incorrect key guesses, from Kocher, Jaffe & Jun, “Differential Power Analysis”]

DPA on RSA
• Attacks on square-and-multiply. To compute $M = C^D$ for the bit representation $D = d_{n-1}d_{n-2}\ldots d_1d_0$:
    M ← 1 ;
    For i ← n−1 downto 0 do
    Begin
        M ← M² mod N ;
        If d_i ≠ 0 then M ← C × M mod N ;
    End
• Every long-integer operation is clearly visible in the power trace: data loading requires less power than multiplication.
• Compare the unknown secret key and a known reference key on the same implementation.
• If squares and multiplies can be differentiated by DPA, the most significant bit at which the two keys differ causes a spike in the trace differences. The reference key bit can then be changed and the test run again.
• By repeating until there is no difference, the secret key $D$ can be recovered.
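A Python sketch of the square-and-multiply loop that also records the sequence of long-integer operations ('S' for a square, 'M' for a multiply) shows why distinguishing the two operations is fatal: every 'SM' pair is a 1 bit and every lone 'S' is a 0 bit. The operation string is a stand-in for what a power trace reveals; the function names are hypothetical.

```python
def square_and_multiply(c, d, n):
    """Left-to-right binary exponentiation as on the slide.
    Returns (C^D mod N, operation string of 'S' squares and 'M' multiplies)."""
    m, ops = 1, []
    for bit in bin(d)[2:]:            # d_{n-1} ... d_0, most significant first
        m = m * m % n                 # M <- M^2 mod N
        ops.append("S")
        if bit == "1":
            m = c * m % n             # M <- C * M mod N
            ops.append("M")
    return m, "".join(ops)


def bits_from_ops(ops):
    """SPA reading of the operation sequence: 'SM' -> 1, lone 'S' -> 0."""
    return ops.replace("SM", "1").replace("S", "0")


result, ops = square_and_multiply(5, 0b1000111, 3233)
print(ops)                  # prints SMSSSSMSMSM
print(bits_from_ops(ops))   # prints 1000111
```

The exponent 0b1000111 reproduces the S/M pattern in the figure on the next slide.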

Square & Multiply Power Variation
• The time for data loading distinguishes Square from Multiply.
[Figure: power trace of square-and-multiply; the operation sequence S M S S S S M S M S M S … reads as key bits 1 0 0 0 1 1 1 …]

DPA on RSA
• m-ary exponentiation. To compute $M = C^D$ for the representation $D = d_{n-1}d_{n-2}\ldots d_1d_0$ in base $m$:
    Pre-compute digit powers C^i mod N for 0 < i < m ;
    M ← 1 ;
    For i ← n−1 downto 0 do
    Begin
        M ← M^m mod N ;
        If d_i ≠ 0 then M ← C^(d_i) × M mod N ;
    End
• Improved efficiency.
• Now it is insufficient to distinguish only squares and multiplies: different non-zero digits give similar traces.
• About $n(m-1)/m$ of the $n$ digits are non-zero, and each of them can take any of $m-1$ values, so the space of keys giving similar traces has size about $(m-1)^{n(m-1)/m}$, which is computationally infeasible to search exhaustively.
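A sketch of the m-ary loop with $m = 2^k$, so that $M \leftarrow M^m$ becomes $k$ squarings. It records only the operation types, which illustrates the point above: the exponents 7 and 10 (base-4 digits 1,3 and 2,2) produce exactly the same 'S'/'M' pattern. The parameter `k` and the function name are our own choices.

```python
def m_ary_exp(c, d, modulus, k=2):
    """Left-to-right m-ary exponentiation with m = 2^k, following the slide.
    Returns (C^D mod N, operation string of 'S' squares and 'M' multiplies)."""
    m = 1 << k
    table = [1] * m
    for i in range(1, m):
        table[i] = table[i - 1] * c % modulus    # pre-computed C^i mod N
    digits = []
    while d:
        digits.append(d % m)                     # least significant digit first
        d //= m
    result, ops = 1, []
    for digit in reversed(digits):               # d_{n-1} down to d_0
        for _ in range(k):                       # M <- M^m mod N by k squarings
            result = result * result % modulus
            ops.append("S")
        if digit:
            result = table[digit] * result % modulus
            ops.append("M")                      # same label whatever the digit
    return result, "".join(ops)


print(m_ary_exp(5, 7, 3233)[1], m_ary_exp(5, 10, 3233)[1])   # prints SSMSSM SSMSSM
```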

Averaging and Blinding
• Averaging improves the signal-to-noise ratio (SNR). But exponent blinding, i.e. using $D + r\varphi(N)$ for a random (32-bit) $r$, means that averaging destroys all useful information.
• Message blinding increases the differences between the target and reference crypto-systems, making it more difficult to apply DPA to recognise equal key digits:
  • $C$ is first replaced by $CR^{-E} \bmod N$ for random $R$,
  • then $(CR^{-E})^D \equiv MR^{-1} \bmod N$ is computed,
  • then $M \equiv (MR^{-1})R \bmod N$.
• More sophisticated data processing than SPA or DPA is possible: averaging within a single exponentiation may reveal the secret key. So blinding the message and exponent may not be enough.
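A minimal sketch combining both blindings on the toy key. It assumes $\varphi(N)$ is available to the device and, for brevity, uses Python's `random` module; a real implementation would need a cryptographically secure generator. The function name is hypothetical.

```python
import math
import random


def blinded_rsa(c, d, e, n, phi_n, seed=None):
    """RSA with exponent blinding D + k*phi(N) and message blinding C*R^(-E);
    k (32-bit) and R are fresh random values on every call."""
    rng = random.Random(seed)                     # illustration only, not a CSPRNG
    k = rng.getrandbits(32)
    d_blind = d + k * phi_n                       # exponent blinding
    r = rng.randrange(2, n)
    while math.gcd(r, n) != 1:                    # R must be invertible mod N
        r = rng.randrange(2, n)
    c_blind = c * pow(r, -e, n) % n               # C * R^(-E) mod N (Python 3.8+)
    m_blind = pow(c_blind, d_blind, n)            # = M * R^(-1) mod N
    return m_blind * r % n                        # unblind: (M R^(-1)) R = M


print(blinded_rsa(pow(65, 17, 3233), 2753, 17, 3233, 3120))   # prints 65
```

Each call exponentiates a different value with a different exponent, so traces from repeated calls no longer line up for averaging.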

Montgomery Modular Multiplication
• Montgomery Modular Multiplication (MMM):
    { Pre-condition: 0 ≤ A, B < N < r^n, where r is the base of the representation;
      a_i is the i-th digit of A, and p_0, b_0, n_0 are the lowest digits of P, B, N. }
    P ← 0 ;
    For i ← 0 to n−1 do
    Begin
        q ← (p_0 + a_i b_0)(−n_0^(−1)) mod r ;
        P ← (P + a_i B + qN) div r ;
        { Invariant: 0 ≤ P < N + B }
    End ;
    { Post-conditions: P r^n ≡ A×B mod N, AB r^(−n) ≤ P < N + AB r^(−n) }
    If N ≤ P then P ← P − N ;
    { Post-condition: 0 ≤ P < N, P ≡ AB r^(−n) mod N }
• Note the time difference according to the value of the condition.
• Note the introduction of the factor $r^{-n}$ for radix $r$.
• Çetin Koç will give more detail on the use of this algorithm.
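A direct Python transcription of the MMM loop, assuming the radix is a power of two, $r = 2^b$ (parameter `r_bits`), and $N$ is odd so that $n_0^{-1} \bmod r$ exists. It also returns whether the final conditional subtraction was taken, since that flag is exactly what the timing attacks on the following slides observe. Names are illustrative.

```python
def montgomery_multiply(a, b, modulus, r_bits, num_digits):
    """Digit-serial MMM as on the slide, radix r = 2^r_bits, n = num_digits.
    Pre: 0 <= A, B < N < r^n with N odd.
    Returns (P, extra): P = A*B*r^(-n) mod N, 0 <= P < N, and extra is True
    when the final conditional subtraction was performed."""
    r = 1 << r_bits
    n0_neg_inv = (-pow(modulus % r, -1, r)) % r    # -n_0^(-1) mod r (Python 3.8+)
    b0 = b % r                                     # lowest digit of B
    p = 0
    for i in range(num_digits):
        a_i = (a >> (r_bits * i)) % r              # i-th digit of A
        q = ((p % r + a_i * b0) * n0_neg_inv) % r  # makes the sum divisible by r
        p = (p + a_i * b + q * modulus) >> r_bits  # exact division by r
        # invariant: 0 <= p < N + B
    extra = p >= modulus                           # the data-dependent condition
    if extra:
        p -= modulus
    return p, extra


# Toy check: N = 3233 fits in n = 3 digits of radix 16 (r^n = 4096).
p, extra = montgomery_multiply(1000, 2000, 3233, 4, 3)
print(p == 1000 * 2000 * pow(4096, -1, 3233) % 3233)   # prints True
```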

Timing Attack: Square vs Multiplication
• Squares behave differently from multiplications in average time. The condition $ABr^{-n} \le P < N + ABr^{-n}$ and the assumption that $A$, $B$ and $P$ are uniformly distributed enable one to calculate the probability of the output $P$ exceeding $N$ for (i) squares and (ii) multiplications. Roughly, integrating over the ranges of $A$ and $B$ yields a coefficient ½×½ = ¼ for the multiplication, but ⅓ for the square, i.e. probabilities of about $N/(4r^n)$ and $N/(3r^n)$ respectively. So there are fewer conditional subtractions for multiplications.
• Example: for numbers 0 to 3, 4/16 = ¼ of all products exceed 3, but 2/4 = ½ of all squares exceed 3.
• Using power traces from individual executions with the same key, a profile is constructed of the average number of extra subtractions for each key bit when square-and-multiply is used. The secret key is thus revealed.
• Can the attacker correct any errors he makes?
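The ¼ versus ⅓ coefficients can be checked empirically. This sketch uses the whole-number form of MMM, $P = (AB + QN)/R$ with $Q \equiv -ABN^{-1} \bmod R$ and $R = r^n$, which gives the same pre-subtraction value as the digit loop. It assumes $A$ and $B$ uniform in $[0, N)$ and uses a hypothetical odd modulus with $N/R \approx 0.9$.

```python
import random


def extra_subtraction_rates(modulus, mont_r, trials=20000, seed=2):
    """Estimate how often MMM's final 'if N <= P then P <- P - N' fires for
    random multiplications (A, B independent) and random squares (A = B)."""
    rng = random.Random(seed)
    n_neg_inv = (-pow(modulus, -1, mont_r)) % mont_r   # -N^(-1) mod R (Python 3.8+)

    def needs_subtraction(a, b):
        t = a * b
        q = (t * n_neg_inv) % mont_r                   # multiple of N the loop adds
        return (t + q * modulus) // mont_r >= modulus  # P before the subtraction

    mults = sum(needs_subtraction(rng.randrange(modulus), rng.randrange(modulus))
                for _ in range(trials))
    squares = 0
    for _ in range(trials):
        a = rng.randrange(modulus)
        squares += needs_subtraction(a, a)
    return mults / trials, squares / trials


R = 2 ** 64
N = (9 * R) // 10 | 1                        # hypothetical odd modulus, N/R ~ 0.9
print(extra_subtraction_rates(N, R))         # roughly (N/4R, N/3R) = (0.225, 0.3)
```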

m-ary Exponentiation
• In the case of m-ary exponentiation, the m−1 different digits can be identified by determining which pre-computed multiplier $C^i$ is used in each multiplication. This is done as follows:
• For n power traces, each digit position j generates a vector containing 1 in position i if the i-th trace has the extra subtraction, and 0 if it does not.
• These vectors cluster together: the vectors for positions j₁ and j₂ are close if, and only if, the same multiplier was used. (Distance is measured by counting the number of bits which differ, i.e. the Hamming distance.)
• The vectors for squares are not close to any others!
• So the squares are determined, as are the places where the same key digit appears. The pre-computations determine which digit belongs to each cluster group.
• Thus the secret key can be recovered here too.
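The clustering step can be sketched as a greedy grouping by Hamming distance. The input here is synthetic: a hypothetical generator gives positions that share a multiplier noisy copies of one random vector and makes each square an independent random vector. So the sketch shows only the grouping logic, not how well real subtraction data separates; the threshold of 100 is likewise an assumption.

```python
import random


def hamming(u, v):
    """Number of positions in which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(u, v))


def cluster_positions(vectors, threshold):
    """Greedy clustering: each vector joins the first cluster whose
    representative is within `threshold` bits, otherwise starts a new one."""
    clusters = []                                  # (representative, positions)
    for j, v in enumerate(vectors):
        for rep, members in clusters:
            if hamming(rep, v) <= threshold:
                members.append(j)
                break
        else:
            clusters.append((v, [j]))
    return [members for _, members in clusters]


def synthetic_vectors(labels, length=400, noise=0.05, seed=3):
    """Hypothetical test data: a label is the multiplier index; None is a square."""
    rng = random.Random(seed)
    base, vectors = {}, []
    for label in labels:
        if label is None:                          # a square: unrelated to the rest
            vectors.append([rng.randrange(2) for _ in range(length)])
            continue
        if label not in base:
            base[label] = [rng.randrange(2) for _ in range(length)]
        vectors.append([bit ^ (rng.random() < noise) for bit in base[label]])
    return vectors


labels = [1, None, 3, 1, None, 2, 3, 1]
print(cluster_positions(synthetic_vectors(labels), threshold=100))
```

Positions using the same multiplier end up in one group, while each square forms a singleton, matching the slide's observation that squares are close to nothing else.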

Counter-Measures
• Several counter-measures are possible, e.g.:
  • Random blinding of the exponent to frustrate averaging.
  • Always perform the subtraction; select the new or previous result according to the sign after the subtraction.
  • Never perform any conditional subtraction!
• Remarks:
  • each of these has a cost;
  • blinding doesn’t help if a single execution can be attacked (possible when power is used instead of timing);
  • if the difference is always computed, DPA may still detect which choice is made (e.g. by address);
  • omitting the conditional subtraction may lead to overflow.
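The second counter-measure (always subtract, then select) can be written without a branch by turning the sign of $P - N$ into a mask. This is a sketch of the data flow only: the fixed `width` is an assumption, and Python integers are not constant-time, so a real implementation would do this on fixed-size machine words. The function name is hypothetical.

```python
def reduce_by_selection(p, n, width):
    """Always compute D = P - N; keep D if it is non-negative, otherwise P.
    Assumes 0 <= P, N < 2^width. A mask replaces the branch."""
    d = p - n                          # the subtraction is always performed
    mask = d >> width                  # -1 (all ones) if D < 0, else 0
    return (p & mask) | (d & ~mask)    # P when D < 0, otherwise D


print(reduce_by_selection(5000, 3233, 16), reduce_by_selection(100, 3233, 16))   # prints 1767 100
```

As the remarks note, the selection itself may still leak through addresses or data movement, so this removes the timing difference rather than every power difference.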

No Conditional Subtractions
• Constant-time Montgomery Modular Multiplication:
    { Pre-conditions: inputs such that 0 ≤ A, B < 2N, and N < ¼r^n for radix r }
    P ← 0 ;
    For i ← 0 to n−1 do
    Begin
        q ← (p_0 + a_i b_0)(−n_0^(−1)) mod r ;
        P ← (P + a_i B + qN) div r ;
        { Invariant: 0 ≤ P < N + B }
    End ;
    { Post-condition: output P such that 0 ≤ P < 2N }
• Note the pre-condition bound relating N and n.
• It ensures that the last iteration reduces the output P by a factor sufficient to achieve the output bound: $P < N + ABr^{-n} < N + 4N^2r^{-n} < 2N$, since $4N < r^n$.
• The output now satisfies the same conditions as the input, so it can be used as input to the next MMM of the exponentiation, and overflow is avoided.
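A sketch of the loop without the final subtraction, again assuming radix $r = 2^b$ (parameter `r_bits`) and odd $N$. The toy modulus $N = 3233$ fits in three radix-16 digits, but the pre-condition $4N < r^n$ needs a fourth, which is the extra iteration discussed on the next slides. The function name is hypothetical.

```python
def mmm_no_subtraction(a, b, modulus, r_bits, num_digits):
    """Digit-serial MMM without the final conditional subtraction.
    Pre: N odd, 4N < r^n (r = 2^r_bits, n = num_digits), 0 <= A, B < 2N.
    Post: 0 <= P < 2N and P = A*B*r^(-n) mod N."""
    r = 1 << r_bits
    assert modulus % 2 == 1 and 4 * modulus < r ** num_digits
    assert 0 <= a < 2 * modulus and 0 <= b < 2 * modulus
    n0_neg_inv = (-pow(modulus % r, -1, r)) % r    # -n_0^(-1) mod r (Python 3.8+)
    b0 = b % r
    p = 0
    for i in range(num_digits):
        a_i = (a >> (r_bits * i)) % r              # i-th digit of A
        q = ((p % r + a_i * b0) * n0_neg_inv) % r
        p = (p + a_i * b + q * modulus) >> r_bits  # exact division by r
    return p                                       # no "if N <= P" step


# N = 3233: 4N = 12932 < 16^4, so four radix-16 digits (one more than before).
print(mmm_no_subtraction(5000, 6000, 3233, 4, 4) < 2 * 3233)   # prints True
```

Because the output bound equals the input bound, outputs can be fed straight back in, as in an exponentiation, without any data-dependent step.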

Oops! A DPA Attack…?
• For standard key sizes, the key length is an integral multiple of the word length. So $0 \le P < 2N$ means that P sometimes overflows into another word, whose value is always 0 or 1. Power analysis can separate these two cases using their Hamming weights.
• Could an attack like the timing one still reveal the key?
• The number of iterations is the minimum that ensures the outputs stay below the same bound as the inputs: if n satisfies $4N < r^n$, then MMM (with n iterations) preserves the same bound for input and output, namely $\frac12 r^n$.
• So one extra iteration is required for standard key lengths (compared with the version with conditional subtraction).

… but not Enough DPA Data
• For standard key lengths and n as above, $\frac12 r^{n-1} < N < r^{n-1}$, so there is an extra iteration replacing the conditional subtraction.
• The extra iteration decreases the output so that overflow into the top word (that of index n−1) is very unusual.
• It is so unusual that it can only occur if all bits of the top word of N are 1, i.e. if the top word is r−1.
• Even when the top word is r−1, the overflow occurs so rarely that, in the lifetime of the key, there are insufficient overflows for a timing attack to proceed if a conditional subtraction is included when $P \ge r^{n-1}$.
• Then, as usual, the output again has the same bit count as N.

Cost of MMM
• In classical modular multiplication, the multiple of N to subtract depends mainly on the most significant digit. This requires carry propagation to complete when adding the multiple of B to P.
• MMM is easier to speed up in hardware, since the multiple of N depends only on the lowest digit: it does not need to wait for carry propagation to finish.
• The extra iteration takes about the same time as performing the extra subtraction every time (since the top digit is almost always 0). It should be safer, since there is no branching which DPA might attack.

Residual Variation?
• Residual data-dependent variation is difficult to assess. Where will another researcher find the next chink in the armour?
• There is no data-dependent branching. However, there is a lot of data-dependent calculation where the data reveals the key: as outlined above, m-ary exponentiation re-uses the same multiplier whenever the same exponent digit appears.
• Perhaps one might consider the Hamming weight of the “top” digit of the output from MMM (i.e. that of index n−2) to decide whether the output is above or below $\frac12 r^{n-1}$?

MMM in Exponentiation
• In exponentiation, inputs are first Montgomery-multiplied by $r^{2n} \bmod N$ to add a factor of $r^n$: $\mathrm{MMM}(A, r^{2n}) \equiv Ar^n \pmod N$. This factor is preserved by every MMM in the exponentiation scheme: $\mathrm{MMM}(Ar^n, Br^n) \equiv ABr^n \pmod N$. The final output still contains the extra factor $r^n$, which is removed by an MMM by 1: $\mathrm{MMM}(Ar^n, 1) \equiv A \pmod N$.
• In the version of MMM without conditional subtractions, the final MMM by 1 reduces the output from a bound of $\frac12 r^n$ to a bound of N. So again no extra subtraction needs to be done!
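The whole scheme (entering Montgomery form with $r^{2n} \bmod N$, exponentiating with subtraction-free MMMs, leaving with an MMM by 1) fits in a short sketch. For brevity it uses the whole-number form of MMM with $R = r^n$, which is equivalent to the digit loop, plain square-and-multiply, and the toy modulus with $R = 2^{16}$; none of these names come from the slides.

```python
def mmm(a, b, modulus, big_r):
    """Montgomery product without final subtraction, whole-number form:
    P = (A*B + Q*N) / R with Q = -A*B*N^(-1) mod R (R = r^n)."""
    t = a * b
    q = (-t * pow(modulus, -1, big_r)) % big_r     # Python 3.8+
    return (t + q * modulus) // big_r


def montgomery_exp(c, d, modulus, big_r):
    """C^D mod N using only subtraction-free MMMs. Requires N odd, 4N < R, C < N."""
    assert modulus % 2 == 1 and 4 * modulus < big_r
    r2 = big_r * big_r % modulus                   # r^(2n) mod N, precomputed
    c_mont = mmm(c, r2, modulus, big_r)            # MMM(C, r^2n) = C r^n mod N
    acc = mmm(1, r2, modulus, big_r)               # Montgomery form of 1
    for bit in bin(d)[2:]:
        acc = mmm(acc, acc, modulus, big_r)        # stays below 2N
        if bit == "1":
            acc = mmm(acc, c_mont, modulus, big_r)
    return mmm(acc, 1, modulus, big_r)             # MMM by 1 removes r^n; result <= N


print(montgomery_exp(65, 17, 3233, 2 ** 16))       # prints 2790, i.e. pow(65, 17, 3233)
```

The last MMM returns a value below $N + 2N/R$, hence at most $N$; it can equal $N$ only when the true result is 0, so for non-zero results no correction step is needed.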

Summary
• Glitch attacks were seen to be dangerous: checks need to be included if CRT is used to speed up exponentiation.
• Conditionals causing timing variation in modular multiplications were seen to pose a threat in exponentiation.
• Conditionals can be removed without too much expense: an extra iteration in Montgomery modular multiplication.
• I’ve not covered other modular multiplication algorithms; there are a few, but MMM is the most widely applied because of its efficiency.
• Always use message blinding and key blinding.
• We don’t know if new algorithms and hardware will keep pace with new attacks, but keep an eye on the standard conferences: CHES, CT-RSA, SAC, etc.

Bibliography I
This is a list of key references which discuss the topics covered here in more detail.
• P. Kocher, Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems, Advances in Cryptology – Crypto '96, LNCS 1109, Springer, 1996, pp. 104–113.
• P. Kocher, J. Jaffe & B. Jun, Differential Power Analysis, Advances in Cryptology – Crypto '99, LNCS 1666, Springer, 1999, pp. 388–397.
• C. D. Walter, Data Dependent Power Use in Multipliers, 17th IEEE Symposium on Computer Arithmetic, IEEE Press, 2005, pp. 4–12.
• J.-J. Quisquater & D. Samyde, ElectroMagnetic Analysis (EMA): Measures and Counter-Measures for Smart Cards, Smart Card Programming and Security (E-smart 2001), LNCS 2140, Springer, 2001, pp. 200–210.
• J.-J. Quisquater & D. Samyde, Eddy Current for Magnetic Analysis with Active Sensor, Proc. E-smart 2002, Nice, France, September 2002.

Bibliography II
• R. Anderson & M. Kuhn, Tamper Resistance – a Cautionary Note, Proceedings of the Second USENIX Workshop on Electronic Commerce, Oakland, California, November 18–21, 1996, pp. 1–11.
• S. Skorobogatov & R. Anderson, Optical Fault Induction Attacks, Cryptographic Hardware and Embedded Systems (Proc. CHES 2002), LNCS 2523, Springer, 2002, pp. 2–12.
• D. Boneh, R. DeMillo & R. Lipton, On the Importance of Checking Cryptographic Protocols for Faults, Eurocrypt '97, LNCS 1233, Springer, 1997, pp. 37–51.
• A. Shamir, Method and apparatus for protecting public key schemes from timing and fault attacks, US patent 5,991,415, Nov 23, 1999.
• J.-F. Dhem, F. Koeune, P.-A. Leroux, P. Mestré, J.-J. Quisquater & J.-L. Willems, A Practical Implementation of the Timing Attack, Proc. CARDIS 1998, LNCS 1820, Springer, 2000, pp. 175–190.

Bibliography III
• T. S. Messerges, E. A. Dabbish & R. H. Sloan, Power Analysis Attacks of Modular Exponentiation in Smartcards, CHES '99, LNCS 1717, Springer, 1999, pp. 144–157.
• W. Diffie & M. E. Hellman, New Directions in Cryptography, IEEE Trans. Info. Theory, IT-22, no. 6 (1976), pp. 644–654.
• R. L. Rivest, Timing cryptanalysis of RSA, DH, DSS, communication to the sci.crypt newsgroup, 11 Dec 1995.
• C. D. Walter & S. Thompson, Distinguishing Exponent Digits by Observing Modular Subtractions, CT-RSA 2001, LNCS 2020, Springer, 2001, pp. 192–207.
• C. D. Walter, Precise Bounds for Montgomery Modular Multiplication and Some Potentially Insecure RSA Moduli, Topics in Cryptology – CT-RSA 2002, LNCS 2271, Springer, 2001, pp. 30–39.
• D. Agrawal, B. Archambeault, J. R. Rao & P. Rohatgi, The EM Side-Channels, CHES 2002, LNCS 2523, Springer, 2002, pp. 29–45.
• C. D. Walter, Longer Keys may facilitate Side Channel Attacks, SAC 2003, LNCS 3006, Springer, pp. 42–57.
