Cache-Collision Timing Attacks Against AES

Joseph Bonneau, Stanford University, jbonneau@stanford.edu; Ilya Mironov, Microsoft Research, mironov@microsoft.com

Side Channel Cryptanalysis • Definition: Any attack on a cryptosystem that uses information leaked as a byproduct of the physical implementation of the cryptosystem, rather than a theoretical weakness. • Exploitable side-channels: • Power usage • Cache accesses • Noise • Heat • Time

Brief History of Timing Attacks • Timing attacks exploit variability in the time taken to perform an encryption that depends on secret data. • Paul Kocher demonstrated timing attacks against Diffie-Hellman, RSA, DSS, etc. at CRYPTO ’96 • Dan Boneh and David Brumley demonstrated the first remote timing attack against RSA in 2003 • Public-key systems are vulnerable due to their use of lengthy mathematical operations

Brief History of Timing Attacks • During the AES competition, timing attacks were believed to be possible only against branch statements or data-dependent rotations. • Rijndael has a mathematical formulation in the field GF(2^8) • Optimized Rijndael implementations in software use only table lookup, shift, and exclusive-or operations • NIST declared Rijndael “not vulnerable to timing attacks” in its final evaluation in 2000; Rijndael wins the competition.

Brief History of Timing Attacks • Daniel Bernstein announces successful timing attacks against AES in April 2005, exploiting timing characteristics of table lookups • Osvik, Shamir, and Tromer follow up in November 2005 with very powerful attacks that require direct observation of the cache before and after encryption

Implementation details of AES, part I • The textbook description of an AES round as a function from (X_i, K_i) → X_{i+1}:

Implementation details of AES, part I • The actual round computation in software, as proposed with Rijndael and now widely used: • All three operations are combined into pre-computed tables. A round of encryption requires just 16 table lookups, 16 XORs, and 12 shifts.
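The equivalence between the textbook round and the table-driven round can be checked with a minimal Python sketch. It assumes the three combined operations are SubBytes, ShiftRows, and MixColumns (the standard AES round), a column-major 16-byte state, and 32-bit table words packed high byte first. The names `textbook_round`, `ttable_round`, and the packing order are illustrative choices, not taken from the slides.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def make_sbox():
    """Build the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()

def ror32(w, n):
    """Rotate a 32-bit word right by n bits."""
    return ((w >> n) | (w << (32 - n))) & 0xFFFFFFFF

# T0[x] packs the MixColumns column (2s, s, s, 3s) for s = SBOX[x];
# the other three tables are byte rotations of T0.
T0 = [(gmul(2, s) << 24) | (s << 16) | (s << 8) | gmul(3, s) for s in SBOX]
T = [T0] + [[ror32(w, 8 * r) for w in T0] for r in (1, 2, 3)]

def textbook_round(state, round_key):
    """SubBytes, ShiftRows, MixColumns, AddRoundKey on a column-major state."""
    s = [SBOX[b] for b in state]
    s = [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        out += [gmul(2, a[0]) ^ gmul(3, a[1]) ^ a[2] ^ a[3],
                a[0] ^ gmul(2, a[1]) ^ gmul(3, a[2]) ^ a[3],
                a[0] ^ a[1] ^ gmul(2, a[2]) ^ gmul(3, a[3]),
                gmul(3, a[0]) ^ a[1] ^ a[2] ^ gmul(2, a[3])]
    return [o ^ k for o, k in zip(out, round_key)]

def ttable_round(state, round_key):
    """Same round using only table lookups, XORs, and shifts."""
    out = []
    for c in range(4):
        w = (T[0][state[4 * c]]
             ^ T[1][state[4 * ((c + 1) % 4) + 1]]
             ^ T[2][state[4 * ((c + 2) % 4) + 2]]
             ^ T[3][state[4 * ((c + 3) % 4) + 3]])
        out += [(w >> 24) & 0xFF, (w >> 16) & 0xFF, (w >> 8) & 0xFF, w & 0xFF]
    return [o ^ k for o, k in zip(out, round_key)]
```

Each output column costs four lookups, so a full round performs the 16 lookups mentioned above, and every lookup index is a state byte, which is what makes the memory access pattern data-dependent.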

Bernstein’s timing attack Notice that for the first round, the table lookup indices are each related to only one key byte and one plaintext byte: Remarkably, the entire encryption time will be affected by just the value of

Bernstein’s timing attack To prepare for the attack, collect a large body of reference timing data for each

Bernstein’s timing attack Next, collect a large body of timing data from a target machine for the plaintext byte

Bernstein’s timing attack The target machine’s timing data should be shifted from the reference data by exactly
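The profile-matching idea behind these steps can be sketched in Python on synthetic data. The sketch assumes the first-round lookup index for a byte is the plaintext byte XORed with the key byte, and it replaces real measurements with a made-up per-index latency table plus Gaussian noise. The function names and all numeric parameters are hypothetical.

```python
import random

def timing_profile(key_byte, latency, samples, rng):
    """Average simulated time for each plaintext byte value.

    The simulated time depends only on the lookup index p ^ key_byte,
    which is the assumption this sketch is built on.
    """
    total = [0.0] * 256
    count = [0] * 256
    for _ in range(samples):
        p = rng.randrange(256)
        t = latency[p ^ key_byte] + rng.gauss(0.0, 1.0)
        total[p] += t
        count[p] += 1
    return [total[p] / count[p] if count[p] else 0.0 for p in range(256)]

def best_shift(reference, target):
    """Return the XOR shift d that best aligns target[p] with reference[p ^ d]."""
    ref_mean = sum(reference) / 256
    tgt_mean = sum(target) / 256

    def score(d):
        return sum((target[p] - tgt_mean) * (reference[p ^ d] - ref_mean)
                   for p in range(256))

    return max(range(256), key=score)

rng = random.Random(2005)
latency = [rng.uniform(0.0, 5.0) for _ in range(256)]  # synthetic machine behavior
known_key, secret_key = 0x2B, 0xA7                     # one key byte on each machine

reference = timing_profile(known_key, latency, 40000, rng)  # reference machine
target = timing_profile(secret_key, latency, 40000, rng)    # target machine

# The two profiles differ by an XOR shift of known_key ^ secret_key.
recovered = best_shift(reference, target) ^ known_key
```

In this toy setting both machines share the same `latency` table, which mirrors the requirement that the reference machine behave identically to the target.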

Bernstein’s timing attack • Problems: • The reference machine must be identical to the target • Requires known plaintext as well as timing data • Plaintexts must be sufficiently random • High number of samples required; the best case reported by Bernstein is around 2^27.5

Bernstein’s timing attack • Overall, a very general statistical method for constructing a timing attack. • Getting code to run in constant time on a machine with cache is very difficult, meaning most cryptosystems are theoretically vulnerable. • Bernstein’s attack doesn’t exploit any specific features of Rijndael, yet the attack does not seem to work against other AES finalists (Serpent, Twofish)

Cache-collision timing attacks • What is Rijndael’s weakness? • Heavy use of table lookups which dominate the running time • Table lookup indices are easily related to single plaintext and key bytes

Cache collisions • Rijndael is just a sequence of table lookups. • What happens when x_i = x_j? • The access to x_j will hit in cache. • What happens when x_i ≠ x_j? • The access to x_j may or may not hit in cache, depending on the rest of the sequence and the prior cache contents. … T[x] T[x] T[x_i] T[x] T[x] T[x_j] T[x] …

Cache collisions A cache collision occurs when we know that x_i = x_j. For a large number of samples, the average encryption time will be lower when x_i = x_j than when x_i ≠ x_j. This is all we need to build an attack. … T[x] T[x] T[x_i] T[x] T[x] T[x_j] T[x] …

Cache collisions Actual results, Pentium III (figure)

First Round Attack Pick two lookups in the first round of encryption:

First Round Attack Pick two lookups in the first round of encryption: Solve for the collision constraint:

First Round Attack Result: A working attack! There is an easily identifiable low average encryption time whenever

First Round Attack Result: A working attack! There is an easily identifiable low average encryption time whenever However, there are some complications…

Complication #1: Table families Notice four separate tables are used: Each “family” of four bytes is isolated.

Complication #2: Cache Lines Modern memory is cached in lines. (Diagram: table lookup, cache.)

Complication #2: Cache Lines So, we can only tell if two lookups hit the same line in memory, not if they are identical. We denote: Most CPUs use 32- or 64-byte cache lines. With 4-byte table entries, this means we are forced to ignore the 3 or 4 low-order bits.
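The loss of low-order bits can be made concrete with a short Python sketch. It assumes the table starts on a cache-line boundary and that a first-round lookup index is the plaintext byte XORed with the key byte; the function names are illustrative only.

```python
def entries_per_line(line_bytes, entry_bytes=4):
    """Number of table entries that share one cache line."""
    return line_bytes // entry_bytes

def ignored_low_bits(line_bytes, entry_bytes=4):
    """Low-order index bits that do not affect which line is touched."""
    return entries_per_line(line_bytes, entry_bytes).bit_length() - 1

def cache_line(index, line_bytes, entry_bytes=4):
    """Cache line (relative to the table start) holding table[index]."""
    return index // entries_per_line(line_bytes, entry_bytes)

def same_line_first_round(p_i, k_i, p_j, k_j, line_bytes):
    """True when the two assumed first-round lookups touch the same line."""
    return cache_line(p_i ^ k_i, line_bytes) == cache_line(p_j ^ k_j, line_bytes)
```

With 64-byte lines, `ignored_low_bits(64)` is 4, so two lookups share a line exactly when the high four bits of `p_i ^ p_j` equal the high four bits of `k_i ^ k_j`. The timing signal therefore reveals only the high bits of the key-byte difference.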

First Round Attack: The bad news We gain a set of equations in each family, such as: This leaves 68 or 80 bits of key to search. This limitation was also problematic for Osvik et al. Their solution: examine the second round as well. This can fix some of the problems but is difficult for timing attacks (see paper).
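One way to arrive at the 68 and 80 figures, assuming each of the four families of four key bytes yields three independent XOR relations and each relation reveals only the high-order bits (5 bits for 32-byte lines, 4 bits for 64-byte lines):

```latex
\begin{align*}
\text{32-byte lines:}\quad & 128 - 4 \cdot 3 \cdot (8 - 3) = 128 - 60 = 68 \text{ bits} \\
\text{64-byte lines:}\quad & 128 - 4 \cdot 3 \cdot (8 - 4) = 128 - 48 = 80 \text{ bits}
\end{align*}
```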

First Round Attack: The good news • Cache-collisions are a strong method. • The timing variability is much better than the random effects previously used. • The attack requires ~2^15 samples, compared to 2^27.5. • Can we recover the full key with this efficiency?

Implementation details of AES, part II The final round of encryption is special: round 1, round 2, …, round 8, round 9, round 10 (special!)

Implementation details of AES, part II • The final round of encryption is special: • No MixColumns operation is performed, as it would add no additional security • In software, this requires a new table that is used only for the final round. This table is just the S-box.

Implementation details of AES, part II • The final round also uses expanded key bytes • However, the AES key schedule is invertible. Finding the final 16 bytes is equivalent to finding the raw key. This design was intentional.
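The invertibility of the key schedule can be illustrated with a Python sketch for AES-128. It follows the standard key expansion (rotate, substitute, add round constant) and runs it backwards from the last round key. The function names are illustrative, and the S-box is generated rather than hard-coded to keep the block self-contained.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def make_sbox():
    """Build the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = make_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]

def g(word, round_index):
    """Key-schedule core: rotate left one byte, substitute, add round constant."""
    rotated = word[1:] + word[:1]
    out = [SBOX[b] for b in rotated]
    out[0] ^= RCON[round_index - 1]
    return out

def expand_key(key):
    """Forward AES-128 key expansion: 16 key bytes to 44 four-byte words."""
    w = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        t = g(w[i - 1], i // 4) if i % 4 == 0 else w[i - 1]
        w.append([a ^ b for a, b in zip(w[i - 4], t)])
    return w

def invert_key_schedule(last_round_key):
    """Recover the raw 16-byte key from the 16 bytes of the last round key."""
    w = [None] * 40 + [list(last_round_key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(43, 3, -1):
        # Forward rule: w[i] = w[i-4] ^ f(w[i-1]); solve it for w[i-4].
        t = g(w[i - 1], i // 4) if i % 4 == 0 else w[i - 1]
        w[i - 4] = [a ^ b for a, b in zip(w[i], t)]
    return bytes(b for word in w[:4] for b in word)
```

Walking the index downward works because each step needs only `w[i]` and `w[i-1]`, both of which are already known by the time they are used.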

Final Round Attack • Again, we consider a cache-collision for two bytes • When do these bytes collide in the table?

Final Round Attack We want to solve for

Final Round Attack We want to solve for We assume that

Final Round Attack We want to solve for We assume that , leaving

Final Round Attack So, guarantees a collision What happens if ?

Final Round Attack So, guarantees a collision What happens if ? We get a fixed offset

Final Round Attack So, guarantees a collision What happens if ? We get a fixed offset Surprise: the non-linearity of the S-box enables the attack to succeed.

Final Round Attack Why does this happen? Because α and β are the results of S-box lookups, a fixed offset between them says nothing about the indices used to look them up. A small offset such as γ = 1 does not imply a collision on the same cache line. Thus, the cache-line issue is gone.
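The argument can be written out as a short derivation. The symbols here are assumptions, since the slide equations were not preserved: x_i and x_j are the final-round lookup indices, c_i and c_j the corresponding ciphertext bytes, k_i and k_j the final round key bytes, and S the S-box. α, β, and γ are identified with the quantities named above on the same assumption.

```latex
\begin{align*}
c_i &= S[x_i] \oplus k_i, \qquad c_j = S[x_j] \oplus k_j \\
x_i = x_j \;&\Longleftrightarrow\; S[x_i] = S[x_j]
  \;\Longleftrightarrow\; c_i \oplus c_j = k_i \oplus k_j \\
\text{otherwise:}\quad \alpha &= c_i \oplus k_i = S[x_i], \qquad
  \beta = c_j \oplus k_j = S[x_j], \\
\alpha \oplus \beta &= \gamma = c_i \oplus c_j \oplus k_i \oplus k_j \neq 0
\end{align*}
```

The equivalences in the second line hold because S is a bijection. For γ ≠ 0 the indices S^{-1}[α] and S^{-1}[β] are scattered by the non-linear S-box, so no nonzero offset is systematically tied to a shared cache line.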

Final Round Attack • Collect timing data, compute average time for each value of for all i, j. Low times will occur at the values • Attack data produces a likelihood estimate for different values for each k_i, k_j. • Need to find k_0, …, k_15 minimizing the global cost function: Σ_{i,j} C_ij(k_i, k_j) • Use standard AI algorithms (Local Optimization, Belief Propagation).
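The averaging step can be sketched for a single pair of bytes on synthetic data. This is a toy model under loudly stated assumptions: a random permutation stands in for the S-box, the timing is a made-up constant that drops when the two lookups share a 16-entry cache line, and the statistic is the average time per value of the XOR of the two ciphertext bytes. It does not implement the global cost minimization over all sixteen key bytes.

```python
import random

def simulate(samples, k_i, k_j, sbox, rng):
    """Yield (c_i, c_j, time) for a toy final round with two table lookups."""
    for _ in range(samples):
        x_i, x_j = rng.randrange(256), rng.randrange(256)
        same_line = (x_i >> 4) == (x_j >> 4)  # assumed: 16 entries per cache line
        t = 100.0 - (10.0 if same_line else 0.0) + rng.gauss(0.0, 1.0)
        yield sbox[x_i] ^ k_i, sbox[x_j] ^ k_j, t

def best_difference(data):
    """Return the value of c_i ^ c_j with the lowest average time."""
    total = [0.0] * 256
    count = [0] * 256
    for c_i, c_j, t in data:
        d = c_i ^ c_j
        total[d] += t
        count[d] += 1
    average = {d: total[d] / count[d] for d in range(256) if count[d]}
    return min(average, key=average.get)

rng = random.Random(1)
toy_sbox = list(range(256))
rng.shuffle(toy_sbox)            # stand-in permutation, not the real AES S-box
secret_i, secret_j = 0x3A, 0xC5  # hypothetical final-round key bytes

guess = best_difference(simulate(60000, secret_i, secret_j, toy_sbox, rng))
```

In this model the lowest average lands on `secret_i ^ secret_j`, because only that value of `c_i ^ c_j` forces the two lookups to use the same index every time.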

Final Round Attack: Results • Huge improvement over the original 2^27.5. • “Offline” complexity is low; the attack takes seconds. The offline work can be increased to further lower the number of samples required.

Expanded Final Round Attack • Produce a cost estimate for specific values of key bytes, instead of simply their difference • Requires more time and memory from the attacker, but the attack still finishes in ~10 minutes

Final Round Attack: Results • Bonuses from attacking the final round: • Attack requires only ciphertext and timing. • Related plaintexts produce essentially random cipher state by the 9th round. • Attack is oblivious to the target platform • Attack works well against decryption
