Cache Attacks and Countermeasures: the Case of AES

Dag Arne Osvik, Adi Shamir and Eran Tromer
Presented by Ophir Arbiv (ophirarb@post.tau.ac.il)

Sources
[1] Dag Arne Osvik, Adi Shamir and Eran Tromer, Cache Attacks and Countermeasures: the Case of AES (Extended Version), 2005.
[2] Eran Tromer, lecture at MIT: theory.csail.mit.edu/~tromer/SKC2006/cache-skc06.ppt
[3] Adi Shamir, lecture at the Weizmann Institute: www.l-sec.be/calit/present/AdiShamir.pdf

AES – Advanced Encryption Standard
• 1997: with DES becoming outdated, NIST announces a competition to design a successor.
• Evaluation criteria: security, cost, and algorithm and implementation characteristics.
• 21 algorithms were submitted. In 2000, NIST selected Rijndael as the proposed AES algorithm (standardized as FIPS 197 in 2001).
• Rijndael was proposed by Dr. Vincent Rijmen and Dr. Joan Daemen of Belgium.
• Properties:
  • Symmetric
  • Block cipher
  • Based on finite-field arithmetic
  • 128-bit data blocks; key sizes of 128, 192 and 256 bits
  • Resistant to known attacks

AES Algorithm
• The mathematical description of the algorithm: [figure not reproduced]
Source: http://klabs.org/mapld05/presento/103_swankoski_p.ppt

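Efficient Implementation
• Originally proposed in the Rijndael specification, and now widely used.
• Uses precomputed table lookups ($s(\cdot)$ is the S-box, $\bullet$ is multiplication over GF(256)):
  • Tables: $T_0[x] = (2 \bullet s(x),\ s(x),\ s(x),\ 3 \bullet s(x))$, $T_1[x] = (3 \bullet s(x),\ 2 \bullet s(x),\ s(x),\ s(x))$, $T_2[x] = (s(x),\ 3 \bullet s(x),\ 2 \bullet s(x),\ s(x))$, $T_3[x] = (s(x),\ s(x),\ 3 \bullet s(x),\ 2 \bullet s(x))$
  • Key: $x_i^{(0)} = p_i \oplus k_i$ for $i = 0, \dots, 15$
  • Round ($r = 0, \dots, 8$; $K_j^{(r+1)}$ is the $j$-th 4-byte word of round key $r+1$; the 10th round omits MixColumns and is implemented separately):
    $(x_0^{(r+1)}, x_1^{(r+1)}, x_2^{(r+1)}, x_3^{(r+1)}) \leftarrow T_0[x_0^{(r)}] \oplus T_1[x_5^{(r)}] \oplus T_2[x_{10}^{(r)}] \oplus T_3[x_{15}^{(r)}] \oplus K_0^{(r+1)}$
    $(x_4^{(r+1)}, \dots, x_7^{(r+1)}) \leftarrow T_0[x_4^{(r)}] \oplus T_1[x_9^{(r)}] \oplus T_2[x_{14}^{(r)}] \oplus T_3[x_3^{(r)}] \oplus K_1^{(r+1)}$
    $(x_8^{(r+1)}, \dots, x_{11}^{(r+1)}) \leftarrow T_0[x_8^{(r)}] \oplus T_1[x_{13}^{(r)}] \oplus T_2[x_2^{(r)}] \oplus T_3[x_7^{(r)}] \oplus K_2^{(r+1)}$
    $(x_{12}^{(r+1)}, \dots, x_{15}^{(r+1)}) \leftarrow T_0[x_{12}^{(r)}] \oplus T_1[x_1^{(r)}] \oplus T_2[x_6^{(r)}] \oplus T_3[x_{11}^{(r)}] \oplus K_3^{(r+1)}$
• Each round: 16 table lookups, 16 XORs and 12 shifts.
• The tables occupy 4 KB (×2).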
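As a sanity check of the table formulation, the Python sketch below (our own illustration, not OpenSSL's code) builds the S-box and $T_0, \dots, T_3$, computes one inner round with 16 lookups, and compares it with the textbook SubBytes, ShiftRows, MixColumns, AddRoundKey round. The state layout (byte $i$ in row $i \bmod 4$, column $\lfloor i/4 \rfloor$), the big-endian word packing and the names `t_table_round` and `reference_round` are assumptions made for this example.

```python
import random

def xtime(a):
    """Multiply a by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def build_sbox():
    """Rijndael S-box: inverse in GF(2^8) (0 maps to 0), then the affine map."""
    exp, log = [0] * 255, [0] * 256
    x = 1
    for i in range(255):          # 3 generates GF(2^8)*, so x visits every nonzero byte
        exp[i], log[x] = x, i
        x ^= xtime(x)             # x <- 3 * x
    sbox = []
    for a in range(256):
        inv = 0 if a == 0 else exp[(255 - log[a]) % 255]
        s = inv
        for r in range(1, 5):     # XOR with the four left rotations of inv
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def rotr8(w):
    """Rotate a 32-bit word right by one byte."""
    return ((w >> 8) | (w << 24)) & 0xFFFFFFFF

# T0[x] = (2*s(x), s(x), s(x), 3*s(x)) packed big-endian; T1..T3 are byte rotations.
T0 = [(xtime(s) << 24) | (s << 16) | (s << 8) | (xtime(s) ^ s) for s in SBOX]
T1 = [rotr8(w) for w in T0]
T2 = [rotr8(w) for w in T1]
T3 = [rotr8(w) for w in T2]

def t_table_round(x, rk):
    """Inner round on a 16-byte column-major state: 16 lookups, then XOR the round key."""
    out = []
    for j in range(4):            # output column j
        w = (T0[x[4 * j]] ^ T1[x[(4 * j + 5) % 16]]
             ^ T2[x[(4 * j + 10) % 16]] ^ T3[x[(4 * j + 15) % 16]])
        out += [w >> 24, (w >> 16) & 0xFF, (w >> 8) & 0xFF, w & 0xFF]
    return [b ^ k for b, k in zip(out, rk)]

def reference_round(x, rk):
    """The same round as SubBytes, ShiftRows, MixColumns and AddRoundKey."""
    s = [SBOX[b] for b in x]
    s = [s[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]  # ShiftRows
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        d = [xtime(v) for v in a]                 # d[i] = 2 * a[i]
        out += [d[0] ^ d[1] ^ a[1] ^ a[2] ^ a[3],  # 2a0 + 3a1 + a2 + a3
                a[0] ^ d[1] ^ d[2] ^ a[2] ^ a[3],  # a0 + 2a1 + 3a2 + a3
                a[0] ^ a[1] ^ d[2] ^ d[3] ^ a[3],  # a0 + a1 + 2a2 + 3a3
                d[0] ^ a[0] ^ a[1] ^ a[2] ^ d[3]]  # 3a0 + a1 + a2 + 2a3
    return [b ^ k for b, k in zip(out, rk)]

rng = random.Random(1)
state = [rng.randrange(256) for _ in range(16)]
round_key = [rng.randrange(256) for _ in range(16)]
assert t_table_round(state, round_key) == reference_round(state, round_key)
print(hex(T0[0]))  # 0xc66363a5
```

Note that in the first round the lookup indices are simply $p_i \oplus k_i$, which is what the attacks below exploit.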

AES – summary
• During the AES selection, only branch statements, arithmetic operations and data-dependent shifts were considered vulnerable to timing attacks.
• The proposed algorithms were widely analyzed.
• Since Rijndael uses only table lookups, XORs and shifts, NIST apparently declared it "not vulnerable to timing attacks".
• 2003: the NSA approved AES with 128-bit keys for classified information up to SECRET; TOP SECRET requires 192- or 256-bit keys.
• No practical direct attacks are known as of today.
• Expected to remain the standard for 20+ years.

Side Channels
• Any observable information emitted as a byproduct of the physical implementation of the cryptosystem.
[Figure: the plaintext and key K enter the cipher; the ciphertext and side channels come out]
Source: www.stanford.edu/~jbonneau/AES_side_channel.ppt

Examples of side channels:
• Power consumption (simple, differential…)
• Time
• Heat
• Acoustic noise (keyboards…)
• Cache
• Faults (power glitches, jitter…)
• Electromagnetic radiation
• Visual side channels

Why Cache Analysis?
              Typical latency   Annual speed increase
CPU core      0.3 ns            60% (until recently)
Main memory   50–150 ns         7–9%
• The cache bridges this gap => a timing gap between cache hits and misses.

Cache Attacks
• The cache is a shared resource => the cache state affects, and is affected by, all processes => possible crosstalk between processes.
• Process memory is usually protected, but information about the memory access patterns of other processes is leaked.
• Cache attacks are pure software attacks, and very cheap.
• (In some variants) a process with no special privileges and no interaction with the cryptographic code can attack it.

How Does the Cache Work?
[Figure: main memory (DRAM) divided into memory blocks of B bytes; the cache divided into cache sets of W cache lines, each line holding B bytes]
• The cache holds copies of aligned blocks of B bytes of main memory (memory blocks).
• When a memory access instruction is processed, the cache is searched first.
• On a cache miss, the whole memory block is copied into one of the W cache lines of the appropriate cache set (out of S possible sets).
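A toy model of the cache just described, as a minimal Python sketch: it assumes the set index is simply (address / B) mod S and LRU replacement inside each set; real processors may differ in both. The class name `SetAssociativeCache` and its default parameters (B = 64, S = 64, W = 8) are hypothetical.

```python
from collections import OrderedDict

class SetAssociativeCache:
    """Toy cache: S sets of W lines, each line holding one aligned B-byte block (LRU)."""

    def __init__(self, B=64, S=64, W=8):
        self.B, self.S, self.W = B, S, W
        self.sets = [OrderedDict() for _ in range(S)]   # per set: tag -> None, LRU first

    def locate(self, address):
        block = address // self.B                       # aligned memory block number
        return block % self.S, block // self.S          # (set index, tag)

    def access(self, address):
        """Return True on a hit; on a miss, load the whole block and return False."""
        index, tag = self.locate(address)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)                      # mark as most recently used
            return True
        if len(lines) == self.W:
            lines.popitem(last=False)                   # evict the least recently used line
        lines[tag] = None
        return False

cache = SetAssociativeCache()
print(cache.access(0x1000), cache.access(0x1004))       # False True: same 64-byte block
for i in range(1, 9):                                   # 8 more blocks mapping to set 0
    cache.access(0x1000 + i * 64 * 64)
print(cache.access(0x1000))                             # False: evicted from its set
```

The last access shows why sharing matters: any process touching W blocks that map to the same set can push another process's data out of the cache.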

What Does a Cached Table Look Like?
[Figure: an S-box table in DRAM and the cache lines holding its blocks]

Notation
• δ: the cache line size B divided by the size of each table entry (usually 64/4 = 16).
• ⟨y⟩: the memory block of y in T_l. ⟨y⟩ = ⟨z⟩ iff y and z, when used as lookup indices into the same table T_l, would cause access to the same memory block (for a line-aligned table, ⟨y⟩ = ⌊y/δ⌋).
• Q_k(p,l,y) = 1 iff the AES encryption of the plaintext p under the encryption key k accesses the memory block of index y in T_l at least once (during the 10 rounds).
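The notation can be made concrete in a few lines of Python, assuming a table aligned to a cache-line boundary (so ⟨y⟩ = ⌊y/δ⌋). All helper names are hypothetical. `first_round_blocks` covers only the 1st-round lookups, where T_l is indexed by p_i ⊕ k_i for i ≡ l (mod 4), so `q_round1` is a lower bound on Q_k(p,l,y), not the full 10-round predicate.

```python
B, ENTRY_SIZE = 64, 4          # assumed cache line size and T-table entry size (bytes)
DELTA = B // ENTRY_SIZE        # delta = 16 entries per memory block

def block_of(y, delta=DELTA):
    """<y>: memory block of entry y in a line-aligned table."""
    return y // delta

def same_block(y, z, delta=DELTA):
    """<y> = <z>: y and z, as indices into the same table, hit the same block."""
    return block_of(y, delta) == block_of(z, delta)

def first_round_blocks(p, k, l, delta=DELTA):
    """Blocks of T_l read in round 1, where T_l is indexed by p_i ^ k_i, i = l (mod 4)."""
    return {block_of(p[i] ^ k[i], delta) for i in range(l, 16, 4)}

def q_round1(p, k, l, y, delta=DELTA):
    """1 if round 1 alone already forces Q_k(p, l, y) = 1 (a lower bound on Q)."""
    return int(block_of(y, delta) in first_round_blocks(p, k, l, delta))

print(same_block(0x30, 0x3F), same_block(0x3F, 0x40))  # True False
```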

Cache Attacks on AES
• The efficient implementation of the algorithm has a big weakness: the lookup indices depend directly on the encryption key (the secret).
• Therefore, knowing which memory locations were accessed would let us extract the key (e.g., by probing the memory bus).
• Usually the attacker has no access to the bus, and memory is partitioned and protected by the OS.
• The solution: the cache is a shared resource through which we can learn about the memory access patterns of other processes.

Synchronous Attacks
• The plaintext or ciphertext is known.
• The attacker can operate synchronously with the encryption (on the same processor).
• Examples:
  • sending data packets through a secure channel in a VPN;
  • Linux's dm-crypt and cryptoloop services.
• The attack scheme:
  • Obtain a set of random samples M_k(p,l,y) of the predicate Q_k(p,l,y).
  • Perform offline cryptanalysis:
    • Guess small parts of the key.
    • Use the guess to predict memory accesses.
    • Check whether the predictions are consistent with the collected data.

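One Round Attack
Consider one of the memory accesses in the 1st round: $T_0[p_0 \oplus k_0]$.
Given a candidate value $k'_0$ and samples of $Q(p,l,y)$ with $l = 0$:
• The useful samples are those that satisfy $\langle p_0 \oplus k'_0 \rangle = \langle y \rangle$.
• If $\langle k'_0 \rangle = \langle k_0 \rangle$, then for all useful samples $\langle p_0 \oplus k_0 \rangle = \langle p_0 \oplus k'_0 \rangle = \langle y \rangle$, so $T_0[p_0 \oplus k_0]$ accesses block $\langle y \rangle$ and $Q(p,l,y) = 1$.
• Otherwise $\langle p_0 \oplus k_0 \rangle \ne \langle p_0 \oplus k'_0 \rangle = \langle y \rangle$, and we expect $Q(p,l,y) = 0$.
• But there are 35 more "random" accesses to $T_0$, so $Q(p,l,y) = 0$ only with probability $(1 - 1/16)^{35} \approx 0.104$.
• A few hundred (!) random samples suffice to eliminate all bad candidates.
• Result: the high $\log_2(256/\delta) = 4$ bits (high nibble) of every key byte are extracted, 64 bits in total.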
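The elimination argument can be simulated in Python. This sketch rests on idealized assumptions that are ours, not the paper's experimental setup: the attacker sees exactly which T0 blocks one encryption touched (Prime + Probe style, no measurement noise), and the 35 other accesses to T0 are modeled as uniformly random blocks. `observe_t0_blocks` and `one_round_attack` are hypothetical names.

```python
import random

DELTA = 16                     # table entries per cache line (64-byte lines, 4-byte entries)
BLOCKS = 256 // DELTA          # T0 spans 16 memory blocks

def observe_t0_blocks(p0, k0, rng, other_accesses=35):
    """Idealized measurement: the set of T0 blocks touched by one encryption.

    The lookup T0[p0 ^ k0] is modeled exactly; the other accesses to T0
    are modeled as uniformly random blocks (a simplifying assumption).
    """
    touched = {(p0 ^ k0) // DELTA}
    touched.update(rng.randrange(BLOCKS) for _ in range(other_accesses))
    return touched

def one_round_attack(k0, samples=300, seed=0):
    rng = random.Random(seed)
    candidates = set(range(256))
    for _ in range(samples):
        p0 = rng.randrange(256)                    # known plaintext byte
        touched = observe_t0_blocks(p0, k0, rng)
        # For candidate c the useful y lie in block <p0 ^ c>; if that block was not
        # accessed, Q = 0 there, contradicting the prediction, so c is eliminated.
        candidates = {c for c in candidates if (p0 ^ c) // DELTA in touched}
    return sorted(candidates)

survivors = one_round_attack(k0=0xA7)
print(len(survivors), {c >> 4 for c in survivors})  # 16 {10}
print(round((1 - 1 / 16) ** 35, 3))                 # 0.104
```

Only the 16 candidates sharing the high nibble of k0 survive: the 1st round cannot distinguish the low nibble, matching the 64-bit limit stated above.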

Full Key Extraction
• With a straightforward method, we narrowed each key byte down to δ possibilities (in the common case, δ = 16, this means extracting half the key: 64 bits).
• This is all the information available from the 1st-round accesses.
• By moving to the 2nd round and exploiting the non-linearity of the S-box, we can extract the full key!

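Two Round Attack
• The following equations for the 2nd round are easily derived from the Rijndael specification ($s(\cdot)$ denotes the Rijndael S-box function and $\bullet$ denotes multiplication over GF(256)):
  $x_2^{(1)} = s(p_0 \oplus k_0) \oplus s(p_5 \oplus k_5) \oplus 2 \bullet s(p_{10} \oplus k_{10}) \oplus 3 \bullet s(p_{15} \oplus k_{15}) \oplus s(k_{15}) \oplus k_2$
  $x_5^{(1)} = s(p_4 \oplus k_4) \oplus 2 \bullet s(p_9 \oplus k_9) \oplus 3 \bullet s(p_{14} \oplus k_{14}) \oplus s(p_3 \oplus k_3) \oplus s(k_{14}) \oplus k_1 \oplus k_5$
  $x_8^{(1)} = 2 \bullet s(p_8 \oplus k_8) \oplus 3 \bullet s(p_{13} \oplus k_{13}) \oplus s(p_2 \oplus k_2) \oplus s(p_7 \oplus k_7) \oplus s(k_{13}) \oplus k_0 \oplus k_4 \oplus k_8 \oplus 1$
  $x_{15}^{(1)} = 3 \bullet s(p_{12} \oplus k_{12}) \oplus s(p_1 \oplus k_1) \oplus s(p_6 \oplus k_6) \oplus 2 \bullet s(p_{11} \oplus k_{11}) \oplus s(k_{12}) \oplus k_{15} \oplus k_3 \oplus k_7 \oplus k_{11}$
• $x_2^{(1)}$ is used as an index into $T_2$.
• The only relevant unknowns in this index are the low nibbles of $k_0, k_5, k_{10}$ and $k_{15}$ ($2^{16}$ candidates).
• A candidate can be tested as before:
  • Predict this lookup according to the guess $(k'_0, k'_5, k'_{10}, k'_{15})$ (the low nibble of $k_2$ is irrelevant).
  • Identify the useful samples, i.e., those where $y$ is in the same memory block as the prediction.
  • Check whether $Q(p,l,y) = 1$ for all useful samples.
• There are 3 more accesses of this special form ($x_5^{(1)}$, $x_8^{(1)}$, $x_{15}^{(1)}$), with disjoint sets of relevant low nibbles => full key recovery using ~2000 random samples.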
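The test on $x_2^{(1)}$ can be sketched in Python under an idealized measurement model of our own (the attacker sees which $T_2$ blocks an encryption touched, and the other 35 accesses to $T_2$ are treated as uniformly random). The code assumes the high nibbles of $k_0, k_5, k_{10}, k_{15}$ and $k_2$ are already known from the 1st round and searches the $2^{16}$ low-nibble guesses. Function names and the example key are hypothetical.

```python
import random
from itertools import product

def xtime(a):
    """Multiply by 2 in GF(2^8) modulo 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def build_sbox():
    """Rijndael S-box via exp/log tables (generator 3) and the affine map."""
    exp, log, x = [0] * 255, [0] * 256, 1
    for i in range(255):
        exp[i], log[x] = x, i
        x ^= xtime(x)
    sbox = []
    for a in range(256):
        inv = 0 if a == 0 else exp[(255 - log[a]) % 255]
        s = inv
        for r in range(1, 5):
            s ^= ((inv << r) | (inv >> (8 - r))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def x2_round1(p, k0, k5, k10, k15, k2):
    """x_2^(1) from the equation above; its high nibble selects a T2 block in round 2."""
    a10, a15 = SBOX[p[10] ^ k10], SBOX[p[15] ^ k15]
    return (SBOX[p[0] ^ k0] ^ SBOX[p[5] ^ k5] ^ xtime(a10)   # 2 * s(p10 ^ k10)
            ^ xtime(a15) ^ a15 ^ SBOX[k15] ^ k2)             # 3 * s(p15 ^ k15)

def two_round_attack(key, samples=250, seed=0, other_accesses=35):
    rng = random.Random(seed)
    # High nibbles assumed known from round 1; k2 enters linearly, so its low
    # nibble cannot change the block and is simply set to 0.
    k0h, k5h, k10h, k15h, k2h = (key[i] & 0xF0 for i in (0, 5, 10, 15, 2))
    candidates = list(product(range(16), repeat=4))  # low nibbles of k0, k5, k10, k15
    for _ in range(samples):
        p = [rng.randrange(256) for _ in range(16)]
        touched = {x2_round1(p, key[0], key[5], key[10], key[15], key[2]) >> 4}
        touched.update(rng.randrange(16) for _ in range(other_accesses))
        candidates = [c for c in candidates
                      if (x2_round1(p, k0h | c[0], k5h | c[1], k10h | c[2],
                                    k15h | c[3], k2h) >> 4) in touched]
    return candidates

key = list(range(0x00, 0x100, 0x11))   # hypothetical key bytes 0x00, 0x11, ..., 0xFF
print(two_round_attack(key))           # [(0, 5, 10, 15)]
```

Candidates are filtered incrementally, so after the first few samples only a small fraction of the $2^{16}$ guesses is re-evaluated.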

Measurement Methods
How do we obtain the measurements M_k(p,l,y) of the predicate Q_k(p,l,y)?
Inter-process crosstalk can be exploited in two ways:
• the effect of the cache on the encryption (timing);
• the effect of the encryption on the cache.

Measurement Method 1: Evict + Time
1. Make sure the tables are cached.
2. Evict one cache set (using attacker memory).
3. Time an encryption and see whether it is slow.
[Figure: attacker memory, table T0 in DRAM, and the cache]
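A toy simulation of Evict + Time in Python, built on a simple LRU set-associative cache model. The victim's secret-dependent T0 indices are passed in as `lookups`, and the encryption "time" is approximated by the number of cache misses on T0; this deliberately ignores the timing noise that makes the method fragile in practice. The addresses, cache geometry and function names are hypothetical.

```python
from collections import OrderedDict

B, S, W = 64, 64, 8          # hypothetical line size (bytes), number of sets, ways
TABLE_BASE = 0x10000         # hypothetical line-aligned address of T0 (4-byte entries)
ATTACKER_BASE = 0x100000     # hypothetical attacker buffer, S * W lines long

def access(cache, address):
    """LRU set-associative access; returns True on a hit, False on a miss."""
    block = address // B
    lines, tag = cache[block % S], block // S
    if tag in lines:
        lines.move_to_end(tag)
        return True
    if len(lines) == W:
        lines.popitem(last=False)   # evict the least recently used line
    lines[tag] = None
    return False

def encryption_misses(cache, lookups):
    """Proxy for encryption time: misses while the victim reads T0[i] for i in lookups."""
    return sum(not access(cache, TABLE_BASE + 4 * i) for i in lookups)

def evict_time(lookups, target_set):
    cache = [OrderedDict() for _ in range(S)]
    encryption_misses(cache, range(256))        # 1. make sure the table is cached
    for way in range(W):                        # 2. evict one cache set with attacker lines
        access(cache, ATTACKER_BASE + (target_set + way * S) * B)
    return encryption_misses(cache, lookups)    # 3. "time" the encryption

victim_lookups = [0x37, 0x52]                   # hypothetical secret-dependent indices
slow = [s for s in range(16) if evict_time(victim_lookups, s) > 0]
print(slow)  # [3, 5]
```

Each probed set costs a whole victim encryption, and in a real system the one extra miss must be seen through the timing of the entire encryption, which is why this method needs so many samples.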

Results
• Weakness of this method: it relies on timing the triggered encryption, so it is very sensitive to variations in the operation (noise due to scheduling, branches, cache contention, etc.).
• The authors were able to extract the key only from an artificial service (using the OpenSSL library), not from real services.

Measurement Method 2: Prime + Probe
• Tries to discover the set of memory blocks read by the encryption a posteriori, by examining the state of the cache after the encryption.
1. Completely evict the tables from the cache (by filling it with attacker memory).
2. Trigger a single encryption.
3. Access the attacker memory again and see which cache sets are slow.
[Figure: attacker memory, S-box table in DRAM, and the cache]
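Prime + Probe can be illustrated with a toy LRU cache in Python. The attacker fills every set with its own lines, lets the victim's T0 lookups run, then re-reads its lines; a miss (a slow access, in a real attack) reveals a set the victim used. The addresses, cache geometry and names are hypothetical, and a real attack measures probe latency rather than observing hits directly.

```python
from collections import OrderedDict

B, S, W = 64, 64, 8          # hypothetical cache geometry
TABLE_BASE = 0x10000         # hypothetical line-aligned address of T0 (4-byte entries)
ATTACKER_BASE = 0x100000     # hypothetical attacker buffer covering every set

def access(cache, address):
    """LRU set-associative access; returns True on a hit, False on a miss."""
    block = address // B
    lines, tag = cache[block % S], block // S
    if tag in lines:
        lines.move_to_end(tag)
        return True
    if len(lines) == W:
        lines.popitem(last=False)   # evict the least recently used line
    lines[tag] = None
    return False

def prime_probe(lookups):
    cache = [OrderedDict() for _ in range(S)]
    attacker_lines = [ATTACKER_BASE + j * B for j in range(S * W)]
    for a in attacker_lines:                    # 1. prime: fill all sets (evicts the table)
        access(cache, a)
    for i in lookups:                           # 2. the victim's encryption reads T0[i]
        access(cache, TABLE_BASE + 4 * i)
    slow_sets = set()
    for a in attacker_lines:                    # 3. probe: re-read the attacker's own lines
        if not access(cache, a):                # a miss would be a slow access
            slow_sets.add((a // B) % S)
    return sorted(slow_sets)

print(prime_probe([0x00, 0x37, 0xF2]))  # [0, 3, 15]
```

With LRU replacement a single victim line can make every probe of that set miss; only whether a set missed matters here. A single victim run reveals all touched sets at once, without timing the encryption itself.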

Results
• Yields more information from a single encryption: the access status of all 4 · 256/δ table blocks (64 for δ = 16).
• Not a timing attack on the encryption! The attacker times a simple operation performed by itself.
• Insensitive to timing variance in the encryption code path (crucial for effective attacks on complicated systems).
• No real need to trigger the encryption: the attacker can wait until it happens by itself…

Synchronous Attacks – summary
• For a known plaintext and a synchronous attacker.
• Two measurement methods.
• Results:
  • OpenSSL library on Athlon 64:
    • Evict + Time: 500,000 encryptions (why?)
    • Prime + Probe: 300 encryptions (16K on Pentium 4E)
  • Real Linux dm-crypt:
    • Prime + Probe: 800 write operations, 65 ms + 3 sec of offline analysis
• Variants…

Asynchronous Attack
• Someone runs encryptions using a secret key.
• The attacker process runs on the same CPU at (roughly) the same time.
• Assume the plaintext/ciphertext has a non-uniform (conditional) distribution:
  • English text
  • Formatted data
  • Headers
  • Ciphertext gleaned from the wire
• Examples: just about any use of crypto on a multi-user system.
Finding the key
• Compare two distributions:
  • the measured memory access statistics;
  • the predicted memory access statistics, under the given plaintext distribution and the key hypothesis.
• Find the key that yields the best correlation.
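A minimal Python sketch of the correlation step, under simplifying assumptions of ours: only the 1st-round lookup T0[p0 ⊕ k0] is modeled, plaintext bytes are drawn from a hypothetical text string, the other 35 T0 accesses are random, and the attacker never sees individual plaintexts, only how often each T0 block was touched. Each guess of the high nibble of k0 is scored by the Pearson correlation between measured counts and predicted block probabilities.

```python
import random

DELTA, BLOCKS = 16, 16
TEXT = "attack at dawn "     # hypothetical source of non-uniform plaintext bytes

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def measure(k0, encryptions=10000, other_accesses=35, seed=0):
    """Per T0 block, count how many encryptions touched it (plaintexts stay unknown)."""
    rng = random.Random(seed)
    counts = [0] * BLOCKS
    for _ in range(encryptions):
        p0 = ord(rng.choice(TEXT))
        touched = {(p0 ^ k0) // DELTA}
        touched.update(rng.randrange(BLOCKS) for _ in range(other_accesses))
        for b in touched:
            counts[b] += 1
    return counts

def predict(guess_high):
    """Predicted probability that T0[p0 ^ k0] falls in each block, given k0 >> 4."""
    probs = [0.0] * BLOCKS
    for ch in TEXT:
        probs[(ord(ch) // DELTA) ^ guess_high] += 1 / len(TEXT)
    return probs

def best_guess(counts):
    return max(range(16), key=lambda g: pearson(counts, predict(g)))

print(hex(best_guess(measure(0xA7))))  # 0xa
```

Because the plaintexts are never observed, the signal is weak compared with a synchronous attack, which is why many more encryptions are used here.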

Countermeasures
• The authors consider numerous countermeasures, e.g.:
  • Avoiding memory accesses
  • Alternative lookup tables
  • Data-oblivious memory access pattern
  • Cache state normalization and process blocking
  • Disabling cache sharing
  • Static or disabled cache
  • Dynamic table storage
  • Hiding the timing
• None of them solves the problem completely. Some are architecture- or application-dependent, or require changes to the system.
• None is simultaneously secure, efficient (or cheap) and generic.
=> Case-specific solutions, probably a combination of the methods.
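As one simple illustration of a data-oblivious memory access pattern (our example, not a construction taken from the paper), a lookup can read every table entry and keep the wanted one with a mask, so the sequence of addresses read does not depend on the secret index. The function name is hypothetical, and Python gives no constant-time guarantees; the sketch only shows the access pattern.

```python
def oblivious_lookup(table, index):
    """Return table[index] after reading every entry, in the same order for any index."""
    result = 0
    for i, entry in enumerate(table):
        mask = -(i == index) & 0xFFFFFFFF   # all ones if i == index, else 0
        result |= entry & mask
    return result

table = [(0x9E3779B9 * i) & 0xFFFFFFFF for i in range(256)]   # arbitrary 32-bit entries
print(oblivious_lookup(table, 42) == table[42])              # True
```

The cost is 256 reads per lookup instead of one, a concrete instance of the security versus efficiency trade-off above.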

Thank you! Questions?

Homework
• What is the difference between the Evict + Time and Prime + Probe measurement methods?
• In the case of a known ciphertext, how would the attack change? (Hint: it can be more efficient; see the paper.)
• Why is a first-round synchronous attack able to extract only half of the key bits (on a δ = 16 platform)?
• Does adding a random delay to the encryption algorithm improve immunity against synchronous attacks? Why?
