200 likes | 324 Vues
This document investigates the intricate vulnerabilities present in high-end microprocessors, particularly through timing side-channel attacks on cryptographic algorithms such as AES and RSA. It emphasizes the growing complexity of microarchitectures, which may inadvertently expose cryptosystems to exploitation. The study details how execution time is influenced by the microarchitecture's state, including speculative instructions and cache behavior. Various attack methods are discussed, highlighting the feasibility of recovering encryption keys by leveraging timing discrepancies in instruction execution. Countermeasures and optimizations are also briefly mentioned.
E N D
Vulnerabilities on high-end processors André Seznec IRISA/INRIA CAPS project-team
A paradox • Microarchitectures are more and more complex • Timing side channel attacks were presented on versions of AES (Bernstein) and RSA (Açiimez et al.)
Many hardware features only to improve performance • Caches • Pipeline • Superscalar execution • Branch prediction • Thread parallelism
hit miss DTLB hit miss Branch Predictor ITLB hit miss hit miss Execution core D-cache Correct mispredict I-cache L2 Cache hit miss Execution time of a short instruction sequence is a complex function !
Execution time of a short instruction sequence is a complex function (2) • Depends on the precise state of every microarchitecture component: • More than 100 speculative instructions inflight at the same time on a Pentium 4 • Instructions are executed out-of-order. • Strange correlations almost impredictable at compile time (even in the back-end compiler)
Understanding AES cache timing attack on high end microprocessor (follows Bernstein2005) • AES with lookup tables is a 10 round algorithm with the following “vulnerabilties” • The number, the types and the order of the instructions are independent of the key K and the message M to be encrypted. • The exact locations of the data word read and written by the first round only depend on K xor M: • The execution time of the first round depends on K xor M (at least statistically) CAN BE EXPLOITED
Bernstein 2005 (empty cache) • Plaintext attack • Irrealistic hypothesis: • Access to cycle-accurate encryption timing • Cache is flushed between two encryptions • Not explicit in the paper (but see Lauradoux et al.) • Byte by byte determination of the key based on statistically determining the maximum encryption time for each byte of K xor M • works only on Pentium 3, not on Pentium 4
A loaded cache attack (proof of concept codes available) • Plaintext attack: • Timing of large number of encryptions • An irrealistic hypothesis: • Access to cycle-accurate encryption timings On a byte basis of K xor M, determine bit subchains statistically leading to the highest encryption time (+ threshold to get confidence) Depending on microarchitectures: • 0 to 80 bits of the key recovered by this method depending on the model and stepping of Pentium 4 • Suspect exercising banking in the cache
First vulnerability • For given sequence, • Timings are erratic: • Unlikely to get exactly the same timing • But statistically correlated: • cache banking, operation chaining appears in the average
A possible counter measure for AES • Periodically and randomly change the mapping of the look up tables: • 9000 cycles for this change: XOR based permutation: • See Lauradoux et al • HAVEGE can provide the random numbers.
Indirect timing measures ? • Hypothesis: • The attacker has access to user mode on the system (legal or illegal) • The attacker has no access to your data • He/she can run concurently its process with the encryption • On conventional systems, no access to microscopic timing of your application: • Time slice in 1,000,000s cycles
Simultaneous Multithreading (SMT): parallel processing on a single processor • functional units are underused on superscalar processors • SMT: • Sharing the functional units on a superscalar processor between several process • Advantages: • Single process can use all the resources units • dynamic sharing of all structures on parallel/multiprocess workloads Second Vulnerability
SMT Superscalar Issue slots
Indirect timing measures on a SMT processor (principles) SPY wants to get information on CRYPT • SPY and CRYPT runs in parallel • SPY tracks a specific event on CRYPT: • For instance execution of a branch • SPY saturates hardware resources needed for this event by CRYPT for fast execution • SPY records its own execution time (reading the hardware clock counter): • Irregurality in its own execution time signals the event: • CRYPT has try to grab the hardware resource
Indirect timing measures on a SMTproof of concept (derived from SBPA) The skeleton of a naive RSA core For I =1 to N Sequence X // 1,000s of cycles If Key[I]=1 Sequence Y // 1,000s of cycles Endfor Spy this branch B
Indirect timing measures on a SMTproof of concept (2) • Branch instructions are buffered in a BTB: • On Pentium 4, when the branch misses in the BTB, more than 20 cycles penalty • SPY: nearly infinite loop iterating on branching over a set of branches occupying the possible entries for B • Track irregularities in the timing of the loop: • When B is executed, a branch of the SPY is ejected from the BTB, thus creating a timing irregularity: • Iteration is X-type or XY-type Able to reproduce this attack on a toy example
Indirect timing measures on a SMT • Feasible: • On a branch on Pentium4 HT, information is leaking: • I recovered all the bits of 32 bits key in a single run (on a toy example) • Same kind of attack may apply for cache access: memory access sequence could be discovered
Feasible, but difficult • Technically, very difficult: • Lack of documentation on the BTB • Strange indexing, unknown associativity, BTB hierarchy • Requires relatively infrequent events: 1,000s cycles frequency: measure resolution is in the 100s cycles resolution
So what ? • On Pentium 4 HT: • If key bits control branches (or addresses of loads): • Might be recovered by a spy thread
Countermeasures • Just deactivate Hyperthreading. • At present that is a global OS mode (boot time) • Rework implementation: • Introduce randomness in control path at execution ? • Makes attack much more complex