
High Efficiency Counter Mode Security Architecture via Prediction and Pre-computation


Presentation Transcript


  1. High Efficiency Counter Mode Security Architecture via Prediction and Pre-computation Weidong Shi Hsien-Hsin (Sean) Lee Mrinmoy Ghosh Chenghuai Lu Alexandra Boldyreva School of Electrical and Computer Engineering Georgia Institute of Technology

  2. Counter Mode Encryption
  [Figure: two AES block-cipher instances each encrypt a (counter, VAddr) pair under key K to produce an encryption pad; each pad is XORed with a 16B cache line to yield the encrypted 16B line.]
  • Each memory line has its own counter.
  • Each time a memory line is updated, its counter is incremented.
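The pad-then-XOR structure on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the paper's hardware: SHA-256 stands in for the AES block cipher, and the key, addresses, and helper names are made up for the example.

```python
import hashlib

def make_pad(key: bytes, counter: int, vaddr: int) -> bytes:
    # Stand-in for the AES block cipher: a 16-byte pad derived from
    # (key, counter, virtual address). Real hardware runs AES here.
    msg = key + counter.to_bytes(8, "big") + vaddr.to_bytes(8, "big")
    return hashlib.sha256(msg).digest()[:16]

def xor_line(pad: bytes, line16: bytes) -> bytes:
    # Counter mode: encryption and decryption are the same XOR.
    return bytes(p ^ q for p, q in zip(pad, line16))

key = b"demo-key"
counters = {0x1000: 0}          # per-line counters, keyed by virtual address
line = b"sixteen byte dat"      # one 16B cache line

counters[0x1000] += 1           # line updated -> increment its counter
pad = make_pad(key, counters[0x1000], 0x1000)
ct = xor_line(pad, line)        # encrypt on write-back
pt = xor_line(pad, ct)          # regenerate the same pad to decrypt
```

Because decryption is just a second XOR with the same pad, the pad can be computed before the encrypted line arrives from memory, which is what the rest of the deck exploits.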

  3. Counter Mode Decryption
  [Figure: the same (counter, VAddr, key) inputs regenerate the decryption pads, which are XORed with the encrypted 16B lines to recover the cache lines.]
  • The counter has to be fetched for a memory line that misses the L2 cache.

  4. Related Work
  • A dedicated cache (sequence number cache) to reduce the latency overhead of memory decryption (MICRO-36).
  • Prefetch-based memory pre-decryption (WASSA 2004).
  • Prediction-based memory decryption (this paper):
    • Fully exploits the pre-computation capability enabled by counter mode encryption.
    • Uses otherwise wasted idle crypto engine pipeline stages for prediction and pre-computation.
    • Less area overhead than caching and less memory pressure than prefetch-based pre-decryption.

  5. Counter Prediction
  [Figure: counter values across a page for frequently updated data, infrequently updated data, and static data.]
  • Counters exhibit both spatial and temporal coherence.
  • To exploit spatial coherence, memory blocks from the same page start counting from the same initial value (the page root counter).
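The spatial-coherence idea can be shown concretely: if every line in a page starts from the page root counter, a line's true counter is usually within a small look-ahead window above the root. A minimal sketch, with the root value, depth, and function name chosen for illustration:

```python
def predict_counters(page_root: int, depth: int = 5) -> list[int]:
    # Guess a small window of counter values starting at the page
    # root; one of them is likely the line's true counter.
    return [page_root + i for i in range(depth)]

page_root = 40
true_counter = 42            # this line was written twice since the page reset
guesses = predict_counters(page_root)
hit = true_counter in guesses
```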

  6. Use Free Idle Pipeline Stages for Prediction
  [Figure: timeline of the AES pipeline versus the memory pipeline; the line is decrypted only after the fetch returns.]
  • The unrolled and pipelined AES decryption logic often stays idle for tens to hundreds of cycles during an L2 miss.

  7. Use Free Idle Pipeline Stages for Prediction
  [Figure: timeline where a speculative pad E(K, G4), computed during the miss latency, turns out to be a correct guess.]
  • Use the idle pipeline stages to generate decryption pads based on predicted counter values (a small window of look-ahead counter values based on the page root counter).
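Put together, the scheme precomputes one pad per guessed counter while the memory fetch is in flight; if the true counter is among the guesses, decryption reduces to a single XOR on arrival. A hedged sketch (SHA-256 again standing in for AES, all names illustrative):

```python
import hashlib

def make_pad(key: bytes, counter: int, vaddr: int) -> bytes:
    # Stand-in for one pass through the AES pipeline.
    msg = key + counter.to_bytes(8, "big") + vaddr.to_bytes(8, "big")
    return hashlib.sha256(msg).digest()[:16]

def precompute_pads(key: bytes, vaddr: int, page_root: int, depth: int = 5):
    # While the memory pipeline fetches the line, the otherwise-idle
    # AES pipeline evaluates E(K, guess) for each guessed counter.
    return {c: make_pad(key, c, vaddr) for c in range(page_root, page_root + depth)}

key, vaddr = b"demo-key", 0x2000
pads = precompute_pads(key, vaddr, page_root=7)

# The line arrives with its true counter; a correct guess means the
# pad is already waiting and no extra AES latency is paid.
true_counter = 9
pad_ready = true_counter in pads
```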

  8. Page Root Counter Prediction
  [Figure: each TLB entry is extended with a page root counter (64 bits) and a prediction history vector (16 bits), indexed by page base address and feeding the counter value prediction logic; example entry 0xabcddcba12344321 / 0x0000ff00.]
  • The history vector records prediction outcomes (miss = 1, hit = 0) to handle frequently updated pages.
  • If total(miss) > threshold, reset the corresponding page root counter to a new number.
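The miss-history mechanism on this slide can be sketched as a small shift register per page. The threshold value and class layout here are assumptions for illustration; the deck only specifies a 64-bit root, a 16-bit history vector, and a reset when misses exceed a threshold.

```python
THRESHOLD = 8  # illustrative; the deck does not give a concrete value

class PagePredictorEntry:
    # One per-page entry alongside the TLB: a root counter plus a
    # 16-bit shift register of recent outcomes (1 = miss, 0 = hit).
    def __init__(self, root: int):
        self.root = root
        self.history = 0

    def record(self, miss: bool, observed_counter: int) -> None:
        self.history = ((self.history << 1) | int(miss)) & 0xFFFF
        if bin(self.history).count("1") > THRESHOLD:
            # The page is updated too often for the old root to be
            # useful: re-base it on the counter actually observed.
            self.root = observed_counter
            self.history = 0

entry = PagePredictorEntry(root=100)
for _ in range(9):                    # nine mispredictions in a row
    entry.record(miss=True, observed_counter=150)
reset_happened = entry.root == 150    # root was re-based, history cleared
```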

  9. Experimental Parameters
  • SimpleScalar 3.0.
  • SPEC2000 INT/FP benchmarks with high L2 miss rates.
  • Prediction hit rate study (8 billion instructions).
  • IPC performance (400 million instructions on a representative window).

  10. Prediction Rate
  • Prediction hit rate measured over 8 billion instructions.
  • No counter number cache when using prediction.
  • Prediction depth = 5.
  • Average prediction hit rate: about 82-83%.

  11. IPC
  • IPC normalized to the scenario without decryption.
  • In general, outperforms a 128K counter cache.
  • On average, on par with a 512K counter cache.

  12. Improve Prediction Accuracy
  • Two-level prediction:
    • Divide the prediction depth into sub-ranges.
    • Decide the prediction range at the first level, then make predictions within that range.
  • Context-based prediction:
    • Exploit the temporal coherence of accesses to memory locations with coherent update frequency.

  13. Two-level Prediction
  [Figure: the counter-number axis, in natural order, divided into four prediction windows selected by the 2-bit codes 00, 01, 10, 11.]
  • Divide the prediction window into ranges (a power of 2).
  • With 2 bits per line, effectively quadruple the prediction depth.
  • The overhead is about 2KB of on-chip memory for a 64-entry TLB.
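The two levels can be sketched directly: a 2-bit per-line selector picks one of four sub-ranges above the page root, and prediction then covers every counter in that sub-range, so four ranges of five counters reach depth 20 instead of 5. Window size and names are illustrative.

```python
def two_level_predict(page_root: int, range_sel: int, window: int = 5) -> list[int]:
    # First level: the 2-bit selector (00..11) picks a sub-range.
    # Second level: predict every counter inside that sub-range.
    base = page_root + range_sel * window
    return list(range(base, base + window))

# Selector 0b10 -> the third sub-range above the page root.
guesses = two_level_predict(page_root=0, range_sel=0b10)
```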

  14. Context-Based Prediction
  [Figure: a single prediction window positioned on the counter-number axis by the context register.]
  • Store the previous line's counter depth value in a global context register.
  • Generate new predictions based on the page root counter and the value in the context register.
  • Can be combined with regular and two-level predictions; feed all the predictions into the decryption pipeline.
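A minimal sketch of the context register, with all names and the window size chosen for illustration: the depth of the previously observed counter above its page root positions the next prediction window.

```python
context_register = 0   # depth of the previously decrypted line's counter

def update_context(page_root: int, observed_counter: int) -> None:
    # Remember how far above its page root the last line's counter was.
    global context_register
    context_register = observed_counter - page_root

def context_predict(page_root: int, window: int = 5) -> list[int]:
    # Temporal coherence: lines touched close together in time tend to
    # sit at similar depths, so start the window at the remembered depth.
    base = page_root + context_register
    return list(range(base, base + window))

update_context(page_root=100, observed_counter=112)  # previous line at depth 12
guesses = context_predict(page_root=100)
```

In the deck's design these guesses are fed into the decryption pipeline alongside the regular and two-level predictions rather than replacing them.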

  15. Why Does It Work?
  A common access pattern over a memory page (128 lines):

  while (1) {
      for (each line of the page)
          write to the line;
      for (each line of the page)
          read the line;
  }

  After each write pass, every line in the page has the same counter value, so predictions from the page root counter cover all of them.
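A toy simulation of that loop makes the point: sweeping writes keep all per-line counters in lockstep, so a root counter that tracks the common value predicts every read perfectly. This is an illustrative model, not the paper's simulator.

```python
LINES_PER_PAGE = 128

counters = [0] * LINES_PER_PAGE   # per-line write counters
page_root = 0

hits = total = 0
for sweep in range(4):
    for i in range(LINES_PER_PAGE):   # write pass: every counter +1
        counters[i] += 1
    page_root = counters[0]           # root tracks the common value
    for i in range(LINES_PER_PAGE):   # read pass: predict from the root
        total += 1
        hits += counters[i] == page_root
hit_rate = hits / total
```

Real workloads are less regular, which is why the measured hit rates are 82-99% rather than 100%.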

  16. Prediction Rates
  • 8 billion instruction window.
  • Two-level prediction: about 93% prediction hit rate.
  • Context-based + regular prediction: almost 99% prediction hit rate.

  17. IPC
  • IPC normalized to the scenario with no decryption.
  • 1-3% loss of performance using the best prediction.

  18. Conclusions
  • Counter value prediction allows pads to be pre-computed speculatively without counter value caching.
  • Spatial and temporal coherence of memory update frequency enables effective counter value prediction.
  • Uses idle cycles of the pipelined decryption engine.
  • Counter prediction achieves better performance than some of the large cache settings.
  • Complementary to the caching technique.

  19. Questions
