Optimal Core and Cache Configuration for 16-Processor Systems in Web Applications

Cores vs. Caches
CS 838 Project
Matt Ramsay & Chris Feucht

Motivation
• As feature sizes shrink, more hardware can be placed on a single chip
• This creates a range of design trade-offs
• For a chip multiprocessor (CMP), a key trade-off is how many cores and how much cache to place on each chip
• Our project results suggest an optimal configuration for a 16-processor system running web-based applications

Outline
• Motivation
• Experiments Performed
• Simulator Environment
• Results
• Project Shortcomings
• Future Work
• Conclusions & Summary

Experiments
• The intended experiments could not be performed because of simulator limitations
• Intended cost model: each core is equivalent to 0.5 MB of L2 cache
• Workloads run: apache_8, oltp_2, zeus_8
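The following sketch makes the intended trade-off concrete by enumerating ways to split 16 processors across chips under the 0.5 MB-per-core cost model. Several parts of it are assumptions and not part of the project's setup. The per-chip budget of 10 MB-equivalent is hypothetical: it was chosen only so that a single 16-core chip keeps the 2 MB of total L2 mentioned in the conclusions. The helper name `configurations` is also hypothetical, as is the assumption that each chip's L2 is split evenly across its per-node banks.

```python
# Hypothetical area model: one core costs as much die area as 0.5 MB of L2.
CORE_COST_MB = 0.5
TOTAL_CORES = 16
# Assumption: per-chip budget sized so a 16-core chip keeps 2 MB of L2.
CHIP_BUDGET_MB = TOTAL_CORES * CORE_COST_MB + 2.0  # 10.0 MB-equivalent


def configurations(total_cores=TOTAL_CORES, budget=CHIP_BUDGET_MB,
                   core_cost=CORE_COST_MB):
    """Yield (cores_per_chip, chips, l2_per_chip_mb, total_l2_mb, l2_per_bank_kb)."""
    for cores_per_chip in (1, 2, 4, 8, 16):
        chips = total_cores // cores_per_chip
        # Whatever budget the cores do not use becomes L2 on that chip.
        l2_per_chip = budget - cores_per_chip * core_cost
        total_l2 = chips * l2_per_chip
        # Assumption: each node (core) owns an equal L2 bank on its chip.
        l2_per_bank_kb = l2_per_chip * 1024 / cores_per_chip
        yield cores_per_chip, chips, l2_per_chip, total_l2, l2_per_bank_kb


if __name__ == "__main__":
    print(f"{'cores/chip':>10} {'chips':>5} {'L2/chip MB':>10} "
          f"{'total L2 MB':>11} {'L2/bank KB':>10}")
    for c, n, l2c, tot, bank in configurations():
        print(f"{c:>10} {n:>5} {l2c:>10.1f} {tot:>11.1f} {bank:>10.0f}")
```

Under these assumptions, total system L2 falls from 152 MB with 16 single-core chips to 2 MB with one 16-core chip. This mirrors the "rapid system-wide L2 cache growth with added chips" noted in the conclusions.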

Simulator Environment
• All nodes include 32 KB, 2-way set-associative L1 instruction and data caches
• Each node has its own L2 bank, regardless of L2 size or associativity
• All other Ruby and Opal settings were left at their defaults
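As a quick check on the L1 configuration, the set count of a 32 KB, 2-way set-associative cache and its address split follow directly from its size, associativity, and block size. The slides do not state a block size, so the 64-byte block used here is an assumption. `cache_geometry` and `split_address` are hypothetical helpers written for illustration.

```python
def cache_geometry(size_bytes: int, ways: int, block_bytes: int):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * block_bytes)
    # Bit-field widths only make sense for power-of-two block and set counts.
    assert block_bytes & (block_bytes - 1) == 0 and sets & (sets - 1) == 0
    offset_bits = block_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr: int, offset_bits: int, index_bits: int):
    """Split a byte address into (tag, set index, block offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    # 32 KB, 2-way L1 with an assumed 64-byte block size.
    sets, off, idx = cache_geometry(32 * 1024, 2, 64)
    print(sets, off, idx)  # 256 6 8
    print([hex(f) for f in split_address(0x12345678, off, idx)])
```

With these assumptions, each L1 has 256 sets, so 6 address bits select the byte within a block and 8 bits select the set.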

Results - Apache

Results - OLTP

Results - Zeus

Project Shortcomings & Future Work
• Longer runs are needed for convincing data
• Test different numbers of processors per system
• Add an L3 cache to the memory hierarchy

Conclusions
• CPI (and therefore IPC) changes little in a 16-processor system as the number of cores per chip varies
• This holds even though system-wide L2 capacity grows rapidly as chips are added
• The best performance per cost comes from placing all 16 processors on one chip
  • Even with only 2 MB of total L2
  • This configuration would benefit from an off-chip L3 cache

Project Summary
[Figure not recoverable; surviving labels: "We look here!", "50 miles"]
