
Memory Consistency Models in Wide-area Storage System – Or What Do They Mean?



  1. Memory Consistency Models in Wide-area Storage System – Or What Do They Mean? CS258 Spring 2002 Mark Whitney and Yitao Duan

  2. Motivations • Global-scale computing is approaching • Wide-area storage is becoming a reality • Growing hunger for processing power calls for a marriage of the two • Traditional approach to large-scale data processing: hierarchy • What if new algorithms need to touch more data? Scale an SMP? • Use OceanStore as a testbed

  3. Data and Computation Hungry Applications • Quantum Chromodynamics • Biomolecular Dynamics • Weather Forecasting • Cosmological Dark Matter • Biomolecular Electrostatics • Electric and Magnetic Molecular Properties

  4. Data Grid for High Energy Physics (Caltech) • Online system: detector output ~PBytes/sec; a “bunch crossing” every 25 nsecs; 100 “triggers” per second; each triggered event ~1 MByte; ~100 MBytes/sec into the Tier 0 CERN Computer Centre; offline processor farm ~20 TIPS • Tier 1 regional centres (FermiLab ~4 TIPS; France, Germany, Italy) linked at ~622 Mbits/sec or air freight (deprecated) • Tier 2 centres (~1 TIPS each, e.g. Caltech) with HPSS storage, linked at ~622 Mbits/sec • Institutes (~0.25 TIPS each) with physics data caches (~1 MBytes/sec); physicists work on analysis “channels”; each institute has ~10 physicists on one or more channels, whose data should be cached by the institute server • Tier 4: physicist workstations (Pentium II 300 MHz) • 1 TIPS is approximately 25,000 SpecInt95 equivalents

  5. Background • What is OceanStore? • A global persistent data store scalable to billions of users • High availability, fault tolerance, security • Caching to reduce network congestion and guarantee availability and performance • Flexible consistency semantics • Observations and questions • Remarkable resemblance to an MP memory system • Replica = cache, client = processor, data object = memory item • OceanStore’s consistency semantics are typically those of a file system. What do they mean to a program?

  6. Running Parallel Applications on OceanStore • Why try this? • Distributed computing • Grid • World-wide computing • New programming paradigm? (OceanStore is a new phenomenon; will it bring out new applications? Where will computing infrastructure go, given advances in networking, storage, and parallel processing?)

  7. [Architecture diagram] Each node runs a parallel application (ParaApp) on an OS kernel with an OceanStore client (OClient); together the OClients present a shared virtual memory space.

  8. Running SMP Apps on OceanStore • OceanStore data objects are globally identified • Virtual addresses in the application address space are mapped to OceanStore object IDs • Shared-memory accesses are turned into OceanStore requests
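The address-to-object mapping the slide describes can be sketched as follows. This is a minimal illustration, not the presenters’ implementation: the page granularity, the `ObjectRef` struct, and deriving an object ID from a base GUID plus page number are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative page-granularity translation from a virtual address in the
// application's shared region to an (OceanStore object ID, offset) pair.
// Assumption: each shared page is backed by one globally identified object.
constexpr uint64_t kPageBits = 12;               // assumed 4 KB sharing granularity
constexpr uint64_t kPageSize = 1ull << kPageBits;

struct ObjectRef {
    uint64_t object_id;  // stand-in for an OceanStore GUID
    uint64_t offset;     // byte offset within the object
};

// Translate a virtual address, given the (page-aligned) base of the shared
// region and a base GUID for its first page. A real client would then issue
// an OceanStore read or update request against object_id at offset.
ObjectRef translate(uint64_t vaddr, uint64_t region_base, uint64_t base_guid) {
    uint64_t page = (vaddr - region_base) >> kPageBits;
    return ObjectRef{base_guid + page, vaddr & (kPageSize - 1)};
}
```

With this mapping in place, a load or store that faults on a shared page can be serviced by fetching the backing object into a local replica, which is what makes the replica-as-cache analogy from the background slide concrete.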

  9. Consistency Models
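The transcript preserves only this slide’s title. As general background (not taken from the slides), the classic store-buffering litmus test is the standard way to separate sequential consistency from weaker models: under sequential consistency the outcome r1 == 0 && r2 == 0 is impossible, while weaker models permit it. The sketch below simply enumerates all sequentially consistent interleavings to show this.

```cpp
#include <vector>

// Store-buffering litmus test, initially x = y = 0:
//   Thread 0: x = 1; r1 = y;     Thread 1: y = 1; r2 = x;
// Enumerate the 6 interleavings that preserve per-thread program order
// (i.e. all sequentially consistent executions) and record the outcomes.
struct Outcome { int r1, r2; };

std::vector<Outcome> sc_outcomes() {
    // Ops: 0: x=1, 1: r1=y (thread 0); 2: y=1, 3: r2=x (thread 1).
    const int orders[6][4] = {
        {0,1,2,3}, {0,2,1,3}, {0,2,3,1},
        {2,0,1,3}, {2,0,3,1}, {2,3,0,1}};
    std::vector<Outcome> out;
    for (const auto& ord : orders) {
        int x = 0, y = 0, r1 = -1, r2 = -1;
        for (int op : ord) {
            if (op == 0)      x = 1;
            else if (op == 1) r1 = y;
            else if (op == 2) y = 1;
            else              r2 = x;
        }
        out.push_back({r1, r2});
    }
    return out;
}
```

No sequentially consistent interleaving yields r1 == 0 && r2 == 0; a system that exhibits that outcome is, by definition, weaker than sequentially consistent.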

  10. Performance Evaluation • OceanStore configuration … (# of inner rings, …) • Nachos++! • MIPS R3000 processor w/ FP • Stanford SPLASH-2 benchmark suite • 4 × 4 matrix LU decomposition

  11. Computation Time [chart; y-axis: number of cycles]

  12. Network Latency [chart; y-axis: milliseconds]

  13. Open Questions • Programming model • Cache policy • Consistency models • Sharing granularity

  14. Conclusion and Future Work • Matrix decomposition runs on OceanStore! • Wide-area distributed synchronization is expensive (not surprising) • A better memory model is needed to run shared-memory applications • Message passing? Seems a better match (use explicit OceanStore APIs) • New programming model?
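The contrast the conclusion draws can be sketched as follows: instead of transparently trapping every shared-memory access, a message-passing style ships whole data items through explicit update/read calls. Everything here is hypothetical: `Store`, `put`, and `get` are stand-ins for explicit OceanStore APIs (not the real interface), and a local map simulates the wide-area store.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical explicit-API message passing over a named object store.
// A local std::map stands in for OceanStore; each put/get models one
// explicit wide-area update/read rather than many per-access faults.
class Store {
    std::map<std::string, std::vector<double>> objects_;
public:
    void put(const std::string& id, std::vector<double> data) {
        objects_[id] = std::move(data);  // one explicit wide-area update
    }
    std::vector<double> get(const std::string& id) const {
        return objects_.at(id);          // one explicit wide-area read
    }
};

// E.g., in LU decomposition the owner of pivot row k ships the whole row
// at once; consumers pull it with a single explicit read.
void send_row(Store& s, int k, const std::vector<double>& row) {
    s.put("lu/row/" + std::to_string(k), row);
}
std::vector<double> recv_row(const Store& s, int k) {
    return s.get("lu/row/" + std::to_string(k));
}
```

The design point this illustrates is that explicit transfers amortize wide-area latency over whole data items, which is why message passing may match OceanStore better than fine-grained shared memory.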
