Literature Review • Interconnection Architectures for Petabyte-Scale High-Performance Storage Systems • Andy D. Hospodor, Ethan L. Miller • IEEE/NASA Goddard Conference on Mass Storage Systems and Technologies, April 2004 • Henry Chen, September 24, 2010
Introduction • High-performance storage systems • Petabytes (2^50 bytes) of data storage • Supply hundreds or thousands of compute nodes • Aggregate system bandwidth >100GB/s • Performance should scale with capacity • Large individual storage systems • Require high-speed network interface • Concentration reduces fault tolerance
Proposal • Follow high-performance computing evolution • Multi-processor networks • Network of commodity devices • Use disk + 4-12 port 1GbE switch as building block • Explore & simulate interconnect topologies
Commodity Hardware • Network • 1Gb Ethernet: ~$20 per port • 10Gb Ethernet: ~$5000 per port (25x per Gb per port) • Aside: Now ~$1000 per port • Disk drive • ATA/(SATA) • FibreChannel/SCSI/(SAS)
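The "25x per Gb per port" figure follows directly from the two port prices. A minimal sketch of that arithmetic, where the helper name `cost_per_gbps` is hypothetical and the 2010 comparison assumes the 1GbE port price is still about $20:

```python
def cost_per_gbps(port_cost_usd, link_gbps):
    """Dollars per Gb/s of link capacity for one port."""
    return port_cost_usd / link_gbps

# 2004 prices from the slide
ratio_2004 = cost_per_gbps(5000, 10) / cost_per_gbps(20, 1)   # 25.0

# 2010 aside: ~$1000 per 10GbE port; 1GbE price held at $20 (assumption)
ratio_2010 = cost_per_gbps(1000, 10) / cost_per_gbps(20, 1)   # 5.0

print(ratio_2004, ratio_2010)
```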
Setup • Target 100GB/s bandwidth • Build system using 250GB drives (2004) • 4096 drives to reach 1PB • Assume each drive has 25MB/s throughput • 1Gb link supports 2-3 disks • 10Gb link supports ~25 disks
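The sizing numbers on this slide can be checked with a few lines of arithmetic. This is a rough sketch, not the paper's model: the `efficiency` factor (the usable fraction of raw link bandwidth) is an assumption, set to 0.5 here only because that value reproduces the disks-per-link figures quoted above.

```python
DRIVE_CAPACITY_GB = 250      # per-drive capacity (2004)
DRIVE_THROUGHPUT_MBS = 25    # assumed per-drive throughput
NUM_DRIVES = 4096

# Total capacity in decimal petabytes and aggregate drive bandwidth in GB/s
total_capacity_pb = NUM_DRIVES * DRIVE_CAPACITY_GB / 1_000_000   # 1.024
aggregate_gbs = NUM_DRIVES * DRIVE_THROUGHPUT_MBS / 1000         # 102.4

def disks_per_link(link_gbps, efficiency=0.5):
    """Drives one link can feed; `efficiency` is an assumed usable fraction."""
    usable_mbs = link_gbps * 1000 / 8 * efficiency
    return usable_mbs / DRIVE_THROUGHPUT_MBS

print(total_capacity_pb, aggregate_gbs)
print(disks_per_link(1), disks_per_link(10))   # 2.5 and 25.0
```

With 4096 drives at 25MB/s each, the drives alone can just cover the 100GB/s target, so the interconnect must not become the bottleneck.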
Basic Interconnection • 32 disks/switch • Replicate system 128x • 4096 1Gb ports • 128 10Gb ports • ~Networked RAID0 • Data local to each server
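The port counts for the basic layout follow from the drive count. A short sketch, assuming one 10Gb uplink per switch (inferred from 128 replicas and 128 10Gb ports, not stated on the slide):

```python
NUM_DRIVES = 4096
DISKS_PER_SWITCH = 32
DRIVE_THROUGHPUT_MBS = 25
UPLINKS_PER_SWITCH = 1       # assumption: one 10Gb uplink per switch

num_switches = NUM_DRIVES // DISKS_PER_SWITCH                  # 128 replicas
ports_1gb = num_switches * DISKS_PER_SWITCH                    # 4096
ports_10gb = num_switches * UPLINKS_PER_SWITCH                 # 128
per_switch_disk_mbs = DISKS_PER_SWITCH * DRIVE_THROUGHPUT_MBS  # 800 MB/s

print(num_switches, ports_1gb, ports_10gb, per_switch_disk_mbs)
```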
Fat Tree • 4096 1Gb ports • 2418 10Gb ports • 2048 switch to router (128 Sw × 8 Rt × 2) • 112 inter-router • 256 server to router (×2) • Need large, multi-stage routers • ~$10M for 10Gb ports
Butterfly Network • Need “concentrator” switch layer • Each network level carries entire traffic load • Only one path between any server and storage device
Mesh • Routers to servers at mesh edges • 16384 1Gb links • Routers only at edges; mesh provides path redundancy
Torus • Mesh with edges wrapped around • Reduces average path length • No edges; dedicated connection breakout to servers
Hypercube • Special-case torus • Bandwidth scales better than mesh/torus • Connections per node increases with system • Can group devices into smaller units and connect with torus
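The effect of wrapping the edges on average path length can be illustrated by brute force. The sketch below is not taken from the paper: it assumes uniform traffic between all node pairs (self-pairs included) on a grid with `k` nodes per dimension, and the function names are hypothetical.

```python
def avg_hops_1d(k, wrap):
    """Mean hop count between all ordered node pairs along one dimension."""
    total = 0
    for a in range(k):
        for b in range(k):
            d = abs(a - b)
            if wrap:                 # torus: may go around the ring
                d = min(d, k - d)
            total += d
    return total / (k * k)

def avg_hops(k, dims, wrap):
    """Per-dimension distances add, so the mean scales with dimension count."""
    return dims * avg_hops_1d(k, wrap)

# 64 x 64 grid = 4096 nodes
print(avg_hops(64, 2, wrap=False))   # mesh:  42.65625
print(avg_hops(64, 2, wrap=True))    # torus: 32.0
```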
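To see why connections per node grow with system size: in a hypercube of N = 2^n nodes, each node links to the n nodes whose binary address differs in exactly one bit. A minimal sketch with hypothetical helper names:

```python
def hypercube_degree(num_nodes):
    """Links per node in a hypercube; num_nodes must be a power of two."""
    dims = num_nodes.bit_length() - 1
    if num_nodes < 1 or (1 << dims) != num_nodes:
        raise ValueError("num_nodes must be a power of two")
    return dims

def hypercube_neighbors(node, dims):
    """Neighbors differ from `node` in exactly one address bit."""
    return [node ^ (1 << bit) for bit in range(dims)]

print(hypercube_degree(4096))        # 12 links per node
print(hypercube_neighbors(0, 3))     # [1, 2, 4]
```

For the 4096-drive system this means 12 ports per node, which is at the top of the 4-12 port range of the commodity switches proposed as building blocks.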
Bandwidth • Not all topologies actually capable of 100GB/s • Maximum simultaneous bandwidth = (link speed × number of links) / average hops
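The bandwidth bound divides total link capacity by the average number of links each transfer occupies. A small sketch of the formula follows; the example inputs (a 64 × 64 torus, raw 1Gb link speed, an average of 32 hops under uniform traffic) are illustrative assumptions, not results from the paper.

```python
def max_simultaneous_bandwidth(link_speed, num_links, avg_hops):
    """Upper bound: total link capacity shared among avg_hops links per transfer."""
    return link_speed * num_links / avg_hops

# Illustrative 64 x 64 torus: each of 4096 nodes has 4 links, each link is
# shared by two nodes, so there are 2 * 4096 links in total.
link_speed_gbytes = 1 / 8            # raw 1Gb/s link expressed in GB/s
num_links = 2 * 4096
avg_hops = 32                        # assumed average path length

print(max_simultaneous_bandwidth(link_speed_gbytes, num_links, avg_hops))  # 32.0
```

Under these assumptions the bound falls well short of 100GB/s, which shows how a topology with enough ports can still miss the target when paths are long.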
Analysis • Embedding switches in storage fabric uses fewer high-speed ports, but more low-speed ports
Router Placement in Cube-Style Topologies • Routers require nearly 100% of link bandwidth • Adjacent routers cause overload & underload • Use random placement; optimization possible?
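Random placement and the adjacency problem can be sketched on a 2-D torus. This is an illustration only: the function names, the 2-D layout, and the use of adjacent-pair counts as a rough proxy for overload are all assumptions, not the paper's simulation.

```python
import random
from itertools import combinations

def place_routers(k, num_routers, seed=None):
    """Pick distinct random positions for routers on a k x k torus."""
    rng = random.Random(seed)
    cells = [(x, y) for x in range(k) for y in range(k)]
    return rng.sample(cells, num_routers)

def torus_distance(a, b, k):
    """Hop count between two positions with wrap-around in each dimension."""
    return sum(min(abs(p - q), k - abs(p - q)) for p, q in zip(a, b))

def count_adjacent_pairs(routers, k):
    """Number of router pairs that sit one hop apart."""
    return sum(1 for a, b in combinations(routers, 2)
               if torus_distance(a, b, k) == 1)

routers = place_routers(k=64, num_routers=128, seed=1)
print(count_adjacent_pairs(routers, 64))
```

One simple optimization in the same spirit would be to redraw the placement until the adjacent-pair count reaches zero.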
Conclusions • Build multiprocessor-style network for storage • Commodity-based storage fabrics are scalable and can improve reliability and performance • Rely on large number of lower-speed links; limited number of high-speed links where necessary • Higher-dimension tori (4-D, 5-D) provide a reasonable solution for 100GB/s from 1PB