UCB Millennium and the Vineyard Cluster Architecture
Phil Buonadonna
University of California, Berkeley
http://www.millennium.berkeley.edu
Millennium Project
• Hierarchical “Cluster of Clusters”
[Figure: hierarchical cluster of clusters. Labels: ½ TB DLIB; PIII-X 64x4; Ninja; PIII 32x2; PII; PIII; Gigabit Ethernet (GbE); PII 8x2 clusters for Astro, Math, Physics, Bio, CE]
UC Berkeley Millennium
Millennium Agenda
• Investigate recent PC technologies in clusters
  - NT/Linux
  - VI Architecture / GbE / Distributed I/O
• Harvest the lessons learned from NOW
  - Robust, flexible remote execution
  - Distributed resource management
• Investigate clusters that span administrative units
  - Turn-key cluster deployment
  - Sense of ownership
• Investigate the “Computational Economy” approach
  - Resource management with a natural sense of ownership
  - Enough heterogeneous interests to be worthwhile
  - Form a basis for scientific computing, Internet services, etc.
Vineyard Cluster Architecture
• Distributed resource utilization and management in a “Vineyard” of clusters
[Figure: Vineyard software stack. Components: Applications / Services; Mgmt / Monitoring; PBS, I/O, MPI, VEXEC, Tools; REXEC; VIA / GM, GbE; Multicast; NT / Linux (2.2.x); Stride Scheduler; Rootstock Distribution]
Outline
• Millennium Project
• Vineyard Cluster SW Architecture
• Important Component Technologies
  - Rootstock cluster SW distribution facility
  - REXEC: Robust Linux Remote Execution
  - Economic-based resource allocation
  - CAN communication over VIA
  - IO Rivers
• Directions and Discussion
Rootstock
• Disseminate easy-to-build PC cluster system software
• Variety of cluster designs
  - well-engineered high-performance clusters
  - low-cost casual workgroup clusters
  - server farms
  - scalable Internet servers
• Root Cluster Server (CS)
  - Provides cluster software stock
  - Second-level customized distribution within each cluster from its own CS node
Rootstock Cluster
• Collection of nodes with IP connectivity
  - can be a dedicated subnet, with or without NAT, or any collection of nodes
  - runs nfsd (within the cluster), httpd, ssl
• One node designated as Cluster Root
  - serves as the root of administrative operations and management
  - may be the same as or different from other nodes
  - may or may not participate in normal cluster operation
  => is trusted by other nodes and has storage for dialtone
• May or may not have designated front-end nodes
• May or may not have a dedicated cluster-area network (e.g., Myrinet)
Rootstock Mechanics
[Figure labels: Cluster System Distribution Center; cluster stock (build, OS, drivers, mill SW, OS mods); IP network; CS; CAN; cluster leased builds]
1. Cluster Stock
  - Rootstock build pages
  - Full current Linux with all fixes and packages
  - SSL, SSH
  - Cluster drivers
  - Cluster system layers: rexec, mpe, pbs
  - Optional SW ($)
  - Cluster kernel mods
2. Make the CS “graft”
  - specify IP address
  - package removes: dhcp, dns, nis, ...
  - sanity check and build: resolv.conf, /etc/hosts, ...
  - construct the cluster build (lease)
  - download the CS build floppy
3. CS power-on build
  - transfer and localize the dialtone (DT)
  - add local admin scripts
  - node build floppy
4. Node power-on build
  - local stock from the CS
5. Cluster Update button (future)
  - 2nd dialtone, CF engine, rolling update
Computational Economy
• Market-based approach to resource allocation
• Optimizes for user value
[Figure labels: Apps (Value); API; Economic F.E.; Access Modules; TimeShare; BatchQueue; Resource Managers; Resources]
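For the single-item or batch case, the deck names the second-price sealed-bid auction as the market mechanism. The sketch below shows only that textbook rule, assuming one indivisible resource slot and bids expressed in credits. `Bid` and `second_price_auction` are hypothetical names for illustration, not the interface of the Vineyard PBS integration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bid:
    bidder: str
    amount: float  # credits offered for the slot (hypothetical unit)


def second_price_auction(bids: list[Bid]) -> Optional[tuple[str, float]]:
    """Single-item sealed-bid (Vickrey) auction.

    The highest bidder wins but pays the second-highest bid. With a single
    bidder the price is 0.0, since no reserve price is modelled here.
    Ties go to the earliest bid in the list (sorted() is stable, even
    with reverse=True).
    """
    if not bids:
        return None
    ranked = sorted(bids, key=lambda b: b.amount, reverse=True)
    winner = ranked[0]
    price = ranked[1].amount if len(ranked) > 1 else 0.0
    return winner.bidder, price


if __name__ == "__main__":
    bids = [Bid("alice", 5.0), Bid("bob", 8.0), Bid("carol", 6.5)]
    print(second_price_auction(bids))  # ('bob', 6.5)
```

Charging the second price is what makes bidding one's true value a sensible strategy, which is why this rule is a common choice for batch allocation.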
REXEC Remote Execution
• Secure, decentralized remote execution environment
• Features
  - Decouples resource discovery and selection
  - Multiple allocation policies (VEXECs)
  - Decentralized control: each client rexec is the root for a distributed task
  - Dynamic discovery and configuration: resource announcements on a cluster multicast channel
  - All soft state
  - Simple, well-defined failure and cleanup models: “They all fall down”
  - Secure
  - Translates the pricing mechanism to resource allocation
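"All soft state" means a client's view of the cluster is rebuilt from periodic announcements rather than from explicit registration and deregistration. A minimal sketch, assuming each rexecd announcement carries a node name and a load figure, and that an entry not refreshed within a hypothetical `ttl` is treated as dead. `SoftStateTable` and its methods are illustrative names, not REXEC's actual code.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Announcement:
    node: str
    load: float         # hypothetical payload field
    received_at: float  # local time the announcement was heard


class SoftStateTable:
    """Client-side view of the cluster built purely from announcements.

    Nothing is ever explicitly deregistered: an entry that is not refreshed
    within `ttl` seconds is treated as dead and dropped.
    """

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: dict[str, Announcement] = {}

    def on_announcement(self, node: str, load: float) -> None:
        # A refresh overwrites the previous entry and resets its age.
        self._entries[node] = Announcement(node, load, self.clock())

    def live_nodes(self) -> list[str]:
        now = self.clock()
        # Expire anything whose last announcement is older than ttl.
        self._entries = {
            n: a for n, a in self._entries.items() if now - a.received_at <= self.ttl
        }
        return sorted(self._entries)


if __name__ == "__main__":
    fake_now = [0.0]
    table = SoftStateTable(ttl=5.0, clock=lambda: fake_now[0])
    table.on_announcement("A", load=0.2)
    table.on_announcement("B", load=0.9)
    fake_now[0] = 4.0
    table.on_announcement("A", load=0.3)  # A refreshes; B stays silent
    fake_now[0] = 7.0
    print(table.live_nodes())  # ['A']
```

The payoff is the simple failure model: a crashed node needs no cleanup protocol, because it just stops announcing and falls out of every client's table.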
REXEC / VEXEC
• Components: rexecd, rexec & vexecd
[Figure: rexecd on Nodes A, B, C, D; Cluster IP Multicast Channel; vexecd (Policy A), vexecd (Policy B); reply “Node A”]
Example: run indexer on Nodes A and B at 3 credits/min minimum
% rexec -n 2 -r 3 indexer
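Because discovery and selection are decoupled, each vexecd can apply its own node-selection policy to the same announced state. The sketch below reads `-n 2 -r 3` as "two nodes, 3 credits/min per node"; that reading, the `load` field, and both policies (`least_loaded`, `first_n`) are assumptions for illustration, not the real vexecd policies.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class NodeInfo:
    name: str
    load: float  # hypothetical: fraction of CPU already committed (0.0-1.0)


def least_loaded(nodes: list[NodeInfo], n: int) -> list[str]:
    """Hypothetical 'Policy A': the n least-loaded nodes (name breaks ties)."""
    if n > len(nodes):
        raise ValueError(f"asked for {n} nodes, only {len(nodes)} announced")
    return [x.name for x in sorted(nodes, key=lambda x: (x.load, x.name))[:n]]


def first_n(nodes: list[NodeInfo], n: int) -> list[str]:
    """Hypothetical 'Policy B': the first n nodes by name, ignoring load."""
    if n > len(nodes):
        raise ValueError(f"asked for {n} nodes, only {len(nodes)} announced")
    return sorted(x.name for x in nodes)[:n]


def plan_job(
    nodes: list[NodeInfo],
    n: int,
    rate: float,
    policy: Callable[[list[NodeInfo], int], list[str]] = least_loaded,
) -> dict[str, float]:
    """Mimic `rexec -n <n> -r <rate> prog`: the policy picks the nodes and
    each chosen node is assigned the same rate in credits/min."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return {name: float(rate) for name in policy(nodes, n)}


if __name__ == "__main__":
    cluster = [NodeInfo("A", 0.9), NodeInfo("B", 0.3), NodeInfo("C", 0.1), NodeInfo("D", 0.5)]
    print(plan_job(cluster, n=2, rate=3))                  # {'C': 3.0, 'B': 3.0}
    print(plan_job(cluster, n=2, rate=3, policy=first_n))  # {'A': 3.0, 'B': 3.0}
```

Swapping the `policy` argument while the input state stays fixed is the point: several vexecds with different policies can serve the same cluster at once.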
Interactive Pricing Mechanism
• Most work on “economic mechanisms” focuses on the single-item or batch case
  - hold auctions (e.g., second-price sealed bid)
  - integrated into Vineyard PBS
• The interactive case needs to be very simple
  - Bidder \(i\) gets a \(b_i / \sum_k b_k\) share of the CPU at rate \(b_i\)
  - enforced by the stride scheduler
• Running cluster mirror usage experiment
  - two identical clusters for one user community with $ accounts
  - one free and uncontrolled
  - one for bid and controlled
  - which is more desirable to use?
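The share rule (bidder i receives b_i / Σ_k b_k of the CPU) is enforced by a stride scheduler. The simulation below shows the standard stride-scheduling algorithm, assuming each bid is used directly as an integer ticket count. It is a sketch of the technique, not the Millennium kernel scheduler. `STRIDE1` and `stride_schedule` are hypothetical names.

```python
from __future__ import annotations

import heapq

STRIDE1 = 1 << 20  # large constant so integer strides keep precision


def stride_schedule(bids: dict[str, int], quanta: int) -> dict[str, int]:
    """Simulate stride scheduling for `quanta` time slices.

    Each client's stride is STRIDE1 // bid. Every quantum the client with
    the smallest pass value runs, and its pass advances by its stride, so
    over time client i runs for roughly bids[i] / sum(bids) of the quanta.
    """
    if any(b <= 0 for b in bids.values()):
        raise ValueError("bids must be positive integers")
    # Heap entries: (pass value, name, stride); the name breaks ties.
    heap = [(STRIDE1 // b, name, STRIDE1 // b) for name, b in bids.items()]
    heapq.heapify(heap)
    counts = {name: 0 for name in bids}
    for _ in range(quanta):
        pass_value, name, stride = heapq.heappop(heap)
        counts[name] += 1  # run this client for one quantum
        heapq.heappush(heap, (pass_value + stride, name, stride))
    return counts


if __name__ == "__main__":
    # a bids 3, b bids 1: expect a 3/4 vs 1/4 split of the CPU.
    print(stride_schedule({"a": 3, "b": 1}, quanta=400))  # {'a': 300, 'b': 100}
```

Unlike lottery scheduling, stride scheduling is deterministic, so the proportional share holds closely even over short intervals. That matters when users are paying per minute.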
Communication / VIA
• Multiple physical layers
  - Fast Ethernet
  - Gigabit Ethernet (inter- & intra-cluster net)
  - Myrinet w/ LANai 7 (intra-cluster net)
• Transports
  - IP, IP Multicast
  - VI Architecture / GM
• Explore integrated IPC and distributed I/O
AM (Active Messages) Architecture
• Components
  - Endpoints
  - Virtual Networks
  - Bundles
• Operations
  - Request / Reply
  - Short, Medium, Long
  - Create, Map, Free
  - Poll, Wait
• Credit-based flow control
[Figure: Proc A, Proc B, Proc C]
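Credit-based flow control caps the number of outstanding requests so the receiver never runs out of buffers. A minimal sender-side sketch follows. It assumes one credit per outstanding request, returned by the matching reply, and in-order replies. `CreditChannel` and its methods are hypothetical names, not the AM-II API.

```python
from __future__ import annotations

from collections import deque
from typing import Optional


class CreditChannel:
    """Sender-side view of one request channel with credit-based flow control.

    Assumption: each outstanding request holds one credit and its reply
    returns it, so at most `credits` requests are ever in flight. Replies
    are assumed to arrive in request order.
    """

    def __init__(self, credits: int):
        if credits <= 0:
            raise ValueError("need at least one credit")
        self.credits = credits
        self.in_flight: deque[int] = deque()
        self._next_id = 0

    def try_request(self) -> Optional[int]:
        """Send a request if a credit is available; return its id or None."""
        if self.credits == 0:
            return None  # caller must poll for replies before sending more
        self.credits -= 1
        req_id = self._next_id
        self._next_id += 1
        self.in_flight.append(req_id)
        return req_id

    def on_reply(self) -> int:
        """Consume the oldest outstanding reply and reclaim its credit."""
        req_id = self.in_flight.popleft()
        self.credits += 1
        return req_id


if __name__ == "__main__":
    ch = CreditChannel(credits=2)
    print(ch.try_request(), ch.try_request(), ch.try_request())  # 0 1 None
    ch.on_reply()
    print(ch.try_request())  # 2
```

Returning `None` instead of blocking mirrors the Poll/Wait split: the caller decides whether to poll for replies or wait.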
AM-VIA Architecture
• VI Queue (VIQ)
  - Logical channel for an AM message type
  - VI & independent Send/Receive Queues
  - Independent request credit scheme (counter n)
• MAP Object
  - Container for 3 VIQs: Short, Medium, Long
  - Single registered memory region
[Figure: MAP Object]
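The MAP object can be pictured as three independent VIQs sharing one memory region, with each message routed to a VIQ by its size class. In the sketch below, a `bytearray` stands in for the registered region and `memoryview` slices stand in for each VIQ's share of it. The size thresholds `SHORT_MAX`/`MEDIUM_MAX` and all class names are hypothetical, and copying payloads into the region is not modelled.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class MsgClass(Enum):
    SHORT = 1
    MEDIUM = 2
    LONG = 3


# Hypothetical size limits in bytes; the slides do not give real values.
SHORT_MAX = 32
MEDIUM_MAX = 4096


def classify(payload_len: int) -> MsgClass:
    if payload_len <= SHORT_MAX:
        return MsgClass.SHORT
    if payload_len <= MEDIUM_MAX:
        return MsgClass.MEDIUM
    return MsgClass.LONG


@dataclass
class VIQ:
    """One logical channel: its own send/receive queues and request credits."""
    credits: int
    buffer: memoryview  # this VIQ's slice of the shared region
    send_q: deque = field(default_factory=deque)
    recv_q: deque = field(default_factory=deque)


class MapObject:
    """Three VIQs (Short, Medium, Long) carved from one memory region."""

    def __init__(self, sizes: dict[MsgClass, int], credits: int):
        # One contiguous buffer stands in for the single registered region.
        self.region = bytearray(sum(sizes.values()))
        view = memoryview(self.region)
        self.viqs: dict[MsgClass, VIQ] = {}
        offset = 0
        for cls in MsgClass:
            self.viqs[cls] = VIQ(credits, view[offset:offset + sizes[cls]])
            offset += sizes[cls]

    def send(self, payload: bytes) -> MsgClass:
        """Queue payload on the VIQ for its size class, spending one credit."""
        cls = classify(len(payload))
        viq = self.viqs[cls]
        if viq.credits == 0:
            raise RuntimeError(f"{cls.name} VIQ has no request credits")
        viq.credits -= 1
        viq.send_q.append(payload)
        return cls


if __name__ == "__main__":
    m = MapObject({MsgClass.SHORT: 256, MsgClass.MEDIUM: 8192, MsgClass.LONG: 65536}, credits=4)
    print(m.send(b"ping").name)               # SHORT
    print(m.send(bytes(1000)).name)           # MEDIUM
    print(m.viqs[MsgClass.SHORT].credits)     # 3
    print(len(m.viqs[MsgClass.LONG].buffer))  # 65536
```

Giving each VIQ its own credit counter means a burst of long transfers cannot exhaust the credits that short messages depend on.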
AM-VIA Integration
• Endpoints: collection of MAP objects
  - Virtual network emulated by point-to-point connections
• Bundle: pair of VI Completion Queues
  - Send/Receive
[Figure: Proc A, Proc B, Proc C]
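In this integration, an endpoint's virtual network is a table of point-to-point connections, and a bundle lets one poll cover every endpoint that shares its completion queues. The sketch below simplifies each connection (a MAP object in the real design) to a direct reference to the peer endpoint. `Bundle`, `Endpoint`, and `poll` are hypothetical names.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Bundle:
    """Pair of completion queues shared by all endpoints in the bundle."""
    send_cq: deque = field(default_factory=deque)
    recv_cq: deque = field(default_factory=deque)


class Endpoint:
    """AM endpoint whose virtual network is a table of point-to-point links.

    Simplification: each link is a direct reference to the peer endpoint,
    standing in for the per-peer MAP object and its VI connections.
    """

    def __init__(self, name: str, bundle: Bundle):
        self.name = name
        self.bundle = bundle
        self.vnet: dict[int, Endpoint] = {}  # virtual node index -> peer

    def map(self, index: int, peer: Endpoint) -> None:
        # Fill one virtual-network slot with a point-to-point connection.
        self.vnet[index] = peer

    def free(self, index: int) -> None:
        del self.vnet[index]

    def request(self, index: int, msg: str) -> None:
        peer = self.vnet[index]  # KeyError if the slot was never mapped
        self.bundle.send_cq.append((self.name, peer.name))
        peer.bundle.recv_cq.append((self.name, peer.name, msg))


def poll(bundle: Bundle) -> list[tuple[str, str, str]]:
    """One poll drains receive completions for every endpoint in the bundle."""
    done = list(bundle.recv_cq)
    bundle.recv_cq.clear()
    return done


if __name__ == "__main__":
    bundle_a, bundle_b = Bundle(), Bundle()
    a0 = Endpoint("A0", bundle_a)
    b1, b2 = Endpoint("B1", bundle_b), Endpoint("B2", bundle_b)
    a0.map(0, b1)
    a0.map(1, b2)
    a0.request(0, "hello")
    a0.request(1, "world")
    print(poll(bundle_b))  # [('A0', 'B1', 'hello'), ('A0', 'B2', 'world')]
```

Because B1 and B2 share one bundle, a single `poll` on `bundle_b` picks up traffic for both. This is the saving that grouping endpoints onto shared completion queues provides.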