Advancements in Distributed Computing at Altera: A Comprehensive Overview
Altera, a pioneer in programmable solutions founded in 1983, reported $1.02 billion in 2004 sales and has 2,300 employees. It provides programmable logic devices, intellectual property (IP), and development software. This presentation describes how batch computing at Altera moved from a custom centralized scheduling system developed in Toronto, with multiple queues and priority/FIFO execution, to Condor. The goals of the change included multi-OS support, redundancy and fault tolerance, expansion beyond Toronto, improved matchmaking, and capacity planning.
Presentation Transcript
Batch Computing at Altera
Condor, Quill and The Enterprise
About Altera
• “The Programmable Solutions Company”
• Pioneer of SOPC technology
• Founded in 1983
• $1.02 billion in 2004 sales
• 2,300 employees
• 14,000+ worldwide customers
About Programmable Solutions
• Programmable Logic Devices (PLDs)
• Intellectual Property (IP)
• Development Software
About Me
• Senior Software Engineer at the Toronto Technology Center
• B.A.Sc. in Engineering Science from the University of Toronto
• Joined Altera in 2001
• Focus on distributed computing
Where It All Began
• Developed in Toronto
• Centralized scheduling system
• Multiple queues
• Priority/FIFO execution
• No limit on resource claims
• Engineer-designed, custom API
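The "Priority/FIFO execution" bullet can be illustrated with a minimal sketch. It is not the original Toronto system: the class name, the job names, and the convention that a smaller number means a more urgent job are all assumptions made for illustration.

```python
import heapq
import itertools


class PriorityFifoQueue:
    """Jobs run by priority; equal priorities run in submission (FIFO) order."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker preserving arrival order

    def submit(self, job, priority):
        # Assumption: a smaller number means a more urgent job.
        heapq.heappush(self._heap, (priority, next(self._counter), job))

    def next_job(self):
        # Returns None when the queue is empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


queue = PriorityFifoQueue()
queue.submit("regression-a", priority=2)
queue.submit("fitter-b", priority=1)
queue.submit("regression-c", priority=2)
order = [queue.next_job() for _ in range(3)]
print(order)
```

The arrival counter is what gives FIFO behavior within one priority level; without it, ties would be broken by comparing the job payloads.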
Change Is Good, Right?
• Multi-OS support
• Redundancy and fault tolerance
• Easy expansion beyond Toronto
• Easy-to-use API
• New features
• Improve matchmaking
• Capacity planning
Really Important!
Pain Free Migration
[Diagram: User Tools, SOAP, Meta Scheduler, Priority Engine, TTC DB, Condor, Condor Pool, TTC Pool]
Time Stands Still
• Nice-style priorities [1:N]
• Use the priority factor to ensure PN negotiates before PN+1, PN+2, etc.
    RUP(PN) = 0.5 (RUP: real user priority)
    EUP(PN)/EUP(PN+1) = ½ (EUP: effective user priority)
• Freeze RUP values in time
    PRIORITY_HALFLIFE = 100000000000000000000
• Let jobs at PN get all VMs in the system
    NEGOTIATOR_IGNORE_USER_PRIORITIES = True
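The arithmetic behind this slide can be sketched as follows. In Condor, the effective user priority is the real user priority multiplied by the priority factor, and numerically smaller values negotiate first. The slide gives only the ratio of ½ between adjacent levels; the doubling factor `2 ** level` below is one hypothetical choice that produces it, and the function names are illustrative.

```python
RUP_FLOOR = 0.5  # frozen real user priority, per the slide


def priority_factor(level):
    # Hypothetical choice: doubling per level yields the 1/2 ratio on the slide.
    return 2.0 ** level


def effective_priority(level, rup=RUP_FLOOR):
    # Condor: effective user priority = real user priority * priority factor.
    return rup * priority_factor(level)


levels = [3, 1, 2]
# Smaller effective priority values negotiate first.
negotiation_order = sorted(levels, key=effective_priority)
ratio = effective_priority(1) / effective_priority(2)
print(negotiation_order, ratio)
```

Because the very large PRIORITY_HALFLIFE keeps RUP pinned at 0.5, the ordering depends only on the priority factor, which is what makes the levels behave like fixed nice values.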
Translation Services
[Diagram: the Meta Scheduler translates a cluster XML document into a Condor submit description]

Cluster XML:
    <cluster>
      <id>1</id>
      <priority>2</priority>
      <os>windows</os>
      <group>fitter</group>
      <job> <id>1</id> ... </job>
      <job> <id>2</id> ... </job>
      ...
    </cluster>

Condor submit description:
    +AlteraClusterID = 1
    +AlteraGroup = fitter
    requirements = OpSys = ...
    +AccountingGroup = P1
    AlteraTargetOs = windows
    ...
    +AlteraJobID = 1
    ...
    queue
    +AlteraJobID = 2
    ...
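A translation of this kind might look roughly like the sketch below. It is not the Meta Scheduler's actual code: the `translate` function is hypothetical, the accounting group is passed in by the caller because the slide does not show how it is derived from the cluster priority, and the elided parts of the slide (requirements, per-job details) are left out. The output mirrors the slide's simplified form; string values in a real submit file would normally need quoting.

```python
import xml.etree.ElementTree as ET

CLUSTER_XML = """
<cluster>
  <id>1</id>
  <priority>2</priority>
  <os>windows</os>
  <group>fitter</group>
  <job><id>1</id></job>
  <job><id>2</id></job>
</cluster>
"""


def translate(cluster_xml, accounting_group):
    """Turn a cluster document into submit-description lines (hypothetical helper)."""
    cluster = ET.fromstring(cluster_xml.strip())
    lines = [
        f"+AlteraClusterID = {cluster.findtext('id')}",
        f"+AlteraGroup = {cluster.findtext('group')}",
        f"+AccountingGroup = {accounting_group}",
        f"AlteraTargetOs = {cluster.findtext('os')}",
    ]
    for job in cluster.findall("job"):
        # One queue statement per <job> element.
        lines.append(f"+AlteraJobID = {job.findtext('id')}")
        lines.append("queue")
    return "\n".join(lines)


submit_text = translate(CLUSTER_XML, accounting_group="P1")
print(submit_text)
```

Cluster-level attributes are emitted once, and each job contributes its own ID followed by a `queue` statement, which matches the ordering shown on the slide.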
SQL! SQL! Everywhere!
[Diagram: Meta Scheduler, Usage History, PostgreSQL DBMS, Condor Quill, Status Info, Condor Collector, System Audits]
From Here, Where?
• Roll out across the enterprise
• Scaling with multiple schedds
• Quill++
• DBMS for configuration management (with R. Nordlund & J. Stowe from The Hartford)