
Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration


Presentation Transcript


  1. Scheduling in Heterogeneous Grid Environments: The Effects of Data Migration
  Leonid Oliker, Hongzhang Shan (Future Technology Group, Lawrence Berkeley National Laboratory)
  Warren Smith, Rupak Biswas (NASA Advanced Supercomputing Division, NASA Ames Research Center)

  2. Motivation
  • Geographically distributed resources
  • Difficult to schedule and manage efficiently
  • Autonomy (each site retains its own local scheduler)
  • Heterogeneity
  • Lack of perfect global information
  • Conflicting requirements between users and system administrators

  3. Current Status
  • Grid initiatives: Global Grid Forum, NASA Information Power Grid, TeraGrid, Particle Physics Data Grid, E-Grid, LHC Challenge
  • Grid scheduling services enable multi-site applications: multi-disciplinary applications, remote visualization, co-scheduling, distributed data mining, parameter studies
  • Job migration: improve time-to-solution, avoid dependency on a single resource provider, optimize application mapping to the target architecture
  • But what are the tradeoffs of data migration?

  4. Our Contributions
  • Interaction between grid scheduler and local schedulers
  • Architectures: distributed, centralized, and ideal
  • Real workloads
  • Performance metrics
  • Job migration overhead
  • Superscheduler scalability
  • Fault tolerance
  • Multi-resource requirements

  5. Distributed Architecture
  [Diagram: a grid scheduler (middleware) with its grid queue sits in the grid environment; each local environment has a local scheduler with its local queue feeding a compute server of PEs; jobs and status info flow between grid and local schedulers over the communication infrastructure]

  6. Interaction between Grid and Local Schedulers
  • AWT: Approximate Wait Time
  • CRU: Current Resource Utilization
  • JR: Job Requirements
  • If AWT is below a given threshold, the job is scheduled locally; else, it is considered for migration by the grid scheduler (sketched below)
  • Three migration protocols: Sender-Initiated (S-I), Receiver-Initiated (R-I), Symmetrically-Initiated (Sy-I)
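
A minimal sketch of this hand-off, assuming a simple drain-the-queue model for AWT; the threshold value, class names, and queue representation are illustrative, not from the paper's simulator:

```python
from dataclasses import dataclass

@dataclass
class Job:
    cpus: int            # number of PEs requested (part of JR)
    est_runtime: float   # estimated run time in seconds (part of JR)

def approximate_wait_time(local_queue) -> float:
    """AWT: rough estimate of how long a new job would wait locally.
    Placeholder model: assume queued jobs drain one after another."""
    return sum(j.est_runtime for j in local_queue)

def route_job(job, local_queue, grid_queue, threshold=3600.0):
    """Keep the job if it would start soon enough; otherwise hand it
    to the grid scheduler so it can be considered for migration."""
    if approximate_wait_time(local_queue) < threshold:
        local_queue.append(job)   # AWT below threshold: run locally
    else:
        grid_queue.append(job)    # else: considered for migration
```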

  7. Sender-Initiated (S-I)
  • The host sends Job_i's requirements to its partner sites
  • The host and each partner report their ART and CRU values
  • Select the machine with the smallest Approximate Response Time (ART); break ties by CRU (see the sketch below)
  • ART = Approximate Wait Time + Estimated Run Time
  • Job_i is migrated to the selected site and its results are returned to the host
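
The selection rule lends itself to a short sketch. The tuple layout and example numbers are assumptions for illustration; only the rule itself (smallest ART, ties broken by CRU) comes from the slide:

```python
def select_target(responses):
    """responses: (site, ART seconds, CRU fraction) tuples collected
    from the host and its partners for one job."""
    # Smallest ART wins; a lower CRU breaks ties.
    return min(responses, key=lambda r: (r[1], r[2]))[0]

# Hypothetical replies for Job_i from the host (site0) and two partners.
responses = [("site0", 540.0, 0.90),
             ("site1", 120.0, 0.75),
             ("site2", 120.0, 0.60)]
print(select_target(responses))   # -> site2 (ART tied, lower CRU)
```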

  8. Receiver-Initiated (R-I)
  • Lightly loaded partners broadcast a "free" signal
  • Querying begins only after the host receives a free signal
  • The host then sends Job_i's requirements and collects ART and CRU from the volunteers, as in S-I
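
A sketch of the R-I hand-off under the assumption that free signals arrive on a shared queue; the plumbing (`queue.Queue`, the `query_partner` callback) is hypothetical:

```python
import queue

free_signals = queue.Queue()   # partners push their site id when lightly loaded

def receiver_initiated(job, query_partner):
    """Hold the query until some partner volunteers, then query only it."""
    volunteer = free_signals.get()            # block until a free signal arrives
    art, cru = query_partner(volunteer, job)  # send JR; get ART and CRU back
    return volunteer, art, cru
```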

  9. Symmetrically-Initiated (Sy-I)
  • First, work in R-I mode
  • If no machine volunteers within a given time period, change to S-I mode
  • Switch back to R-I mode after the job is scheduled
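
The mode switch can be sketched as a timeout on the volunteer queue; the timeout value and helper functions are assumptions, since the slide leaves the time period unspecified:

```python
import queue

def symmetrically_initiated(job, free_signals, sender_initiated,
                            receiver_initiated, timeout=60.0):
    """Start in R-I mode; fall back to S-I if nobody volunteers in time."""
    try:
        volunteer = free_signals.get(timeout=timeout)   # R-I phase
        return receiver_initiated(job, volunteer)
    except queue.Empty:
        return sender_initiated(job)   # S-I phase; revert to R-I afterwards
```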

  10. Centralized Architecture
  [Diagram: jobs arrive through web portals or a super shell into a single grid queue managed by one grid scheduler (middleware)]
  • Advantages: global view of the system
  • Disadvantages: single point of failure, limited scalability

  11. Performance Metrics
  • NAWT: Normalized Average Wait Time
  • NART: Normalized Average Response Time
  • FOJM: Fraction Of Jobs Migrated
  • FDVM: Fraction of the Data Volume Migrated
  • DMOH: Data Migration OverHead (fraction of response time spent moving data)
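
One plausible way to compute these metrics, reconstructed from how the later slides use them; the paper's exact definitions (in particular the normalization baselines) may differ:

```python
def grid_metrics(jobs, baseline_wait, baseline_resp):
    """jobs: dicts with 'wait', 'resp', 'migrated', 'volume', 'move_time'.
    Baselines are assumed to be the local (non-grid) averages."""
    n = len(jobs)
    total_resp = sum(j["resp"] for j in jobs)
    total_vol = sum(j["volume"] for j in jobs)
    return {
        "NAWT": (sum(j["wait"] for j in jobs) / n) / baseline_wait,
        "NART": (total_resp / n) / baseline_resp,
        "FOJM": sum(1 for j in jobs if j["migrated"]) / n,
        "FDVM": sum(j["volume"] for j in jobs if j["migrated"]) / total_vol,
        "DMOH": sum(j["move_time"] for j in jobs) / total_resp,
    }
```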

  12. Resource Configuration and Site Assignment
  • Each local site network has a peak bandwidth of 800 Mb/s (gigabit Ethernet LAN)
  • The external network has 40 Mb/s available point-to-point (high-performance WAN)
  • Assume all data transfers share the network equally (network contention is modeled; see the sketch below)
  • Assume performance is linearly related to CPU speed
  • Assume users have pre-compiled code for each of the heterogeneous platforms
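
A toy version of this model: the bandwidth constants come from the slide, while the equal-share contention formula and the linear CPU scaling below are assumed simplifications:

```python
LAN_MBPS = 800.0   # peak bandwidth within a site (gigabit Ethernet LAN)
WAN_MBPS = 40.0    # point-to-point bandwidth between sites (WAN)

def transfer_seconds(megabits, concurrent_transfers=1, wide_area=True):
    """Concurrent transfers split the link's bandwidth equally."""
    link = WAN_MBPS if wide_area else LAN_MBPS
    return megabits / (link / concurrent_transfers)

def scaled_runtime(base_seconds, base_cpu_mhz, target_cpu_mhz):
    """Performance assumed linearly related to CPU speed."""
    return base_seconds * base_cpu_mhz / target_cpu_mhz

# e.g. a 400 Mb file over the WAN shared by 4 transfers: 40 seconds
print(transfer_seconds(400.0, concurrent_transfers=4))
```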

  13. Job Workloads
  • Systems located at Lawrence Berkeley National Laboratory, NASA Ames Research Center, Lawrence Livermore National Laboratory, and the San Diego Supercomputer Center
  • Data volume information is not available; assume data volume is correlated with the volume of work
  • B is the number of Kbytes of data per unit of work (CPUs * runtime)
  • Our best estimate is B = 1 KB for each CPU-second of application execution
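
A worked example of the B = 1 KB per CPU-second estimate (the 64-CPU, 2-hour job is hypothetical):

```python
def job_data_volume_kb(cpus, runtime_seconds, b_kb_per_cpu_second=1.0):
    """Estimated migratable data volume for one job."""
    return cpus * runtime_seconds * b_kb_per_cpu_second

# A 64-CPU job running for 2 hours (7200 s):
# 64 * 7200 * 1 KB = 460,800 KB, i.e. 450 MB of data to migrate.
print(job_data_volume_kb(64, 7200))
```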

  14. Scheduling Policy (12 Sites, Workload B)
  • Large potential gain from using a grid superscheduler
  • Average wait time reduced 25X compared with the local scheme!
  • Sender-Initiated performance is comparable to Centralized
  • Inverse relationship between the migration metrics (FOJM, FDVM) and the timing metrics (NAWT, NART)
  • Very small fraction of response time is spent moving data (DMOH)

  15. Data Migration Sensitivity (Sender-I, 12 Sites)
  • NAWT for 100B is almost 8X higher than for B; NART is 50% higher
  • DMOH increases to 28% and 44% for 10B and 100B, respectively
  • As B increases, the fraction of data volume migrated (FDVM) decreases due to the growing overhead
  • FOJM is inconsistent because it counts the number of jobs, NOT the data volume

  16. Site Number Sensitivity (Sender-I)
  • 0.1B shows no site sensitivity
  • 10B shows a noticeable effect as the number of sites decreases from 12 to 3:
  • Decrease in time (NAWT, NART) due to the increase in network bandwidth
  • Increase in the fraction of data volume migrated (FDVM)
  • 40% increase in the fraction of response time spent moving data (DMOH)

  17. Communication-Oblivious Scheduling (Sender-I)
  • For 10B, if the data migration cost is not considered in the scheduling algorithm:
  • NART increases 14X and 40X for 12 sites and 3 sites, respectively
  • NAWT increases 28X and 43X for 12 sites and 3 sites, respectively
  • DMOH is over 96%! (only 3% for the B data set)
  • 16% of all jobs are blocked from executing while waiting for data, compared with practically 0% for communication-aware scheduling

  18. Increased Workload Sensitivity (Sender-I, 12 Sites, Workload B)
  • Grid scheduling handles 40% more jobs than the non-grid local scheme:
  • No increase in time (NAWT, NART)
  • Weighted utilization increases from 66% to 93%
  • However, there is a fine line: when the number of jobs increases by 45%, NAWT grows 3.5X and NART grows 2.4X!

  19. Conclusions
  • Studied the impact of data migration by simulating:
  • Compute servers
  • Grouping of servers into sites
  • Inter-server networks
  • Results showed huge benefits from grid scheduling
  • S-I reduced average turnaround time by 60% compared with the local approach, even in the presence of input/output data migration
  • The algorithm can execute 40% more jobs in a grid environment and deliver the same turnaround times as the non-grid scenario
  • For large data files, it is critical to consider migration overhead
  • 43X increase in NART when using communication-oblivious scheduling

  20. Future Work
  • Superscheduler scalability:
  • Resource discovery
  • Fault tolerance
  • Multi-resource requirements
  • Architectural heterogeneity
  • Practical deployment issues
