By Nithiapidary Muthuvelu FIT Supervisor: Dr Ian Chai Co-supervisor: Dr David Chieng Heng Tze

Grid Computing Research Seminar DYNAMIC JOB GROUPING-BASED SCHEDULING FOR GRID APPLICATION By Nithiapidary Muthuvelu FIT Supervisor: Dr Ian Chai Co-supervisor: Dr David Chieng Heng Tze

Grid Computing Contents • Grid Computing • Resource Broker • Motivation • Objectives • Literature Review (Summary) • Problem Formulation • Research Methodology • Gantt Chart • Conclusions • References

Grid Computing Grid Computing • Grid computing (Foster and Kesselman, 1999) is a growing technology that facilitates the execution of large-scale, resource-intensive applications on geographically distributed computing resources. • Heterogeneous computing resources across political and administrative domains act as a single powerful computer to speed up the execution of computation-intensive applications.

Grid Computing Grid Computing (Cont.)
• User: sends a computation- or data-intensive application to the Global Grid in order to speed up the execution of the application.
• Grid Information Service: collects the details of the available Grid resources and passes the information to the resource broker.
• Resource Broker: distributes the jobs in an application to the Grid resources for execution, based on the user’s QoS requirements and the details of the available Grid resources.
• Grid Resources (cluster, PC, supercomputer, database, instruments, etc.): execute the user jobs in the Global Grid.
Figure 1: A typical view of Grid environment (flow labels: Grid application, computational jobs, details of Grid resources, processed jobs, computation result)

Grid Computing Grid Computing – Grid Layered Architecture
• APPLICATIONS: Applications and Portals (Scientific, Engineering, Collaboration, Prob. Solving Env., Web enabled Apps, …)
• USER LEVEL MIDDLEWARE: Development Environments and Tools (Languages/Compilers, Libraries, Debuggers, Monitors, Web tools, …); Resource Management, Selection, and Aggregation (BROKERS)
• CORE MIDDLEWARE: Distributed Resources Coupling Services (Security, Information, Data, Process, Trading, QoS, …)
• SECURITY LAYER
• FABRIC: Local Resource Managers (Operating Systems, Queuing Systems, Libraries & App Kernels, Internet Protocols, …); Networked Resources across Organizations (Computers, Networks, Storage Systems, Data Sources, Scientific Instruments, …)
Figure 2: Grid layered architecture. Adapted from (Foster and Kesselman, 1999)

Grid Computing Resource Broker • Job: an atomic unit of computation. • Grid application: a highly computation-intensive task consisting of several jobs. • Resource broker or scheduler: selects the most suitable computing resources over the Grid and distributes the jobs among those available computing resources for execution. Goal: to ensure the jobs are processed with high system throughput. Figure 3: Grid Resource Broker (a Grid application made up of Job 1, Job 2, Job 3, …, Job n is passed to the resource broker)

Grid Computing Resource Broker (Cont.)
• Process flow (Berman, Fox and Hey, 2003):
1. Receives application jobs
2. Analyses the user’s Quality of Service (QoS) requirements, such as budget and deadline (Abramson, Buyya and Giddy, 2002)
3. Discovers available and suitable computing resources in the Grid from the Grid Information Service
4. Maps the jobs to resources (scheduling)
5. Stages the application and data for processing (deployment)
6. Starts job execution
7. Gathers the results
A resource broker is also responsible for monitoring and tracking job execution progress, and for adapting to changes in the Grid runtime environment and to resource failures.

Grid Computing Resource Broker (Cont.)
• Nimrod-G broker (Abramson, Buyya and Giddy, 2002): an economic paradigm for resource management and scheduling of parameter sweep applications on distributed resources. Applications: economic and on-demand brain activity analysis, and molecular modeling for drug design.
• Condor-G broker (Frey, Tannenbaum, et al., 2001): a broker to develop, deploy and evaluate mechanisms and policies that support high-throughput computing on large collections of distributed resources. Applications: GridGaussian project; NUG30 optimization problem, a quadratic assignment problem.
• Gridbus (Srikumar, Buyya and Winton, 2004): a broker that supports a declarative and dynamic parametric programming model for creating Grid applications. Application: Belle Analysis Software Framework, a physics analysis application.

Grid Computing Resource Brokers – Types

Grid Computing Resource Brokers – Gridbus Broker Figure 4: Gridbus Broker

Grid Computing Motivation • Grid job transmission method • Drawbacks

Grid Computing Motivation – Job Transmission
• A user sends an application to the resource broker.
• A Grid application is composed of a large number of computational jobs.
• The resource broker sends one job to a Grid resource at a time.
• Once the job has been executed by the Grid resource, the processed job is forwarded back to the resource broker.
Figure 5: The current process of sending and receiving user jobs to and from Grid resources (Jobs 1 to 6 of a Grid application dispatched individually along a time axis t1 to t10, each returning a computation result)

Grid Computing Motivation – Drawbacks • For an application that involves a large volume of data, sending and receiving one small job at a time increases the overall overhead time and cost, both for job transmission to and from the Grid resources and for job processing at the Grid resources (Muthuvelu, Liu, et al., 2005). • Job transmission time = job initiation time at the source + job transmission time to a resource + job storing time at the resource + processed job transmission time back to the source. • Job processing or computation time = job initiation time at the resource + job execution time. • Job processing cost = job transmission cost + (total CPU time for job execution × cost per second associated with the Grid resource)
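The three formulas on this slide are plain sums, so they can be checked with a few lines of Python. This is a minimal sketch, not the author's implementation: the class, field and function names are hypothetical, all times are assumed to be in seconds, and the sample figures are invented for illustration only.

```python
from dataclasses import dataclass


@dataclass
class JobTimings:
    """Per-job timing components in seconds (field names are illustrative)."""
    init_at_source: float     # job initiation time at the source
    send_to_resource: float   # job transmission time to a resource
    store_at_resource: float  # job storing time at the resource
    return_to_source: float   # processed job transmission time back to the source
    init_at_resource: float   # job initiation time at the resource
    execution: float          # job execution (CPU) time


def transmission_time(t: JobTimings) -> float:
    """Job transmission time, as the sum of its four components."""
    return (t.init_at_source + t.send_to_resource
            + t.store_at_resource + t.return_to_source)


def processing_time(t: JobTimings) -> float:
    """Job processing (computation) time."""
    return t.init_at_resource + t.execution


def processing_cost(transmission_cost: float, cpu_seconds: float,
                    cost_per_second: float) -> float:
    """Job processing cost = transmission cost + CPU time * cost per second."""
    return transmission_cost + cpu_seconds * cost_per_second


# Invented sample values, for illustration only.
job = JobTimings(init_at_source=0.5, send_to_resource=2.0,
                 store_at_resource=0.5, return_to_source=1.0,
                 init_at_resource=0.5, execution=3.0)
tx = transmission_time(job)
proc = processing_time(job)
cost = processing_cost(transmission_cost=1.0, cpu_seconds=job.execution,
                       cost_per_second=2.0)
print(tx, proc, cost)
```

Every term except the execution time is paid again for each job that is sent separately, which is the overhead the Drawbacks slide points to.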

Grid Computing Objectives • To understand the process flow of the existing Grid resource brokers and identify their limitations in terms of job management, job scheduling, and job communication time and cost • To establish the need for a job grouping-based scheduling system and identify the major and minor factors involved in the job grouping method, along with the relevant quantitative data and data sources • To establish a complete framework for the proposed scheduler system and to implement the real system • To measure the performance of the proposed scheduler system in terms of job transmission time and cost and Grid resource utilization through real-world Grid applications • To compare the performance of the proposed scheduler system with the current scheduler system and quantify the resulting advantages and disadvantages

Grid Computing Literature Review
• Resource Broker
• Grid Application
  - A Grid application consists of a large number of jobs.
  - Individual job processing method.
• Related Work
  - (Buyya, Date, et al., 2005): distributed analysis of brain activity data. A magnetoencephalography (MEG) helmet produces a large amount of data for analyzing brain activity. The main issue is the expense caused by communication overhead time and cost.
  - (James, Hawick and Coddington, 1999): a study of scheduling heuristics for independent jobs. Nodes are made to synchronize after each round of execution, which introduces synchronization overhead.

Grid Computing Literature Review (Cont.)
• Data generation: 64 sensors.
• MEG analysis: all pairs (64x64) of MEG data, shifting the temporal region of the MEG data over time from 0 to 29750, giving 64x64x29750 jobs.
• User input to Nimrod-G: [deadline, budget, optimization preference].
• Life-electronics laboratory, AIST: provision of expertise in the analysis of brain function; provision of MEG analysis.
Figure 6: Distributed analysis of brain activity data (steps 1 to 5 linking Data Generation, Data Analysis, Nimrod-G, World-Wide Grid and Results). Adapted from (Buyya, Date, et al., 2005)

Grid Computing Problem Formulation • Dynamic job grouping-based scheduling system (an overview) • Simulation results • Job grouping method • Scope • Predicted benefits

Grid Computing Problem Formulation – Job Grouping Overview
• A dynamic job grouping-based scheduling technique is needed to group the atomic jobs dynamically according to the user’s QoS requirements and the available Grid resources. The resource broker then sends the grouped jobs to the appropriate computing resources (Muthuvelu, Liu, et al., 2005).
Figure 7: Dynamic job-grouping based scheduler
Figure labels: User Jobs; Grid Job Scheduler; Grid Information Service; Grid Resources (Resource 1: Supercomputer, Resource 2: Cluster, … n Resources); Job group 1 to Grid resource 1, Job group 2 to Grid resource 2.
Scheduler inputs shown in the figure: available Grid resources; processing requirement or load of each job; processing load of each Grid resource; overhead processing time and cost of each job; processing speed and cost of each Grid resource; QoS requirements of the Grid application.

Grid Computing Problem Formulation – Simulation • GridSim Toolkit – Grid Simulation (Buyya and Murshed, 2002): a toolkit for modeling and simulating distributed resource management and scheduling in a conventional Grid environment. Figure 8: Processing time (a) and cost (b) for executing 150 Gridlets of 200 average MI with a granularity size of 30 seconds

Grid Computing Problem Formulation – Job Grouping
Figure 9: Job grouping strategy (columns: User Job ID/MI, Job Group ID/MI, Grid Resource ID/MIPS)
• User jobs (ID/MI): Job 0/100, Job 1/99, Job 2/102, Job 3/70, Job 4/100, Job 5/110, Job 6/80, Job 7/50, …, Job 30/120, Job 31/99, Job 32/105, Job 33/70, …, Job 54/100, Job 55/110, Job 56/70, Job 57/50, …
• Job groups (ID/MI): Job Group 1/371, Job Group 2/290, Job Group 3/300, Job Group 4/350, Job Group 5/274, Job Group 6/400, Job Group 7/330, Job Group 8/300
• Grid resources (ID/MIPS): Resource 1/400, Resource 2/340, Resource 3/350, Resource 4/350, Resource 5/290

Grid Computing Problem Formulation – Job Grouping
Figure 10: Job grouping with granularity size (columns: User Job ID/MI, Job Group ID/MI, Resource ID/MIPS). Granularity size: 3 sec
• User jobs (ID/MI): Job 0/20, Job 1/21, Job 2/21, Job 3/23, Job 4/22, Job 5/19, Job 6/18, Job 7/19, Job 8/25, Job 9/28, …, Job 50/29, Job 51/30, Job 52/29, …, Job 96/21, Job 97/22, Job 98/30, Job 99/24
• Job groups (ID/MI): Job Group 0/85, Job Group 1/103, Job Group 2/200, Job Group 3/88, Job Group 4/100, Job Group 5/97
• Resources (ID/MIPS): Resource 11/33 (Total_RMI: 99), Resource 15/35 (Total_RMI: 105), Resource 13/70 (Total_RMI: 210)
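The numbers in Figure 10 are consistent with a simple greedy rule: each resource can absorb at most Total_RMI = MIPS × granularity size (33 × 3 = 99, 35 × 3 = 105, 70 × 3 = 210), and jobs are added to a group in submission order until the next job would exceed that limit. The Python sketch below illustrates that reading only. The function name, the round-robin choice of resources and the handling of an oversized single job are assumptions, not details taken from the slides.

```python
def group_jobs(job_mi, resources, granularity_sec):
    """Greedily group jobs so each group's MI fits MIPS * granularity size.

    job_mi: job lengths in MI, in submission order.
    resources: (resource_id, mips) pairs, visited round-robin (assumption).
    Returns a list of (resource_id, job_indices, total_mi) tuples.
    """
    groups = []
    i = 0  # index of the next ungrouped job
    r = 0  # index of the next resource to fill
    while i < len(job_mi):
        res_id, mips = resources[r % len(resources)]
        capacity = mips * granularity_sec  # Total_RMI in Figure 10
        members, total = [], 0
        # Keep adding jobs while the group still fits the resource capacity.
        while i < len(job_mi) and total + job_mi[i] <= capacity:
            members.append(i)
            total += job_mi[i]
            i += 1
        if not members:
            # Assumption: a job larger than the capacity is sent on its own.
            members, total = [i], job_mi[i]
            i += 1
        groups.append((res_id, members, total))
        r += 1
    return groups


# First ten jobs and the three resources listed in Figure 10.
jobs = [20, 21, 21, 23, 22, 19, 18, 19, 25, 28]
resources = [(11, 33), (15, 35), (13, 70)]
groups = group_jobs(jobs, resources, granularity_sec=3)
for res_id, members, total in groups:
    print(f"resource {res_id}: jobs {members}, {total} MI")
```

With these inputs the first two groups come out as 85 MI (Jobs 0 to 3) and 103 MI (Jobs 4 to 8), matching Job Group 0 and Job Group 1 in the figure. The third group differs from the figure only because the sample list stops at Job 9.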

Grid Computing Problem Formulation – Job Grouping
Figure 11: Simulation strategy for dynamic job grouping-based scheduler
Figure labels: USER JOBS (total number of jobs, average MI of job, MI deviation percentage, job MI); Job Scheduler (granularity size, overhead processing time, jobs, job groups, resource IDs); Grid resources’ characteristics (GRID RES. ID, resource MIPS, total MIPS); Grid resource 0, Grid resource 1, Grid resource 2, …, Grid resource N; Job group 0, Job group 1, Job group 2; steps numbered (1) to (8).

Grid Computing Problem Formulation – Job Grouping
Factors to be Considered in Job Grouping Strategy
• User’s QoS requirements
  - Cost optimization
  - Time optimization
  - Cost-Time optimization
• Total number of jobs in a Grid application
• MI of each job
• Number of available Grid resources
• MIPS (speed) of each resource
• Cost of each resource (cost per second)
• Current processing load of each resource (predict job waiting time)
• Network bandwidth
• Local scheduler’s queuing capacity
Method / Technique to be used for Grouping Grid Jobs
• Genetic algorithm
• Fitness function
  - User’s QoS requirements
  - Network bandwidth
Fault Tolerance Situation
• Resource failure – failed job regrouping
• Application withdrawal – statistics
Output
• Grid application processing cost and time
• Processed jobs (results)
• Statistics

Grid Computing Problem Formulation -Scope • Computational jobs • Independent jobs • Web service (job grouping method can be called by various brokers)

Grid Computing Problem Formulation – Predicted Benefits • Speeds up the execution of a particular Grid application. This will encourage more Grid participants to deploy their computation-intensive or data-intensive applications on Global Grids. Greater participation in Global Grids will increase the utilization of Grid technology. • Trims down the cost of changing hardware at the Grid resources. Declining hardware performance can be compensated for by the job grouping method, since this method reduces the total overhead time involved in the execution of a Grid application. • Optimum utilization of Grid resources and network bandwidth. Jobs are grouped based on the available processing capability of a particular Grid resource. In other words, the maximum MI supported by a Grid resource at a particular time is grouped for processing. The limiting factor would be the network bandwidth.

Grid Computing Research Methodology • Review on Grid job management and scheduling systems. • Research method: Applied research, Scientific approach and Descriptive, comparative and response experiments. • Grid lab set-up on MYREN. • Input and output of the proposed system. • System framework. • Simulations of the proposed system. • Implementation of the real scheduler system and system integration. • Evaluation and analysis of the scheduler system. • Product deliverable to the Grid users. • Future work determination, recommendations, documentation.

Grid Computing Research Methodology (Cont.) • Project Requirements • System software: Globus Toolkit 4.0, Gridbus Broker, Nimrod-G Broker, Linux. • Database and analysis tools: phpdev423 (Apache server, MySQL) or MS Excel • Network: MYREN • Hardware/equipment: Grid computing resource (MMUGrid), external Grid resources, a commodity machine (PuTTY)

Grid Computing Research Methodology – Data retrieval (1) – Details on Grid Resources
• Grid monitoring tools (of various standards)
  - MDS4 (http://www.globus.org/toolkit/docs/4.0/info/key-index.html)
  - Ganglia (http://ganglia.sourceforge.net/)
• Type of data
  - Number of available Grid resources
  - MIPS (speed) of each resource
  - Cost of each resource (cost per second)
  - Current processing load of each resource (predict job waiting time)
  - Network bandwidth
  - Local scheduler type and relevant protocols
  - Local scheduler’s queuing capacity

Grid Computing Research Methodology – Data retrieval (1) – Details on Grid Resources
Resources shown (Resource 1 to Resource 4), monitored through MDS4 and Ganglia:
• Alchemi - LSF queue
• Globus (GT2) Suprco. - Fork / WS GRAM
• Globus (GT4) Cluster - SGE queue
• Globus (GT4) Cluster - PBS queue
1. Queue Information
- Queue type and version
- Default GRAM version, port, host
- Total CPUs
- Status (up/down)
- Total jobs in the queue
- Running jobs
- Waiting jobs
- Free CPUs
- MaxCPUTime
- MaxWallClockTime
- MaxTotalJobs
- MaxRunningJobs
2. Cluster Information
- Unique ID
- Benchmark/clock speed
- Storage capacity
- Cost per second
- Number of nodes
- Status (up/down)
Figure 12: Data retrieval from Grid resources

Grid Computing Research Methodology –Data retrieval (2) – Details on Grid Application • User applications (of various requirements) • Independent computational jobs • Type of data • User’s QoS requirements • Cost optimization • Time optimization • Cost-Time optimization • Total number of jobs in a Grid application • MI of each job

Grid Computing Research Methodology –Data retrieval (3) – Details on Network • Network bandwidth • Network information service • Type of data • Network bandwidth

Grid Computing Research Methodology –Data Processing • Data standardization • Data from various resources – in various standards • Standardize the queue and cluster information • Genetic Algorithm • Fitness function – network bandwidth & QoS requirements • Granularity size / time • Output data • Application processing time • Application processing cost
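The data standardization step can be pictured as mapping each monitoring source's own field names onto one common record. The following Python sketch is purely illustrative: the record fields are a subset of the queue and cluster information listed for Figure 12, while the class name, the function name and every raw key in the sample dictionary are hypothetical and do not reflect the actual output of MDS4, Ganglia or any local scheduler.

```python
from dataclasses import dataclass


@dataclass
class ResourceRecord:
    """Common record for one Grid resource (illustrative subset of fields)."""
    resource_id: str
    queue_type: str
    total_cpus: int
    free_cpus: int
    waiting_jobs: int
    cost_per_second: float
    is_up: bool


def standardize(raw: dict, field_map: dict) -> ResourceRecord:
    """Rename source-specific keys to the common field names.

    field_map maps each common field name to the key used by one source.
    """
    common = {target: raw[source] for target, source in field_map.items()}
    return ResourceRecord(**common)


# Hypothetical raw record and key mapping for one monitoring source.
raw_record = {"id": "cluster-a", "lrm": "PBS", "cpus": 16, "idle": 4,
              "queued": 7, "price": 0.02, "up": True}
mapping = {"resource_id": "id", "queue_type": "lrm", "total_cpus": "cpus",
           "free_cpus": "idle", "waiting_jobs": "queued",
           "cost_per_second": "price", "is_up": "up"}
record = standardize(raw_record, mapping)
print(record)
```

Keeping one mapping per source means the grouping logic only ever sees `ResourceRecord` values, whichever queue or monitoring tool produced the data.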

Grid Computing Research Methodology – Data Processing
Figure 13: Data Processing
Figure labels: inputs (user application details, Grid resources’ details, network bandwidth); Genetic Algorithm; fitness function (network bandwidth, QoS requirements); outputs (common granularity size, resource-dependent granularity size).

Grid Computing Research Methodology – Web Services • Data collection/retrieval • Data processing
Figure 14: Overall view of system implementation
Figure labels: Grid Application (Job 1, Job 2, Job 3, …, Job n); Resource Broker; Network Information Service; inputs (user application details, Grid resources’ details, network bandwidth); Genetic Algorithm; fitness function (network bandwidth, QoS requirements); outputs (common granularity size, resource-dependent granularity size).

Grid Computing Gantt Chart

Grid Computing Gantt Chart (Cont.)

Grid Computing Conclusion • A dynamic job grouping-based scheduling system results in timely and less expensive execution of data-intensive Grid applications. • The job grouping method allows the total processing capability of each available resource to be fully utilized during job execution. • Simulations and experiments have been conducted to show that developing a real Grid application that exploits the job grouping method is a worthwhile contribution to Global Grids. • This project involves research in the following areas: • Grid monitoring tools (MDS4, Ganglia) • Resource queuing systems (SGE, PBS, WS GRAM, LSF, etc.) • Grid job schedulers (Grid brokers: Gridbus, Nimrod-G, Condor-G, Gridway) • Network information service • Grid application jobs • User’s QoS requirements • Genetic algorithm

Grid Computing References
• Abramson, D., Buyya, R. and Giddy, J. (2002): A Computational Economy for Grid Computing and its Implementation in the Nimrod-G Resource Broker. Future Generation Computer Systems (FGCS), 18(8): 1061-1074.
• Berman, F., Fox, G. C. and Hey, A. J. G. (2003): Grid Computing – Making the Global Infrastructure a Reality. Wiley, England.
• Frey, J., Tannenbaum, T., Livny, M., Foster, I. and Tuecke, S. (2001): Condor-G: A Computation Management Agent for Multi-Institutional Grids. Proceedings of the Tenth International Symposium on High Performance Distributed Computing (HPDC-10). IEEE Press.
• James, H. A., Hawick, K. A. and Coddington, P. D. (1999): Scheduling Independent Tasks on Metacomputing Systems. Proc. of Parallel and Distributed Computing (PDCS ’99), Fort Lauderdale, USA, ISBN: 1880843293.
• Foster, I. and Kesselman, C. (1999): The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, Inc., San Francisco, California.

Grid Computing References
• Muthuvelu, N., Liu, J., Lin Soe, N., Venugopal, S., Sulistio, A. and Buyya, R. (February 2005): A Dynamic Job Grouping-Based Scheduling for Deploying Applications with Fine-Grained Tasks on Global Grids. Proceedings of Australasian Workshop on Grid Computing and e-Research (AusGrid2005), Newcastle, Australia, pp. 41-48.
• Buyya, R. and Murshed, M. (2002): GridSim: A Toolkit for the Modeling and Simulation of Distributed Resource Management and Scheduling for Grid Computing. Concurrency and Computation: Practice and Experience (CCPE), 14(13-15): 1175-1220, Wiley Press, USA.
• Buyya, R., Date, S., Mizuno-Matsumoto, Y., Venugopal, S. and Abramson, D. (2004): Neuroscience Instrumentation and Distributed Analysis of Brain Activity Data: A Case for eScience on Global Grids. Concurrency and Computation: Practice and Experience, Wiley Press, USA (accepted in Jan. 2004 and in print).
• Sherwani, J., Ali, N., Lotia, N., Hayat, Z. and Buyya, R. (2004): Libra: A Computational Economy-Based Job Scheduling System for Clusters. Software: Practice and Experience (SP&E), 34: 573-590.
• Srikumar, V., Buyya, R. and Winton, L. (2004): A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids. Technical Report, GRIDS-TR-2004-1. Grid Computing and Distributed Systems Laboratory, The University of Melbourne, Australia.
