
Scheduling From the Perspective of the Application



Presentation Transcript


  1. Scheduling From the Perspective of the Application • By Francine Berman & Richard Wolski • Presenter: Kun-chan Lan

  2. Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS Project • Result • Conclusion

  3. Overview.. • Why scheduling is important in metacomputing systems • Better utilization of resources • Performance efficiency • Application-centric scheduling • Everything is evaluated in terms of its impact on the application

  4. ..Overview.. • Metacomputing • Aggregation of distributed and high-performance resources on coordinated networks, to deliver the performance required to address modern scientific problems • Heterogeneity (administrative domain, software/hardware architecture, protocol, etc.) • Contention

  5. Parallel computing vs. Metacomputing • Parallel computing • Performance-oriented aggregation of resources from a single site (a multi-processor machine) • Communicate via dedicated devices such as switches, shared memory, etc. • Homogeneous (hardware/software infrastructure, administrative domain, etc.) • Metacomputing • Performance-oriented aggregation of resources from multiple sites • Communicate via a distributed network • Heterogeneous resources • A software infrastructure is required to coordinate distributed networks into a communication substrate

  6. Scheduling for parallel computing • Multiprocessor nodes generally have uniform capabilities • Usually there is a centralized system scheduler • Processors are dedicated to tasks of a single application -- No contention

  7. Scheduling for Metacomputing • Resources are often managed by separate schedulers which are not coordinated – no single system scheduler • Data conversion between sites • Overlapping of communication and computation to amortize network communication • Separately optimized algorithms for tasks on different machines

  8. Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS • Result • Conclusion

  9. Case 1: CLEO/NILE

  10. CLEO • A high energy physics project • Each collision detected by CLEO is called an event • Each event is recorded and passed to a program called “pass2”, which computes the physical properties of the particles offline • Records computed by “pass2” are read and compressed by another program for certain frequently-accessed fields • One terabyte of data is generated per year

  11. Nile.. • A by-product of CLEO • Each of CLEO’s collaborating institutions is a site • Goal • Provide a scalable, fault-tolerant, heterogeneous system of hundreds of commodity workstations, with access to a distributed database in excess of 100 TB • Resources (CLEO data) are spread across the United States and Canada at 24 collaborating institutions • Resources can be accessed and used transparently from anywhere by any member of the CLEO collaboration

  12. ..Nile.. • Not specific to CLEO; can be used by any application that is easily parallelizable • Currently implemented in CORBA/Java • Three components • Nile Control System (NCS) • Data Repository • User Interface • Interconnecting networks include ATM, FDDI and Ethernet

  13. Nile..

  14. ..Nile.. • NCS: • Site manager: • Interface between NCS and clients • Receives job requests • For each job request, creates a job manager, stores the job context in the Job Database and places the job in the queue • Stateless

  15. ..Nile.. • NCS: • Job DB: • Stores the state of jobs • Resource DB: • Maintains the state of available hardware resources at the local site • Data Location Manager: • Translates the logical data specification in the job profile into a set of corresponding physical data objects, which can be used to determine suitable hosts to run the sub-jobs

  16. ..Nile.. • NCS: • Job Manager • Divides a single job into a set of sub-jobs which can be executed in parallel • Monitors the state of sub-jobs • Collects and assembles the results, and passes them back to the site manager • Planner • Produces an execution plan consisting of a list of sub-jobs, each having a host machine and a set of data objects

  17. Characteristics of CLEO/NILE • The quantity of data for the problem is so large that no single site can provide all the resources needed • Efficient resource allocation is crucial • Execution sites and network interconnections are heterogeneous • Some resources are shared with other applications, so performance might vary greatly based on contention for resources

  18. CASE 2: 3-D REACT • Tries to predict the energy level of a reaction using quantum mechanics • Simulates a hydrogen-deuterium reaction • Essentially calculates the solution to a six-dimensional Schrödinger equation, and can be decomposed into three tasks • LHSF (local hyper-spherical surface function) • Log-D (logarithmic derivative propagation): uses the result of LHSF as input • ASY: an asymptotic analysis on the matrices generated during the Log-D calculation

  19. Scheduling 3D-REACT • Distribute 3D-REACT across two computation units • Cray C90 at SDSC • 64-node Intel Paragon at CalTech • The problem is divided into smaller sub-domains of 5-20 surface functions per sub-domain, so LHSF and Log-D can be executed concurrently • First the C90 calculates the LHSF for a given sub-domain, and then the result is passed to the Paragon, which calculates the Log-D portion of that sub-domain • While the Paragon is calculating the first sub-domain, the C90 can start calculating the second sub-domain • After all the sub-domains are considered, the ASY determines whether the calculation should stop
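
The overlap described on this slide can be illustrated with a small timing model. The following is only a sketch under stated assumptions, not the 3D-REACT implementation: the function names and the per-sub-domain times are hypothetical, and the network link is assumed to transfer one sub-domain at a time while both machines keep computing.

```python
def pipelined_makespan(lhsf_times, transfer_times, logd_times):
    """Finish time when LHSF, transfer and Log-D overlap across sub-domains.

    Hypothetical model: machine 1 runs LHSF, the link ships the result,
    machine 2 runs Log-D; each of the three handles one sub-domain at a time.
    """
    lhsf_done = 0.0   # when machine 1 finishes the current sub-domain
    link_done = 0.0   # when the link finishes shipping it
    logd_done = 0.0   # when machine 2 finishes it
    for lhsf, xfer, logd in zip(lhsf_times, transfer_times, logd_times):
        lhsf_done += lhsf
        link_done = max(lhsf_done, link_done) + xfer
        logd_done = max(link_done, logd_done) + logd
    return logd_done


def serial_makespan(lhsf_times, transfer_times, logd_times):
    """Finish time when nothing overlaps (each step waits for the previous one)."""
    return float(sum(lhsf_times) + sum(transfer_times) + sum(logd_times))


# Four sub-domains with made-up costs: 3 time units LHSF, 1 transfer, 2 Log-D.
lhsf, xfer, logd = [3.0] * 4, [1.0] * 4, [2.0] * 4
print(pipelined_makespan(lhsf, xfer, logd))  # 15.0
print(serial_makespan(lhsf, xfer, logd))     # 24.0
```

In this toy case the slower stage (LHSF) dominates the pipelined time, which is why the size of the sub-domain passed between the two machines matters for the schedule.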

  20. Characteristics of 3D-REACT • The algorithm implemented by a task is optimized for the machine to which it has been assigned • E.g. the Log-D implementation used on the C90 is different from that used on the Paragon • Computation and communication can be pipelined to amortize communication delays • Data might need to be converted into a different format when being transferred between different sites • E.g. floating-point data needs to be converted when the C90 sends data to the Paragon • Scheduling is critical for performance • Each of the sub-tasks (LHSF/Log-D/ASY) can be executed on either machine

  21. Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS • Result • Conclusion

  22. Generalization of Application-Centric scheduling • Each application develops a schedule to optimize its own performance without regard to the performance goals of other applications which share the system • The application-centric schedules of different applications are unrelated • However, there are still some commonalities which underlie application-centric program development

  23. Components of Application-Centric scheduling.. • Performance criteria/metrics • Dynamic system state • Application-specific resource locality • Application performance characteristics • User preferences • Prediction

  24. Performance criteria/metrics • Performance criteria/metrics vary with the application • E.g. to minimize execution time • 3D-REACT: by maximizing speedup over a single-machine implementation • NILE: by distributing analysis of independent events • Some common metrics • Execution time • Speedup • Cost of execution cycle • Users will attempt to optimize their use of the same resources for different performance criteria at the same time

  25. Dynamic system state • Mixture of dedicated and non-dedicated resources • Should the application wait until the dedicated resources become available, or • Should it execute with lesser performance on the non-dedicated resources currently available? • Requires dynamic assessment of • Current system state • Resource loads • Short-term but accurate prediction
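
The wait-or-run-now question on this slide comes down to comparing two predicted completion times. A minimal sketch, assuming a very simple load model that the slides do not specify: the names are hypothetical, and `predicted_availability` stands for the forecast fraction of the shared resource the application would actually receive.

```python
def completion_if_waiting(wait_time, dedicated_exec_time):
    """Predicted finish time if the application queues for the dedicated resource."""
    return wait_time + dedicated_exec_time


def completion_if_running_now(unloaded_exec_time, predicted_availability):
    """Predicted finish time on a shared resource delivering only a fraction
    (0 < predicted_availability <= 1) of its unloaded performance."""
    if not 0.0 < predicted_availability <= 1.0:
        raise ValueError("predicted_availability must be in (0, 1]")
    return unloaded_exec_time / predicted_availability


def choose(wait_time, dedicated_exec_time, unloaded_exec_time, predicted_availability):
    """Return 'wait' or 'run_now', whichever has the earlier predicted finish."""
    waiting = completion_if_waiting(wait_time, dedicated_exec_time)
    running = completion_if_running_now(unloaded_exec_time, predicted_availability)
    return "wait" if waiting < running else "run_now"


# Made-up numbers: the same job under two different load forecasts.
print(choose(100.0, 50.0, 80.0, 0.5))  # wait    (150 vs 160)
print(choose(100.0, 50.0, 80.0, 0.8))  # run_now (150 vs 100)
```

The decision flips with the load forecast alone, which is why the slide asks for predictions that are short-term but accurate.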

  26. Application-specific Resource Locality • [Figure labels: taskX, taskY, X, Y] • Applications seek to use “close” resources • “Closeness” is a function of what the application requires from a resource as well as the resource’s capability • “Distance” of resources: the resource performance deliverable to the application • Are X and Y close?

  27. Application Characteristics • Implementation-dependent and implementation-independent • Some common categories of attributes • Task-specific implementation characteristics • Computation paradigm, number/size of data structures, data communication pattern, memory requirements, etc. • Inter-task communication characteristics • Data format for each task, pipeline size, communication regularity and frequency, etc. • Application structure information • Input/output requirements, iteration pattern, etc.

  28. User Preferences • Not necessarily directly related to application performance • Act as a filter over the possible resources and implementations available to the user

  29. Role of Prediction • Prediction tells you • Potential communication and computation behavior of the application • Potential availability and load of resources • Potential performance of the application with respect to candidate schedules • Sources of prediction • App-specific or app-independent benchmarks • Statistical analysis • Sensed or sampled data • Analytical models

  30. Process of scheduling an application • Use user preferences to filter out infeasible schedules • Use application-specific and dynamic information to develop a schedule • Use the application's own notion of performance and resource locality to evaluate the schedule • Predict the performance of candidate schedules • Compare and determine the “best schedule” that can be implemented on the available resources

  31. Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS • Result • Conclusion

  32. AppleS (Application-level Scheduler) • Each application will have its own AppleS agent (a customized scheduler for each application) • What does AppleS do? • Select resources • Determine a performance-efficient schedule • Implement that schedule with respect to the appropriate resource management system • AppleS is NOT a resource management system: it relies on systems such as Globus and Legion

  33. Organization of an AppleS agent

  34. Components of AppleS • Resource Selector • Chooses and filters different resource combinations • Planner • Generates a description of a resource-dependent schedule from a given resource combination • Performance estimator • Generates an estimate for candidate schedules according to the user’s performance metric • Coordinator • Chooses the “best” schedule • Actuator • Implements the “best” schedule on the target resource management system

  35. Input of AppleS: Information Pool • Network Weather Service • Dynamic information on system state and forecasts of resource load • Heterogeneous Application Template (HAT) • Information on the structure, characteristics and implementation of the application and its tasks • Model • Used for performance estimation, planning and resource selection • User Specification (US) • Information on the user’s criteria for performance, execution constraints, preferences for implementation, etc.

  36. Using AppleS • The user provides information to AppleS via the HAT and US • The Coordinator uses this information to filter out infeasible/possibly-bad schedules • The Resource selector identifies promising sets of resources, and prioritizes them based on the logical “distance” between resources • The Planner computes a potential schedule for each viable resource configuration • The Performance estimator evaluates each schedule in terms of the user’s performance objective • The Coordinator chooses the best schedule and then implements it with the Actuator
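
The flow on this slide can be sketched as a small select / plan / estimate / choose loop. This is an illustrative toy, not the AppleS code: every function name, the `forecast` dictionary (standing in for predicted host speeds of the kind the Network Weather Service might supply), the `user_spec` fields and the per-host `startup_cost` are assumptions made for the example, and the "distance"-based prioritization and the Actuator are left out.

```python
from itertools import combinations


def select_resources(forecast, user_spec):
    """Resource selector: yield candidate host sets allowed by the user specification."""
    allowed = sorted(h for h in forecast if h not in user_spec["excluded"])
    for size in range(1, user_spec["max_hosts"] + 1):
        for combo in combinations(allowed, size):
            yield combo


def plan(resource_set, forecast, work_units):
    """Planner: split the work in proportion to each host's predicted speed."""
    total_speed = sum(forecast[h] for h in resource_set)
    return {h: work_units * forecast[h] / total_speed for h in resource_set}


def estimate(schedule, forecast, startup_cost):
    """Performance estimator: predicted execution time of one candidate schedule."""
    compute = max(share / forecast[h] for h, share in schedule.items())
    return compute + startup_cost * len(schedule)  # hypothetical per-host overhead


def coordinate(forecast, user_spec, work_units, startup_cost=1.0):
    """Coordinator: evaluate every candidate and keep the best (time, schedule) pair."""
    best = None
    for resource_set in select_resources(forecast, user_spec):
        schedule = plan(resource_set, forecast, work_units)
        predicted = estimate(schedule, forecast, startup_cost)
        if best is None or predicted < best[0]:
            best = (predicted, schedule)
    return best


# Made-up forecast: work units per second that each host is predicted to deliver.
forecast = {"a": 4.0, "b": 2.0, "c": 1.0}
user_spec = {"excluded": {"c"}, "max_hosts": 2}
print(coordinate(forecast, user_spec, work_units=120))
# (22.0, {'a': 80.0, 'b': 40.0})
```

Exhaustive enumeration is only workable for a handful of hosts; the filtering and prioritizing steps named on the slide are what would keep the candidate set small in practice.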

  37. Using AppleS, Example: 3D-REACT • Assume implementations of LHSF and Log-D are available for several architectures • HAT: specifies the computation-to-communication ratios for LHSF and Log-D, the degree of overlap that is possible between the two, etc. for each implementation • The Resource selector determines viable pairs of resources • The Planner identifies a set of candidate schedules • The Performance estimator calculates the transfer unit size between LHSF and Log-D for each candidate schedule • The Coordinator sends the best schedule to the Actuator

  38. Outline of the talk • Overview • Case study • Application-centric scheduling • AppleS • Result • Conclusion

  39. Jacobi2D code.. • A distributed data-parallel two-dimensional Jacobi iterative solver • Commonly used to solve the finite-difference approximation to Poisson's equation • Variable coefficients are represented as elements of a two-dimensional grid • At each iteration, the new value of each grid element is defined to be the average of its four nearest neighbors during the previous iteration
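
The update rule on this slide fits in a few lines. A minimal single-process sketch, assuming the boundary elements are held fixed (the slide does not say how boundaries are treated); `jacobi_step` is a hypothetical name and this is not the Jacobi2D code used in the experiments.

```python
def jacobi_step(grid):
    """One Jacobi iteration: each interior element becomes the average of its
    four nearest neighbors from the previous iteration. Boundary elements are
    copied unchanged (an assumption of this sketch)."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]  # write into a copy so reads use old values only
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            new[i][j] = 0.25 * (
                grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]
            )
    return new


grid = [
    [0.0, 4.0, 0.0],
    [4.0, 0.0, 4.0],
    [0.0, 4.0, 0.0],
]
print(jacobi_step(grid)[1][1])  # 4.0
```

Because every new value depends only on the previous iteration, the elements of a region can be updated independently once the neighboring border values have been exchanged.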

  40. ..Jacobi2D code • Typically, the Jacobi computation is parallelized by partitioning the grid into rectangular regions, and then assigning each region to a different processor • Parallelism vs. communication overhead • [Figure caption: processor P0 is twice as fast as processor P1 or P2]

  41. [Figure labels: RS600, FDDI, Alpha workstation]

  42. Three partition methods • HPF Uniform/Blocked • Each processor is assigned (at compile-time) a relatively equal-sized square region of the grid to compute • Non-Uniform Strip • Uses good static estimates of resource performance and uses resource selection to select a resource set from the total resources • AppleS
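
One way to picture a non-uniform strip partition is to hand each processor a number of grid rows proportional to its estimated speed. The sketch below is an assumption-laden illustration, not the partitioner used in the experiments: `strip_partition` is a hypothetical name, speeds are static relative estimates, and communication cost is ignored.

```python
def strip_partition(n_rows, speeds):
    """Assign whole rows to processors in proportion to their relative speeds.

    Leftover rows from rounding down go to the processors with the largest
    fractional remainders, so the counts always add up to n_rows.
    """
    total = sum(speeds.values())
    exact = {p: n_rows * s / total for p, s in speeds.items()}
    rows = {p: int(x) for p, x in exact.items()}
    leftover = n_rows - sum(rows.values())
    by_remainder = sorted(exact, key=lambda p: exact[p] - rows[p], reverse=True)
    for p in by_remainder[:leftover]:
        rows[p] += 1
    return rows


# P0 estimated to be twice as fast as P1 or P2 (relative speeds are made up).
print(strip_partition(1000, {"P0": 2.0, "P1": 1.0, "P2": 1.0}))
# {'P0': 500, 'P1': 250, 'P2': 250}
```

With equal speeds the same function degenerates to a uniform strip split, which differs from the blocked HPF layout only in the shape of the regions.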

  43. Memory availability • Adding two IBM SP-2 nodes with 128 MB of memory each to the resource pool • Dedicated access to the two SP-2 nodes and the link between them • The best partitioning is to split the grid evenly between the two SP-2 nodes as long as neither partition exceeds the available real memory on each node
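
The memory condition on this slide can be checked with simple arithmetic. A rough sketch under assumptions that are not in the slides: 8-byte grid elements, two copies of each partition (previous and current iteration), no ghost rows, and the full 128 MB treated as available real memory.

```python
MIB = 1024 * 1024


def partition_bytes(n, nodes, bytes_per_element=8, copies=2):
    """Memory one node needs when an n x n grid is split evenly into row strips."""
    rows_per_node = -(-n // nodes)  # ceiling division
    return rows_per_node * n * bytes_per_element * copies


def fits_in_memory(n, nodes, available_bytes):
    """True if an even split keeps every partition within available real memory."""
    return partition_bytes(n, nodes) <= available_bytes


# Two nodes, 128 MiB each (hypothetically all of it usable by the application).
print(fits_in_memory(4096, 2, 128 * MIB))  # True: exactly 128 MiB per node
print(fits_in_memory(5000, 2, 128 * MIB))  # False: about 191 MiB per node
```

Once the check fails, an even split forces paging on the nodes, so a scheduler that knows the available memory would need to shrink those partitions or bring in other resources.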

  44. A lot of page swapping

  45. Conclusion • A performance-efficient schedule must exploit the concurrency of independent application tasks as well as factor in the impact of resource contention/diversity/autonomy • AppleS: http://apples.ucsd.edu/, still a work in progress • Related work: MARS: http://www.uni-paderborn.de/pc2/projects/mol/mars.htm • CLEO: http://www.lns.cornell.edu/public/CLEO/ • 3D-REACT: http://www.cacr.caltech.edu/Publications/techpubs/CASA/cacr123/web4.htm
