The Prospero Resource Manager: A Scalable Framework for Processor Allocation in Distributed Systems
Abdul Aziz Habib Ammari Pearl Thomas Vamsi Krishna
Introduction • Poor performance of conventional techniques (parallel vs. distributed) • Prospero Resource Manager (PRM) • Resource management techniques should scale: • numerically • geographically • administratively
Introduction (cont'd) • Prospero perspective: multiple resource managers • System manager • Job manager • Node manager
Contemporary Approaches • Phases of execution: (1) configuration of environment, (2) processor selection/allocation, (3) task-to-processor mapping, (4) program loading, common libraries, (5) program execution • Distributed environment: list of available nodes • Locus, NEST, Sprite, and V support processor allocation and remote program loading
Contemporary Approaches (cont'd) • Locus: environment of the initiating process • NEST: nodes advertise their availability • Sprite: shared file as a centralized database • V: server selects the least loaded node • UCLA Benevolent Bandit Laboratory (BBL) • DQS and Lsbatch • Parallel Virtual Machine (PVM) and Net-Express
Scalable Resource Management • Virtual System Model: a new model for organizing large distributed systems • Access to a subset of resources • Hiding the mapping of resources to physical locations • Partition of the resource management functions • System manager • Job manager • Node manager
Scalable Resource Management (cont'd) • System managers • Managing subsets of resources (processors) • Hierarchical concept (layers of system managers) • Maintaining all information about their resources • Reacting to status updates (from node managers) and resource requests (from job managers) • Assigning suitable resources upon request, then notifying the job manager and the node managers responsible for each resource (possibly only a subset of the requested resources can be assigned)
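The request handling on this slide can be sketched roughly as follows. This is a minimal illustration, not PRM's implementation: the class name `SystemManager`, its methods, and the node names are all hypothetical, and network messages are reduced to plain method calls.

```python
from dataclasses import dataclass, field


@dataclass
class SystemManager:
    """Hypothetical system manager for one subset of processors."""
    free_nodes: set = field(default_factory=set)
    assigned: dict = field(default_factory=dict)  # node name -> job id

    def status_update(self, node, available):
        # A node manager reports whether its node may be assigned.
        if available and node not in self.assigned:
            self.free_nodes.add(node)
        else:
            self.free_nodes.discard(node)

    def request(self, job_id, count):
        # Grant up to `count` nodes; the grant may be smaller than requested.
        granted = sorted(self.free_nodes)[:count]
        for node in granted:
            self.free_nodes.remove(node)
            self.assigned[node] = job_id
        return granted


manager = SystemManager()
for name in ["a", "b", "c"]:
    manager.status_update(name, True)
first_grant = manager.request("job1", 5)   # only three nodes exist
second_grant = manager.request("job2", 1)  # nothing is left to grant
```

Returning a shorter list, rather than failing, mirrors the note that only a subset of the requested resources may be assigned.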
Scalable Resource Management (cont'd) • Job manager • Agent for the tasks in a job • One job manager per job • Part of the job, and aware of the requirements and communication patterns of the tasks it manages • Supports fault-tolerant and real-time applications • Supports debugging and performance tuning
Scalable Resource Management (cont'd) • Job manager (cont'd) • Identifying the job's resource requirements when the job is initiated • Locating system managers and sending allocation requests • Monitoring the execution of the program
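One way a job manager might gather nodes from several system managers is sketched below. The function names `acquire_nodes` and `make_manager` are hypothetical, and each system manager is modeled as a local callable rather than a remote service.

```python
def acquire_nodes(needed, system_managers):
    """Ask each system manager in turn until `needed` nodes are granted.

    `system_managers` is a list of callables; each takes a node count and
    returns the (possibly shorter) list of nodes it could grant.
    """
    granted = []
    for ask in system_managers:
        remaining = needed - len(granted)
        if remaining <= 0:
            break
        granted.extend(ask(remaining)[:remaining])
    return granted


def make_manager(pool):
    # Hypothetical stand-in for a remote system manager with a fixed pool.
    def ask(count):
        taken = pool[:count]
        del pool[:count]
        return taken
    return ask


managers = [make_manager(["a1", "a2"]), make_manager(["b1", "b2", "b3"])]
nodes = acquire_nodes(4, managers)
shortfall = acquire_nodes(5, [make_manager(["x"])])
```

The job manager may end up with fewer nodes than it asked for, so the caller has to decide whether to run with a partial allocation or give up.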
Scalable Resource Management (cont'd) • Node manager • Receiving messages from the system manager that identify the job managers for which it will load and execute programs • Notifying the job manager about events (termination and failure of tasks) • Informing the system manager when the node is available for assignment • Caching the information needed to direct messages for other tasks to the nodes on which those tasks run
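The node manager's message flow might look roughly like this. It is a simplified sketch under stated assumptions: the class `NodeManager` and its method names are hypothetical, callbacks stand in for network messages, and the node is treated as available again as soon as one task finishes.

```python
class NodeManager:
    """Hypothetical node manager; callbacks stand in for network messages."""

    def __init__(self, node, notify_system_manager):
        self.node = node
        self.notify_system_manager = notify_system_manager
        self.job_manager = None     # callback of the job manager being served
        self.task_locations = {}    # cache: task id -> node hosting the task
        notify_system_manager(node, True)   # announce availability

    def assigned_to(self, job_manager):
        # Message from the system manager naming the job manager to serve.
        self.job_manager = job_manager
        self.notify_system_manager(self.node, False)

    def run(self, task_id, program):
        # Run a task, then report termination or failure to the job manager.
        self.task_locations[task_id] = self.node
        try:
            program()
            self.job_manager(task_id, "terminated")
        except Exception:
            self.job_manager(task_id, "failed")
        finally:
            # Simplification: the node becomes available after every task.
            self.notify_system_manager(self.node, True)


events, status = [], []
nm = NodeManager("n1", lambda node, free: status.append((node, free)))
nm.assigned_to(lambda task, event: events.append((task, event)))
nm.run("t1", lambda: None)
nm.run("t2", lambda: 1 / 0)
```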
Implementation: Introduction • Prospero Resource Manager (PRM) implementation • - Runs on a collection of workstations (Sun-3, HP 9000/700, etc.) • - Workstations connected by LAN/WAN • - Supports a heterogeneous execution environment • - A system manager can manage nodes of more than one processor type • - Lets the user place constraints (type, location, etc.) through job configuration options • - Supports both parallel and remote sequential applications
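Constraint-based selection of this kind can be illustrated with a small filter. The attribute keys (`type`, `site`), their values, and the function `select_nodes` are hypothetical and do not reflect PRM's actual job configuration syntax.

```python
def select_nodes(nodes, count, **constraints):
    """Return up to `count` node names whose attributes match every constraint."""
    matches = [
        name for name, attrs in nodes.items()
        if all(attrs.get(key) == value for key, value in constraints.items())
    ]
    return matches[:count]


# Hypothetical inventory held by one system manager.
inventory = {
    "alpha": {"type": "sun3", "site": "usc"},
    "beta": {"type": "hp700", "site": "usc"},
    "gamma": {"type": "sun3", "site": "isi"},
}

sun_nodes = select_nodes(inventory, 2, type="sun3")
usc_hp = select_nodes(inventory, 2, type="hp700", site="usc")
unconstrained = select_nodes(inventory, 1)
```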
Program Loading and I/O • PRM supports explicit loading of files when the nodes assigned to a job do not share a common file system • - Performed by transferring the executables to each node's local file system • - A file I/O task handles access to files on the user's local system • - A task has exclusive read/write access to a shared file • A terminal I/O task supports interactive execution • - Users can customize this task to perform job initialization functions, such as reading interactive input and passing it to the appropriate task
Communication Libraries • Communication library functions • - Provide routines for sending, receiving, and broadcasting tagged messages • - Commonly used routines are made available through a set of macros and functions • - Provide routines for message passing, buffer manipulation, process control, and data packing and unpacking • Approach • - The ARDP protocol is used to transmit and receive sequenced packets
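Packing and unpacking a tagged message can be sketched as below. The wire layout (a tag and a payload length followed by 32-bit integers) is an assumption made for illustration; it is not the format used by PRM, PVM, or ARDP.

```python
import struct

# Assumed header: tag and payload length, both unsigned 32-bit, network order.
HEADER = struct.Struct("!II")


def pack_message(tag, values):
    """Pack a list of integers into a tagged byte string."""
    payload = struct.pack(f"!{len(values)}i", *values)
    return HEADER.pack(tag, len(payload)) + payload


def unpack_message(data):
    """Recover the tag and the integer list from a packed message."""
    tag, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    values = list(struct.unpack(f"!{length // 4}i", payload))
    return tag, values


wire = pack_message(7, [1, -2, 3])
tag, values = unpack_message(wire)
```

The receiver can use the tag to match an incoming message against the kind of message it is waiting for.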
Job Manager: Supporting Program Development • Supports debugging of parallel applications • - Checkpoint and replay approaches are used • - Programs can be restored to past states • - Each task maintains a log of its communication activities • - A task monitor exists for each task • - An individual task can be replayed in isolation
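Log-based replay of a single task can be sketched as follows, assuming the task is deterministic given the messages it receives. The class `LoggedTask` is hypothetical, and the only checkpoint it keeps is the initial state.

```python
class LoggedTask:
    """Hypothetical task wrapper that logs every message it receives."""

    def __init__(self, step, state):
        self.step = step        # function: (state, message) -> new state
        self.initial = state    # checkpoint taken before any message
        self.state = state
        self.log = []

    def receive(self, message):
        # Record the message before applying it, so it can be replayed later.
        self.log.append(message)
        self.state = self.step(self.state, message)

    def replay(self, upto=None):
        # Rebuild a past state from the checkpoint using only the log,
        # without involving any other task.
        state = self.initial
        for message in self.log[:upto]:
            state = self.step(state, message)
        return state


task = LoggedTask(lambda total, msg: total + msg, 0)
for msg in [1, 2, 3]:
    task.receive(msg)
state_after_two = task.replay(upto=2)
```

Because the replay reads from the log instead of the network, the task can be re-run in isolation while the rest of the job is stopped.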
Performance • Communication latencies • PVM library over ARDP vs. PVM version 3.2.6 • Resource allocation performance of PRM • Test bed • SPARC-10s connected by Ethernet • Machines used exclusively for the tests • SunOS 4.1.3 with an improved timing facility • pvm_send() and pvm_recv()
Wide Area Network Simulation • Latency of 0 msec, 10 msec, and 100 msec • USC, USC-ISI, ISI-MIT • Table 1: Average time (in msec) to execute a pvm_send()/pvm_recv() pair • Table 2: Average time (in msec) to execute a pvm_mcast() and matching pvm_recv() pair
Resource Allocation Results • Table 3: Allocation time as a function of the number of nodes allocated • Table 4: Allocation time as a function of the number of system managers from which resources are requested. A total of 8 nodes were allocated in each case.
Future Directions • Alternative job managers • fault-tolerant and real-time applications • Node manager • part of the kernel • compiler-generated resource list • preemptive scheduling of tasks • Integrated set of tools for developing and executing parallel and distributed applications • Security
Conclusion • Prospero: a different approach to resource management • Scalable • Provides a framework for the development and execution of parallel and distributed applications