Parallel Computing • The Bad News • Hardware is not getting faster fast enough • Too many architectures • Existing architectures are too specific • Programs are closely tied to architecture • Software is being developed using a 1950s mentality
Computing Trends • Centralized systems are a thing of the past • Evolving towards cycle servers • Each user has their own computer • Workstations are networked • Typical LAN speeds are 100 Mb/s • For some users, a single workstation does not provide adequate computing power
A Solution • A virtual computing environment • Utilize existing software to build a programming model that can be used to develop distributed and parallel applications • Provide tools to create, debug, and execute applications on heterogeneous hardware • Let the software map high level descriptions of the problems to available hardware • Programmer will no longer need to be concerned with low-level issues
Other Names • Many scientists face problems that require weeks or months of computation to solve • Scientists involved in this type of research need a computing environment that delivers large amounts of computational power over a long period of time • Such an environment is called a High Throughput Computing (HTC) environment • In contrast, High Performance Computing (HPC) environments deliver a tremendous amount of power over a short period of time
Workstation Users • All VCE configurations include some workstations • Workstations are chronically underutilized • Workstation users can be classified as follows: • Casual Users • Sporadic Users • Frustrated Users • The VCE must help frustrated users without hurting casual and sporadic users
Other Considerations • The VCE must be cost effective • Use existing tools like NFS, ISIS, PVM, and MPI whenever possible • Must not require tremendous amounts of processor power • The VCE must coexist with other software • Non-VCE applications should not be impacted by the VCE • The VCE must avoid kernel modifications
User's View of the VCE • The software development module (SDM) provides tools to build and annotate an application task graph • The execution module (EXM) compiles the application and dispatches the tasks
The VCE • [Diagram: Problem Specification → SDM (Design Stage, Coding Level) → Compilation Manager → EXM (Runtime Manager)]
Runtime Issues • Compilation Issues • Executables must be prepared to maximize scheduling flexibility • Compilations must be scheduled to maximize application performance and hardware utilization • Java?
Runtime Issues • Task Placement • The criteria for selecting machines to host tasks must consider both hardware utilization and application throughput • Hints supplied by the programmer might improve task placement decisions
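The placement criteria above can be sketched as a scoring heuristic. This is a toy illustration, not any real VCE's algorithm; the field names and the hint mechanism are assumptions made for the example.

```python
# Hypothetical task-placement heuristic: machines are scored by current load,
# and an optional programmer-supplied hint (a tag the machine should carry)
# gives a bonus. All names here are illustrative.

def place_task(task, machines):
    """Pick the machine with the best (lowest) score for this task."""
    hint = task.get("hint")  # e.g. "fast-cpu"; may be absent

    def score(m):
        # Lower is better: load dominates, a hint match earns a bonus.
        bonus = -0.5 if hint and hint in m["tags"] else 0.0
        return m["load"] + bonus

    return min(machines, key=score)

machines = [
    {"name": "ws1", "load": 0.9, "tags": ["fast-cpu"]},
    {"name": "ws2", "load": 0.2, "tags": []},
    {"name": "ws3", "load": 0.4, "tags": ["fast-cpu"]},
]
print(place_task({"hint": "fast-cpu"}, machines)["name"])  # ws3
print(place_task({}, machines)["name"])                    # ws2
```

Note how the hint changes the decision: without it the least-loaded machine wins; with it a moderately loaded but better-suited machine is chosen.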
Processor Utilization • Free Parallelism • Parallel applications with low efficiency benefit when run on idle machines • Anticipatory Processing • Use idle resources to perform work which may be useful if certain schedules are ultimately executed
Load Balancing • Central issue in the execution module • Good application throughput must be achieved without impacting interactive users • Many systems provide the ability to migrate tasks
Task Migration • Various migration strategies are possible • Redundant execution • Check-pointing • Dump and migrate • Recompilation • Byte coded tasks
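The check-pointing strategy above can be shown in miniature. This sketch uses Python's `pickle` as a stand-in for a real checkpoint library; the computation and checkpoint interval are invented for illustration.

```python
# Minimal sketch of check-pointed migration: the task's state is serialized
# periodically, so after migration the new host resumes from the last
# checkpoint instead of restarting from scratch.
import pickle

def run(state, steps, checkpoint_every=100):
    """Advance a toy computation, snapshotting its state periodically."""
    checkpoint = pickle.dumps(state)
    for i in range(steps):
        state["i"] += 1
        state["total"] += state["i"]
        if (i + 1) % checkpoint_every == 0:
            checkpoint = pickle.dumps(state)  # durable snapshot
    return checkpoint

# The original host runs 250 steps before being "evicted"; only the work
# since the last checkpoint (step 200) is lost when another host resumes.
snapshot = run({"i": 0, "total": 0}, steps=250)
resumed = pickle.loads(snapshot)
print(resumed["i"])  # 200, not 250
```

The cost of this strategy is the checkpoint I/O; the benefit is that at most one checkpoint interval of work is ever lost.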
Systems • Many systems are available which provide some form of a VCE • PVM • MPI • Beowulf • Condor • …
Condor • Condor is a software system that runs on a cluster of workstations to harness wasted CPU cycles. • A Condor pool consists of any number of machines, of possibly different architectures and operating systems, that are connected by a network • To monitor the status of the individual computers in the cluster, Condor "daemons" must run all the time. • One daemon is called the "master". Its only job is to make sure that the rest of the Condor daemons are running.
Idle Machines Only • Two other daemons run on every machine in the pool: startd and schedd • startd monitors information about the machine that is used to decide if it is available to run a Condor job • keyboard and mouse activity • load on the CPU • startd also notices when a user returns to a machine that is currently running a job, and removes the job
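The availability decision startd makes can be sketched as a simple predicate. The thresholds and function names below are assumptions for the example, not Condor's actual policy.

```python
# Illustrative sketch of a startd-style availability test: a machine accepts
# guest jobs only when the console has been idle for a while and the load
# average is low; fresh user activity triggers eviction.

def machine_available(idle_seconds, load_avg,
                      min_idle=15 * 60, max_load=0.3):
    """True if there is no recent keyboard/mouse activity and low CPU load."""
    return idle_seconds >= min_idle and load_avg <= max_load

def should_evict(idle_seconds):
    """A returning user (fresh console activity) means the guest job must go."""
    return idle_seconds < 1

print(machine_available(idle_seconds=3600, load_avg=0.1))  # True
print(machine_available(idle_seconds=60, load_avg=0.1))    # False
print(should_evict(idle_seconds=0))                        # True
```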
Condor Executables • Code does not have to be modified in any way to be used in Condor • but it must be re-linked with the Condor libraries • Once re-linked, jobs gain two crucial abilities: • Checkpoint • Perform remote system calls • Condor also provides a mechanism to run binaries that have not been re-linked; these are called "vanilla" jobs
Condor Tricks • Match Making • When a task is submitted to Condor, the system finds a machine that matches the resources required by the task • Condor uses check-pointing to migrate jobs • You only lose the computation that has been performed since the last checkpoint • Condor tasks move around to find underutilized workstations
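The match-making idea can be shown with a toy example: each job advertises its requirements, each machine advertises its resources, and the system pairs them. This mimics the spirit of Condor's matching but is purely illustrative; the attribute names are invented.

```python
# Toy Condor-style match making: pair a job's requirements with a machine's
# advertised resources. Attribute names are assumptions for this sketch.

def match(job, machines):
    """Return the name of the first machine satisfying every requirement."""
    for m in machines:
        if (m["arch"] == job["arch"]
                and m["memory_mb"] >= job["memory_mb"]
                and m["idle"]):
            return m["name"]
    return None  # no suitable machine: the job waits in the queue

machines = [
    {"name": "ws1", "arch": "x86",   "memory_mb": 64,  "idle": False},
    {"name": "ws2", "arch": "sparc", "memory_mb": 256, "idle": True},
    {"name": "ws3", "arch": "x86",   "memory_mb": 128, "idle": True},
]
print(match({"arch": "x86", "memory_mb": 96}, machines))  # ws3
```

Condor's real mechanism (ClassAds) is richer, with two-sided requirements and rankings, but the essence is the same: a broker matching advertised needs to advertised offers.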
Beowulf • The Beowulf parallel workstation is a single-user system built from multiple computers, with directly attached keyboard and monitors. Beowulf comprises: • 16 motherboards with Intel x86 processors • 256 MBytes of DRAM, 16 MBytes per processor board • 16 hard disk drives and controllers • 2 Ethernets and controllers per processor • 2 high-res monitors with controllers and 1 keyboard • The Beowulf architecture is a fully COTS (Commodity Off The Shelf) configured system