Multiple Processor Systems

Presentation Transcript

  1. Multiple Processor Systems

  2. Multiprocessor Systems
     • Continuous need for faster and more powerful computers
     • Shared-memory multiprocessor (access times in nanoseconds)
     • Message-passing multicomputer (access times in microseconds)
     • Wide-area distributed system (access times in milliseconds)
     Figure panels: multiprocessor, multicomputer, distributed system

  3. Multiprocessor Definition: a computer system in which two or more CPUs share full access to a common RAM

  4. Multiprocessor Hardware (1) • Bus-based multiprocessors • Memory coherence

  5. Multiprocessor Hardware (2) • UMA (Uniform Memory Access) Multiprocessor using a crossbar switch (n*n crosspoints)

  6. Multiprocessor Hardware (3) • UMA multiprocessors using multistage switching networks can be built from 2x2 switches
     Figure: (a) 2x2 switch; (b) message format

  7. Multiprocessor Hardware (4) • Omega switching network ((n/2) log2 n switches of size 2x2)
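
To make the hardware cost comparison concrete, here is a small sketch that counts crosspoints in a crossbar and 2x2 switches in an omega network, and lists the output port taken at each stage on the way to a memory module. The function names are hypothetical, n is assumed to be a power of two, and the routing convention (one destination bit per stage, most significant bit first, 0 = upper output, 1 = lower output) is an assumption not stated on the slides.

```python
def crossbar_crosspoints(n):
    """Crosspoints in an n x n crossbar switch."""
    return n * n


def omega_switches(n):
    """2x2 switches in an omega network with n inputs (n a power of two)."""
    stages = n.bit_length() - 1          # log2(n) for a power of two
    return (n // 2) * stages


def omega_route(dest, n):
    """Output port chosen at each stage on the way to module `dest`.

    Assumed convention: stage s looks at one bit of the destination
    number, most significant bit first; 0 = upper output, 1 = lower.
    """
    stages = n.bit_length() - 1
    return [(dest >> (stages - 1 - s)) & 1 for s in range(stages)]


if __name__ == "__main__":
    n = 8
    print(crossbar_crosspoints(n), omega_switches(n), omega_route(6, n))
```

For 8 CPUs and 8 memories this gives 64 crosspoints against 12 switches, which is why the multistage design is cheaper for large n.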

  8. Multiprocessor Hardware (5) NUMA Multiprocessor Characteristics
     • Single address space visible to all CPUs
     • Access to remote memory via LOAD and STORE instructions
     • Access to remote memory is slower than access to local memory
     Performance is worse than on UMA machines

  9. Multiprocessor OS Types (1) Each CPU has its own operating system
     • System calls are caught and handled on the caller's own CPU
     • No sharing of processes
     • No sharing of pages
     • Multiple independent buffer caches

  10. Multiprocessor OS Types (2) Master-slave multiprocessors
      • The master is a bottleneck
      • The model fails for large multiprocessors

  11. Multiprocessor OS Types (3) SMP (Symmetric Multiprocessor)
      • Only one CPU at a time can run the operating system, so the operating system is split into critical regions

  12. Multiprocessor Synchronization (1)
      • The TSL (test and set lock) instruction can fail to give mutual exclusion if it does not lock the bus
      • TSL must first lock the bus, so a special bus is needed
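
The role of the bus lock can be illustrated with a small simulation. This is only a sketch: Python has no TSL instruction, so the class `TSLWord` (a hypothetical name) uses a `threading.Lock` called `_bus` to stand in for the hardware bus lock that makes the read and the write indivisible.

```python
import threading


class TSLWord:
    """A lock word with an emulated atomic test-and-set."""

    def __init__(self):
        self.value = 0
        self._bus = threading.Lock()     # stands in for locking the bus

    def tsl(self):
        """Return the old value and set the word to 1, indivisibly."""
        with self._bus:
            old = self.value
            self.value = 1
            return old


def enter_region(word):
    while word.tsl() != 0:               # spin until the old value was 0
        pass


def leave_region(word):
    word.value = 0                       # an ordinary store releases the lock


def run_counter(n_threads=4, iterations=200):
    """Several threads increment a shared counter inside the critical region."""
    word = TSLWord()
    counter = [0]

    def worker():
        for _ in range(iterations):
            enter_region(word)
            counter[0] += 1
            leave_region(word)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

If `tsl` read and wrote the word without holding `_bus`, two threads could both see 0 and both enter the region, which is the failure the slide describes.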

  13. Multiprocessor Synchronization (2) Multiple locks used to avoid cache thrashing

  14. Multiprocessor Synchronization (3) Spinning versus Switching
      • In some cases the CPU must wait
        • e.g. when it waits to acquire the ready list
      • In other cases a choice exists
        • spinning wastes CPU cycles
        • switching also uses up CPU cycles
        • a separate decision can be made each time a locked mutex is encountered
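
One possible policy for the spin-or-switch choice is to spin for a bounded number of attempts and then block. The sketch below assumes that policy; the function name and the `spin_limit` parameter are hypothetical, and a blocking `acquire` stands in for switching to another thread.

```python
import threading


def acquire_spin_then_block(lock, spin_limit=100):
    """Try the lock `spin_limit` times without blocking, then block.

    Returns "spin" or "block" to report which path obtained the lock.
    """
    for _ in range(spin_limit):
        if lock.acquire(blocking=False):   # one spin attempt
            return "spin"
    lock.acquire()                          # give up the CPU and wait
    return "block"


if __name__ == "__main__":
    lock = threading.Lock()
    print(acquire_spin_then_block(lock))    # the lock is free here
```

A real system could tune `spin_limit` from the observed wait times, which is one way of making the decision separately each time a locked mutex is encountered.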

  15. Multiprocessor Scheduling
      • Scheduling on a single processor is one-dimensional (which process)
      • Scheduling on a multiprocessor is two-dimensional (which process and which CPU)
      • Processes may be unrelated
      • Processes may be related in groups

  16. Multiprocessor Scheduling (1) Independent processes
      • Pure timesharing: a single data structure is used for scheduling
      • Affinity scheduling, a two-level algorithm: a process is assigned to a CPU when it is created; each CPU has its own priority list, but an idle CPU takes a process from another CPU
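
The two-level idea can be sketched as follows. The class and method names are hypothetical, and two simplifications are assumed: each per-CPU list is a plain FIFO queue rather than a priority list, and an idle CPU steals from the CPU with the longest queue.

```python
from collections import deque


class AffinityScheduler:
    def __init__(self, n_cpus):
        self.queues = [deque() for _ in range(n_cpus)]

    def create(self, process):
        """Top level: assign a new process to the least loaded CPU."""
        cpu = min(range(len(self.queues)), key=lambda c: len(self.queues[c]))
        self.queues[cpu].append(process)
        return cpu

    def next_for(self, cpu):
        """Bottom level: take work from the CPU's own list, else steal."""
        if self.queues[cpu]:
            return self.queues[cpu].popleft()
        victim = max(range(len(self.queues)), key=lambda c: len(self.queues[c]))
        if self.queues[victim]:
            return self.queues[victim].pop()   # steal from the tail
        return None                            # nothing runnable anywhere


if __name__ == "__main__":
    s = AffinityScheduler(2)
    for name in ("A", "B", "C"):
        s.create(name)
    print(s.next_for(1), s.next_for(1), s.next_for(0), s.next_for(0))
```

Keeping a process on the CPU where it was created tends to preserve its cache contents; stealing is only the fallback for an idle CPU.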

  17. Multiprocessor Scheduling (2) Related processes
      • Space sharing: multiple processes or threads run at the same time across multiple CPUs (the CPUs are not multiprogrammed)
      • Partition sizes are fixed or dynamically modified

  18. Multiprocessor Scheduling (3) Time and space sharing together
      • Problem: communication between two threads (e.g. A0 and A1)
        • both belong to process A
        • both run out of phase

  19. Multiprocessor Scheduling (4) Solution: gang scheduling
      • Groups of related threads are scheduled as a unit (a gang)
      • All members of a gang run simultaneously on different timeshared CPUs
      • All gang members start and end their time slices together
      All CPUs are scheduled synchronously; time is divided into quanta

  20. Multiprocessor Scheduling (5) Gang scheduling example: 6 CPUs, 5 processes (A-E). In principle, all threads of the same process run together
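
A gang schedule can be pictured as a table of time slots by CPUs. The sketch below fills such a table with a first-fit rule, which is an assumption made for illustration and not necessarily the policy behind the slide's figure; the function name and the thread counts in the example are hypothetical.

```python
def gang_schedule(gangs, n_cpus):
    """Place each gang in one time slot so all its threads run together.

    gangs: dict mapping a process name to its number of threads.
    Returns a list of slots; each slot is a list with one entry per CPU
    holding a thread label such as "A0", or None for an idle CPU.
    """
    slots = []
    for name, size in gangs.items():
        if size > n_cpus:
            raise ValueError(f"gang {name} needs more than {n_cpus} CPUs")
        labels = [f"{name}{i}" for i in range(size)]
        for slot in slots:
            free = [cpu for cpu, t in enumerate(slot) if t is None]
            if len(free) >= size:            # first slot with enough room
                for cpu, label in zip(free, labels):
                    slot[cpu] = label
                break
        else:                                # no existing slot fits
            slot = [None] * n_cpus
            slot[:size] = labels
            slots.append(slot)
    return slots


if __name__ == "__main__":
    for row in gang_schedule({"A": 6, "B": 4, "C": 2}, 6):
        print(row)
```

A gang is never split across slots here, so two threads of the same process are always running in the same quantum and can exchange messages without waiting for a later time slice.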

  21. Multicomputers
      • Definition: tightly coupled CPUs that do not share memory
      • Also known as cluster computers or clusters of workstations (COWs, farms)
      The interconnection network is crucial

  22. Multicomputer Hardware (1) Interconnection topologies: (a) single switch (b) ring (c) grid (mesh) (d) double torus (e) cube (f) hypercube
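
As an illustration of one of these topologies, the sketch below treats a hypercube of dimension d as 2^d nodes numbered in binary, where two nodes are linked when their numbers differ in exactly one bit. The function names are hypothetical.

```python
def hypercube_neighbors(node, dim):
    """Nodes directly connected to `node` in a hypercube of dimension `dim`."""
    return [node ^ (1 << bit) for bit in range(dim)]


def hypercube_hops(a, b):
    """Minimum number of links between nodes a and b (differing bits)."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    print(hypercube_neighbors(0, 3))     # neighbours of node 000
    print(hypercube_hops(0b000, 0b111))  # opposite corners of a 3-cube
```

The longest path has as many hops as the dimension, so the diameter grows with the logarithm of the number of nodes.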

  23. Multicomputer Hardware (2) • Switching scheme: store-and-forward packet switching • Packet switching versus circuit switching

  24. Multicomputer Hardware (3) Network interface boards in a multicomputer
      • DMA (and a CPU) on the board increase efficiency
      • Mapping the interface board directly into user space speeds up communication, but...

  25. Low-Level Communication Software (1) Problems:
      • What if several processes running on a node need network access to send packets? The interface board can be mapped into every process that needs it, but a synchronization mechanism is then required (difficult for multiprocessing)
      • What if the kernel needs access to the network? Two network boards can be used: one mapped into user space, one into kernel space
      • DMA uses physical addresses, while user processes use virtual addresses; this mismatch must be handled without system calls

  26. Low-Level Communication Software (2) Node-to-network-interface communication with an on-board CPU
      • Uses send and receive rings, each with a bitmap
      • The rings coordinate the main CPU with the on-board CPU

  27. User Level Communication Software
      • Minimum services: send and receive commands
      • These are blocking (synchronous) calls
        • the CPU is idle during transmission
      • Non-blocking (asynchronous) calls
        • with copy
        • with interrupt
        • copy on write
      Figure: (a) blocking send call; (b) non-blocking send call with kernel copy
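
The difference between a blocking send and a non-blocking send with copy can be simulated in a few lines. Everything here is a hypothetical stand-in: `Link` models the network with a fixed delay, and a background thread plays the part of the kernel transmitting its private copy.

```python
import threading
import time


class Link:
    """A toy network link that takes `delay` seconds per message."""

    def __init__(self, delay=0.01):
        self.delay = delay
        self.delivered = []

    def transmit(self, data):
        time.sleep(self.delay)
        self.delivered.append(bytes(data))


def send_blocking(link, buf):
    """Return only after the message has been transmitted."""
    link.transmit(buf)


def send_nonblocking_copy(link, buf):
    """Copy the buffer, start transmission, and return immediately."""
    kernel_copy = bytes(buf)             # the extra copy is the cost
    t = threading.Thread(target=link.transmit, args=(kernel_copy,))
    t.start()
    return t                             # caller may reuse `buf` right away


if __name__ == "__main__":
    link = Link()
    buf = bytearray(b"hello")
    t = send_nonblocking_copy(link, buf)
    buf[:] = b"XXXXX"                    # safe: the copy is what gets sent
    t.join()
    print(link.delivered)
```

Without the copy, overwriting `buf` before transmission finished could corrupt the message, which is why the other non-blocking variants rely on an interrupt or on copy on write.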

  28. Remote Procedure Call (1)
      • Message passing is based on I/O
      • RPC allows programs to call procedures located on a different CPU
      • An RPC looks like a local call
      Figure: steps in making a remote procedure call

  29. Remote Procedure Call (2) Implementation issues
      • Cannot pass pointers
        • call by reference becomes copy-restore (but might fail)
      • Weakly typed languages
        • the client stub cannot determine the size of a parameter
      • Not always possible to determine parameter types
      • Cannot use global variables
        • the procedure may get moved to a remote machine
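
Copy-restore, and one way it can fail, can be shown with a toy pair of stubs. This is a sketch under stated assumptions: JSON stands in for the marshalling format, a direct function call stands in for the network, all arguments are lists, and every name below is hypothetical.

```python
import json


def sort_in_place(values):
    values.sort()


def bump_both(a, b):
    a[0] += 1
    b[0] += 1


PROCEDURES = {"sort_in_place": sort_in_place, "bump_both": bump_both}


def server_stub(request):
    """Unmarshal the arguments, call the procedure, send the arguments back."""
    msg = json.loads(request)
    args = msg["args"]
    PROCEDURES[msg["proc"]](*args)
    return json.dumps({"args": args})


def client_stub(proc, *args):
    """Copy the arguments out, then restore the caller's lists from the reply."""
    request = json.dumps({"proc": proc, "args": list(args)})
    reply = json.loads(server_stub(request))   # stands in for the network
    for local, remote in zip(args, reply["args"]):
        local[:] = remote                      # the "restore" step


if __name__ == "__main__":
    data = [3, 1, 2]
    client_stub("sort_in_place", data)
    print(data)

    x, y = [0], [0]
    bump_both(x, x)                  # local call: both names alias one list
    client_stub("bump_both", y, y)   # remote call: the alias is lost
    print(x, y)
```

The first call behaves like call by reference. The second shows the failure: locally the shared list is incremented twice, but the server receives two independent copies, so the restored value differs from the local result.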

  30. Distributed Shared Memory (1)
      • Layers where it can be implemented: hardware, operating system, user-level software

  31. Distributed Shared Memory (2) Replication
      (a) Pages distributed over 4 machines
      (b) CPU 0 references page 10
      (c) CPU 1 only reads page 10, so the page is replicated

  32. Distributed Shared Memory (3)
      • DSM transfers data in multiples of the page size, so the choice of page size is important
      • False sharing
      • Must also achieve sequential consistency (writes to a replicated page)
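
False sharing can be made visible by counting how often a page has to move between CPUs. The sketch below assumes a 4096-byte page and a simple single-owner protocol in which a page migrates to whichever CPU writes it; both assumptions and all names are illustrative only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def page_of(addr):
    return addr // PAGE_SIZE


def falsely_shared(addr_a, addr_b):
    """True if two different variables live on the same page."""
    return addr_a != addr_b and page_of(addr_a) == page_of(addr_b)


def page_transfers(trace):
    """Count page migrations for a trace of (cpu, address) writes."""
    owner = {}
    moves = 0
    for cpu, addr in trace:
        page = page_of(addr)
        if owner.get(page, cpu) != cpu:   # page currently held elsewhere
            moves += 1
        owner[page] = cpu
    return moves


if __name__ == "__main__":
    same_page = [(0, 0), (1, 8), (0, 0), (1, 8)]
    separate = [(0, 0), (1, 4096), (0, 0), (1, 4096)]
    print(page_transfers(same_page), page_transfers(separate))
```

The two CPUs never touch the same variable in either trace, yet the first trace makes the page bounce on almost every write. Larger pages make this more likely, which is one reason the page size matters.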

  33. Multicomputer Scheduling: Load Balancing (1)
      • Each process can run only on the CPU on which it is located
      • Placement criteria: CPU and memory usage, communication needs, etc.
      • Graph-theoretic deterministic algorithm
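
The graph-theoretic view treats processes as vertices and communication as weighted edges, and prefers the placement that sends the least traffic across the network. The sketch below only scores given placements; it does not search for the optimum or check CPU and memory limits, and the names and weights are hypothetical.

```python
def cut_traffic(edges, placement):
    """Total weight of edges whose endpoints sit on different nodes.

    edges: dict mapping (process_a, process_b) to traffic between them.
    placement: dict mapping each process to a node number.
    """
    return sum(w for (a, b), w in edges.items() if placement[a] != placement[b])


def best_placement(edges, candidates):
    """Pick the candidate placement with the least inter-node traffic."""
    return min(candidates, key=lambda p: cut_traffic(edges, p))


if __name__ == "__main__":
    edges = {("A", "B"): 3, ("B", "C"): 1, ("A", "C"): 2}
    p1 = {"A": 0, "B": 0, "C": 1}
    p2 = {"A": 0, "B": 1, "C": 1}
    print(cut_traffic(edges, p1), cut_traffic(edges, p2))
```

Keeping the heavily communicating pair A and B on the same node gives the smaller cut, so `best_placement` would choose `p1`.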

  34. Load Balancing (2) Overloaded sender
      • Sender-initiated distributed heuristic algorithm
      • Processes run on the CPU that created them, unless it is overloaded

  35. Load Balancing (3) Under-loaded receiver • Receiver-initiated distributed heuristic algorithm
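
The sender-initiated and receiver-initiated heuristics mirror each other, which a short sketch can show. The details are assumptions for illustration: load is a single number per node, a fixed threshold separates overloaded from underloaded, and a node probes a few randomly chosen peers before giving up. All names are hypothetical.

```python
import random


def sender_initiated(loads, me, threshold, probes=3, rng=random):
    """An overloaded node looks for an underloaded node to take a process.

    Returns the chosen node, or None if `me` is not overloaded or every
    probe failed.
    """
    if loads[me] <= threshold:
        return None
    others = [n for n in range(len(loads)) if n != me]
    for target in rng.sample(others, min(probes, len(others))):
        if loads[target] < threshold:
            return target
    return None


def receiver_initiated(loads, me, threshold, probes=3, rng=random):
    """An underloaded node looks for an overloaded node to take work from."""
    if loads[me] >= threshold:
        return None
    others = [n for n in range(len(loads)) if n != me]
    for target in rng.sample(others, min(probes, len(others))):
        if loads[target] > threshold:
            return target
    return None


if __name__ == "__main__":
    loads = [5, 0, 0, 0]
    print(sender_initiated(loads, 0, threshold=2))
    print(receiver_initiated(loads, 1, threshold=2))
```

In the sender-initiated form the probing traffic comes from nodes that are already busy; in the receiver-initiated form it comes from nodes that have spare cycles.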