
Parallel Programming on the SGI Origin2000

Taub Computer Center, Technion. Moshe Goldberg, mgold@tx.technion.ac.il. With thanks to Igor Zacharov / Benoit Marchand, SGI. Mar 2004 (v1.2). Topics: Parallelization Concepts, SGI Computer Design.


Presentation Transcript


  1. Parallel Programming on the SGI Origin2000
     Taub Computer Center, Technion
     Moshe Goldberg, mgold@tx.technion.ac.il
     With thanks to Igor Zacharov / Benoit Marchand, SGI
     Mar 2004 (v1.2)

  2. Parallel Programming on the SGI Origin2000
     • Parallelization Concepts
     • SGI Computer Design
     • Efficient Scalar Design
     • Parallel Programming - OpenMP
     • Parallel Programming - MPI

  3. 2) SGI Computer Design

  4. Origin2000/3000 architecture features
     Important hardware and software components:
     * node board: processors + memory
     * node interconnect topology and configurations
     * scalability of the architecture
     * directory-based cache coherency
     * single system image components

  5. Origin2000 node board

  6. Origin node board
     HUB crossbar ASIC:
     - Single chip integrates all four functions:
       * processor interface: two MIPS R1x000 processors on the same bus
       * memory interface, integrating the memory controller and (directory-based) cache coherency
       * interface to the CrayLink Interconnect to other nodes in the system
       * interface to I/O devices with XIO-to-PCI bridges
     - Memory access characteristics:
       * read bandwidth, single processor: 460 MB/s sustained
       * average access latency: 315 ns to restart the processor pipeline

  7. Origin2000 node components

  8. Origin router interconnect
     - Router chip has 6 CrayLink interfaces: 2 for connections to nodes (HUBs) and 4 for connections to other routers in the network
       * 4-dimensional interconnect
     - The interconnect topology is determined by the size of the computer (number of nodes):
       * direct (back-to-back) connection for 2 nodes (4 cpu)
       * strongly connected cube for up to 32 cpu
       * hypercube for up to 64 cpu
       * hypercube of hypercubes for up to 256 cpu
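The size-to-topology rule on this slide can be sketched as a small lookup. This is only an illustration: the function names `origin_topology` and `nodes_needed` are hypothetical, and the sketch assumes each topology applies to every CPU count up to the bound the slide states, with two processors per node as described for the node board.

```python
def origin_topology(num_cpus: int) -> str:
    """Return the topology the slide names for a given CPU count.

    Assumption: each tier covers all sizes up to its stated upper bound.
    """
    if num_cpus < 1 or num_cpus > 256:
        raise ValueError("the slide only covers configurations of 1 to 256 CPUs")
    if num_cpus <= 4:
        return "direct (back-to-back) connection"
    if num_cpus <= 32:
        return "strongly connected cube"
    if num_cpus <= 64:
        return "hypercube"
    return "hypercube of hypercubes"


def nodes_needed(num_cpus: int, cpus_per_node: int = 2) -> int:
    """Number of node boards needed, rounding up (two CPUs per node board)."""
    return -(-num_cpus // cpus_per_node)


if __name__ == "__main__":
    for cpus in (4, 16, 64, 128):
        print(cpus, "cpu:", nodes_needed(cpus), "nodes,", origin_topology(cpus))
```

The thresholds are kept in one place so the mapping is easy to compare against the slide.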

  9. Origin2000 – two nodes

  10. Origin2000 module connections

  11. Origin2000 interconnect

  12. Origin2000 interconnect (figure panels: 32 processors, 64 processors)

  13. Origin2000 interconnect

  14. Directory-based uniform cache
      Cache line use is recorded in a directory, which resides in memory

  15. Origin cache coherence
      - A memory page is divided into data blocks of 32 words, or 128 bytes, each (the L2 cache line size)
      - Each data request transfers one data block (128 bytes)
      - Each data block has associated presence and state information, held in directory memory:
        * data block (cache line): 128 bytes (32 words)
        * presence: 64 bits
        * state: 3 bits
      - If a node (HUB) requests a data block, the corresponding presence bit is set and the state of that cache line is recorded
      - The HUB runs the cache coherence protocol, updating the state of the data block and notifying the nodes for which the presence bit is set
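The bookkeeping described on this slide can be illustrated with a toy directory entry. This is a minimal sketch, not the actual HUB protocol: the class `DirectoryEntry`, its methods, and the state names "unowned", "shared" and "exclusive" are assumptions made for illustration (the slide only says the state field is 3 bits wide).

```python
from dataclasses import dataclass
from typing import List

LINE_BYTES = 128      # one data block = one L2 cache line
WORDS_PER_LINE = 32   # 32 words of 4 bytes each
PRESENCE_BITS = 64    # one presence bit per node


@dataclass
class DirectoryEntry:
    presence: int = 0        # 64-bit mask: bit n set means node n holds the line
    state: str = "unowned"   # hypothetical state names, for illustration only

    def record_read(self, node: int) -> None:
        """A node requests the block: set its presence bit, record the state."""
        if not 0 <= node < PRESENCE_BITS:
            raise ValueError("node id out of range for the presence mask")
        self.presence |= 1 << node
        self.state = "shared"

    def record_write(self, writer: int) -> List[int]:
        """A node writes the block: return the other holders to be notified."""
        if not 0 <= writer < PRESENCE_BITS:
            raise ValueError("node id out of range for the presence mask")
        to_notify = [n for n in range(PRESENCE_BITS)
                     if (self.presence >> n) & 1 and n != writer]
        self.presence = 1 << writer
        self.state = "exclusive"
        return to_notify


def block_index(byte_address: int) -> int:
    """Index of the 128-byte data block that contains a byte address."""
    return byte_address // LINE_BYTES
```

For example, after nodes 3 and 5 read a block, a write by node 5 yields `[3]` as the list of nodes to notify, and only bit 5 stays set in the presence mask.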

  16. Origin address space
      - Physically, the memory is distributed and not contiguous
      - Node id is assigned at boot time
      - Logically, memory is a single shared contiguous address space; the virtual address space is 44 bits (16 TB)
      - A program (compiler) uses the virtual address space
      - The CPU translates from the virtual to the physical address space through the TLB (Translation Look-aside Buffer)
      - Physical address layout: bits 39-32 hold the node id (8 bits); bits 31-0 hold the node offset (32 bits, 4 GB)
      [Figure: virtual pages 0, 1, 2, ... n are mapped through the TLB to physical pages; the physical space is indexed by node id (0, 1, 2, 3, ...), showing memory present and empty slots]
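The physical address layout on this slide (8-bit node id in bits 39-32, 32-bit node offset in bits 31-0) can be checked with a few lines of bit arithmetic. The helper names `split_physical` and `join_physical` are hypothetical and exist only for this sketch.

```python
NODE_ID_BITS = 8    # bits 39-32
OFFSET_BITS = 32    # bits 31-0, i.e. 4 GB per node
VIRTUAL_BITS = 44   # virtual address space, i.e. 16 TB


def split_physical(addr: int):
    """Split a 40-bit physical address into (node id, node offset)."""
    if not 0 <= addr < 1 << (NODE_ID_BITS + OFFSET_BITS):
        raise ValueError("address does not fit in 40 bits")
    node_id = addr >> OFFSET_BITS
    offset = addr & ((1 << OFFSET_BITS) - 1)
    return node_id, offset


def join_physical(node_id: int, offset: int) -> int:
    """Build a physical address from a node id and an offset within the node."""
    if not 0 <= node_id < 1 << NODE_ID_BITS:
        raise ValueError("node id does not fit in 8 bits")
    if not 0 <= offset < 1 << OFFSET_BITS:
        raise ValueError("offset does not fit in 32 bits")
    return (node_id << OFFSET_BITS) | offset


if __name__ == "__main__":
    print(split_physical(0x0500001000))             # node 5, offset 0x1000
    print((1 << OFFSET_BITS) // 2**30, "GB per node")
    print((1 << VIRTUAL_BITS) // 2**40, "TB virtual address space")
```

The same arithmetic confirms the sizes quoted on the slide: 2^32 bytes is 4 GB per node and 2^44 bytes is 16 TB of virtual address space.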

  17. Summary: Origin2000 properties
      - Single machine image
        * behaves like a large workstation
        * same compilers
        * time sharing
        * all SGI old code (binaries) will run
        * OS schedules the hardware resources on the machine
      - processor scalability: 2-1024 cpu
      - I/O scalability
      - all memory and I/O devices are directly addressable
        * no limitations on the size of a single program; it can use all available memory
        * no limitations on the location of the data; all disks can be used in a single file system
      - 64-bit operating system and file system
        * HPC features: checkpoint/restart, queueing system
      - machine stability

  18. Origin2000/3000 architecture goal
      Hardware design: distributed memory
      But to a programmer, it looks like shared memory

  19. Example: Simple Memory Access

  20. Parix run limits
      (1) NQS queues on parix
      (2) Interactive: maximum cputime = 15 minutes

  21. Two ways to run a batch job
      (1) Parameters in command line
      (2) Parameters in script file

  22. QSUB options

  23. Output of command: "qstat -a"

  24. Exercise 1 – login and submit a job
