
Presentation Transcript


  1. ISCM-10 High Performance Computing for Computational Mechanics Moshe Goldberg March 29, 2001 Taub Computing Center

  2. High Performance Computing for CM Agenda: • Overview • Alternative Architectures • Message Passing • “Shared Memory” • Case Study

  3. 1) High Performance Computing - Overview

  4. Some Important Points * Understanding HPC concepts * Why should programmers care about the architecture? * Do compilers make the right choices? * Nowadays, there are alternatives

  5. Trends in computer development * Speed of calculation is steadily increasing * Memory speed may not keep pace with these high calculation speeds * Workstations are approaching the speeds of specially designed machines * Are we approaching the limit of the speed of light? * To get an answer faster, we must perform calculations in parallel

  6. Some HPC concepts * HPC * HPF / Fortran90 * cc-NUMA * Compiler directives * OpenMP * Message passing * PVM/MPI * Beowulf

  7. 2) Alternative Architectures

  8. Source: IDC, 2001

  9. Source: IDC, 2001

  10. IUCC (Machba) computers (Mar 2001)
      Origin2000 -- 112 cpu (R12000, 400 MHz), 28.7 GB total memory
      PC cluster -- 64 cpu (Pentium III, 550 MHz), 9 GB total memory
      Cray J90 -- 32 cpu, 4 GB memory (500 Mwords)

  11. Chris Hempel, hpc.utexas.edu

  12. Chris Hempel, hpc.utexas.edu

  13. Symmetric Multiprocessors (SMP)
      [diagram: four CPUs connected over a common memory bus to a single shared memory]
      Examples: SGI Power Challenge, Cray J90/T90

  14. Distributed Parallel Computing
      [diagram: four CPUs, each with its own local memory, connected by a network]
      Examples: SP2, Beowulf

  15. 3) Message Passing

  16. MPI commands -- examples
      call MPI_SEND(sum, 1, MPI_REAL, ito, itag,
     &              MPI_COMM_WORLD, ierror)
      call MPI_RECV(sum, 1, MPI_REAL, ifrom, itag,
     &              MPI_COMM_WORLD, istatus, ierror)
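
The MPI_SEND/MPI_RECV pair above moves one value between two processes. As an editor's illustration (not part of the original slides), the same point-to-point pattern can be sketched with Python's standard multiprocessing module standing in for MPI:

```python
# Point-to-point message passing sketched with multiprocessing.Pipe
# as a stand-in for MPI_SEND / MPI_RECV (illustrative analogy only).
from multiprocessing import Process, Pipe

def worker(conn):
    value = conn.recv()          # like MPI_RECV: block until a message arrives
    conn.send(value * 2)         # like MPI_SEND: return a reply
    conn.close()

def ping(value):
    parent_conn, child_conn = Pipe()
    p = Process(target=worker, args=(child_conn,))
    p.start()
    parent_conn.send(value)      # "rank 0" sends
    result = parent_conn.recv()  # "rank 0" receives the reply
    p.join()
    return result

if __name__ == "__main__":
    print(ping(21))
```

In real MPI the two endpoints are ranks in MPI_COMM_WORLD; here they are simply a parent and a child process joined by a pipe.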

  17. Some basic MPI functions
      Setup: mpi_init, mpi_finalize
      Environment: mpi_comm_size, mpi_comm_rank
      Communication: mpi_send, mpi_recv
      Synchronization: mpi_barrier
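
This lifecycle (init, query size and rank, communicate, finalize) gives every MPI program the same shape. A hedged sketch, with a process pool standing in for the MPI ranks (the names hello/run are illustrative, not an MPI binding):

```python
# The basic shape of an MPI program (init / comm_size / comm_rank /
# finalize), sketched with a process pool standing in for the ranks.
from multiprocessing import Pool

def hello(args):
    rank, size = args
    # Each "rank" knows its own id and the total number of processes,
    # just as mpi_comm_rank / mpi_comm_size report inside MPI.
    return "rank %d of %d" % (rank, size)

def run(size=4):
    # Pool creation ~ mpi_init; leaving the with-block ~ mpi_finalize.
    with Pool(size) as pool:
        return pool.map(hello, [(r, size) for r in range(size)])

if __name__ == "__main__":
    print(run())
```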

  18. Other important MPI functions
      Asynchronous communication: mpi_isend, mpi_irecv, mpi_iprobe, mpi_wait, mpi_waitall
      Collective communication: mpi_barrier, mpi_bcast, mpi_gather, mpi_scatter, mpi_reduce, mpi_allreduce
      Derived data types: mpi_type_contiguous, mpi_type_vector, mpi_type_indexed, mpi_pack, mpi_type_commit, mpi_type_free
      Creating communicators: mpi_comm_dup, mpi_comm_split, mpi_intercomm_create, mpi_comm_free
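
What the collective operations compute can be shown without any parallelism at all. The sketch below (an editorial illustration with made-up helper names, not mpi4py or any MPI binding) treats "ranks" as list positions:

```python
# Serial sketch of what the MPI collectives compute; the "ranks"
# are list positions (illustration only, not an MPI implementation).
def scatter(data, nprocs):
    # mpi_scatter: the root splits its data into one chunk per rank
    chunk = len(data) // nprocs
    return [data[i * chunk:(i + 1) * chunk] for i in range(nprocs)]

def gather(chunks):
    # mpi_gather: the root concatenates one chunk from every rank
    return [x for chunk in chunks for x in chunk]

def reduce_sum(values):
    # mpi_reduce with the sum operator: combine one value per rank
    total = 0
    for v in values:
        total += v
    return total

chunks = scatter(list(range(8)), 4)          # 4 "ranks", 2 elements each
partials = [reduce_sum(c) for c in chunks]   # per-rank partial sums
grand_total = reduce_sum(partials)           # combining them ~ mpi_allreduce
```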

  19. 4) “Shared Memory”

  20. Fortran directives -- examples
      CRAY:
      CMIC$ DO ALL
            do i=1,n
              a(i)=i
            enddo
      SGI:
      C$DOACROSS
            do i=1,n
              a(i)=i
            enddo
      OpenMP:
      C$OMP parallel do
            do i=1,n
              a(i)=i
            enddo

  21. OpenMP Summary OpenMP standard – first published Oct 1997 Directives Run-time Library Routines Environment Variables Versions for f77, f90, c, c++

  22. OpenMP Summary -- Parallel Do Directive
      c$omp parallel do private(I) shared(a)
            do I=1,n
              a(I) = I+1
            enddo
      c$omp end parallel do    <-- the end directive is optional
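
The parallel do on this slide shares the array a among all threads while each thread keeps a private copy of the loop index. Assuming a thread pool as a stand-in for the OpenMP threads (editor's sketch, not OpenMP itself), the same division looks like:

```python
# The loop "do I=1,n ... a(I)=I+1" with a SHARED array and a PRIVATE
# loop index, sketched with a thread pool standing in for OpenMP threads.
from concurrent.futures import ThreadPoolExecutor

n = 8
a = [0] * n                  # shared(a): one array seen by every thread

def body(i):
    # i is private: each task gets its own copy of the loop index
    a[i - 1] = i + 1         # Fortran a(I)=I+1, with 1-based index i

with ThreadPoolExecutor(max_workers=4) as pool:
    pool.map(body, range(1, n + 1))   # iterations divided among threads
```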

  23. OpenMP Summary -- Defining a Parallel Region: Individual Do Loops
      c$omp parallel shared(a,b)
      c$omp do private(j)
            do j=1,n
              a(j)=j
            enddo
      c$omp end do nowait
      c$omp do private(k)
            do k=1,n
              b(k)=k
            enddo
      c$omp end do
      c$omp end parallel

  24. OpenMP Summary Parallel Do Directive - Clauses shared private default(private|shared|none) reduction({operator|intrinsic}:var) if(scalar_logical_expression) ordered copyin(var)
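
The reduction({operator|intrinsic}:var) clause is worth a sketch: each thread accumulates into a private partial result, and the partials are combined at the end. A minimal Python analogy (editorial illustration, not the OpenMP runtime):

```python
# Sketch of reduction(+:s): each thread sums into a PRIVATE partial,
# and the partials are combined afterwards (illustration only).
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    s = 0                    # the private copy of the reduction variable
    for x in chunk:
        s += x
    return s

data = list(range(1, 101))               # 1+2+...+100 = 5050
chunks = [data[i::4] for i in range(4)]  # one strided chunk per "thread"
with ThreadPoolExecutor(max_workers=4) as pool:
    total = sum(pool.map(partial_sum, chunks))  # combine the partials
```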

  25. OpenMP Summary Run-Time Library Routines Execution environment omp_set_num_threads omp_get_num_threads omp_get_max_threads omp_get_thread_num omp_get_num_procs omp_set_dynamic/omp_get_dynamic omp_set_nested/omp_get_nested

  26. OpenMP Summary Run-Time Library Routines Lock routines omp_init_lock omp_destroy_lock omp_set_lock omp_unset_lock omp_test_lock
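
These lock routines map directly onto an ordinary mutex: omp_init_lock creates it, omp_set_lock/omp_unset_lock bracket the critical section, and omp_test_lock is a non-blocking attempt. Sketched here with Python's threading.Lock (an analogy, not the OpenMP runtime):

```python
# The omp lock routines sketched with threading.Lock: acquire() plays
# omp_set_lock, release() plays omp_unset_lock (analogy only).
import threading

lock = threading.Lock()      # omp_init_lock
counter = 0                  # shared data protected by the lock

def bump(times):
    global counter
    for _ in range(times):
        lock.acquire()       # omp_set_lock: wait until the lock is free
        counter += 1         # critical section: one thread at a time
        lock.release()       # omp_unset_lock

threads = [threading.Thread(target=bump, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                 # afterwards counter == 4 * 1000
```

lock.acquire(blocking=False) plays the role of omp_test_lock: it returns immediately with a success flag instead of waiting.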

  27. OpenMP Summary Environment Variables OMP_NUM_THREADS OMP_DYNAMIC OMP_NESTED

  28. RISC memory levels -- single CPU [diagram: CPU, cache, main memory]

  29. RISC memory levels -- single CPU [diagram repeated]

  30. RISC memory levels -- multiple CPUs [diagram: CPU 0 + cache 0, CPU 1 + cache 1, shared main memory]

  31. RISC memory levels -- multiple CPUs [diagram repeated]

  32. RISC memory levels -- multiple CPUs [diagram repeated]

  33. A sample program
      subroutine xmult (x1,x2,y1,y2,z1,z2,n)
      real x1(n),x2(n),y1(n),y2(n),z1(n),z2(n)
      real a,b,c,d
      do i=1,n
        a=x1(i)*x2(i); b=y1(i)*y2(i)
        c=x1(i)*y2(i); d=x2(i)*y1(i)
        z1(i)=a-b; z2(i)=c+d
      enddo
      end

  34. A sample program
      subroutine xmult (x1,x2,y1,y2,z1,z2,n)
      real x1(n),x2(n),y1(n),y2(n),z1(n),z2(n)
      real a,b,c,d
c$omp parallel do
      do i=1,n
        a=x1(i)*x2(i); b=y1(i)*y2(i)
        c=x1(i)*y2(i); d=x2(i)*y1(i)
        z1(i)=a-b; z2(i)=c+d
      enddo
      end

  35. A sample program
      Run on Technion Origin2000; vector length = 1,000,000; loop repeated 50 times; compiler optimization: low (-O1)
      Elapsed time, sec:
                        threads
      Compile           1      2      4
      No parallel      15.0   15.3
      Parallel         16.0   26.0   26.8
      Is this running in parallel?

  36. A sample program
      Run on Technion Origin2000; vector length = 1,000,000; loop repeated 50 times; compiler optimization: low (-O1)
      Elapsed time, sec:
                        threads
      Compile           1      2      4
      No parallel      15.0   15.3
      Parallel         16.0   26.0   26.8
      Is this running in parallel? WHY NOT?

  37. A sample program
      Is this running in parallel? WHY NOT?
      c$omp parallel do
            do i=1,n
              a=x1(i)*x2(i); b=y1(i)*y2(i)
              c=x1(i)*y2(i); d=x2(i)*y1(i)
              z1(i)=a-b; z2(i)=c+d
            enddo
      Answer: by default, the variables a,b,c,d are SHARED

  38. A sample program
      Solution: define a,b,c,d as PRIVATE:
      c$omp parallel do private(a,b,c,d)
      Elapsed time, sec:
                        threads
      Compile           1      2      4
      No parallel      15.0   15.3
      Parallel         16.0    8.5    4.6
      This is now running in parallel
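
The fix works because each thread now gets its own copy of the temporaries. The same loop can be sketched in Python, where the temporaries are locals of the task function and therefore private automatically (editor's illustration; the thread pool stands in for the OpenMP threads):

```python
# The corrected xmult loop sketched in Python: the temporaries a,b,c,d
# are locals of body(), hence PRIVATE to each task, so the parallel
# result matches the serial one. The loop computes the elementwise
# complex product (x1+i*y1)*(x2+i*y2) = z1 + i*z2.
from concurrent.futures import ThreadPoolExecutor

def xmult(x1, x2, y1, y2):
    n = len(x1)
    z1 = [0.0] * n           # real part
    z2 = [0.0] * n           # imaginary part
    def body(i):
        a = x1[i] * x2[i]; b = y1[i] * y2[i]   # private temporaries
        c = x1[i] * y2[i]; d = x2[i] * y1[i]
        z1[i] = a - b; z2[i] = c + d
    with ThreadPoolExecutor(max_workers=4) as pool:
        pool.map(body, range(n))               # iterations run in parallel
    return z1, z2
```

Because z1 and z2 are written at distinct indices per iteration, sharing them is safe; only the scalar temporaries needed to be private.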
