High-Performance Computing in Bioinformatics: Supercomputers vs. Clusters
An overview of high-performance computing (HPC) in bioinformatics, as presented by Nick Lindberg at the MCW Bioinformatics User Group. The talk contrasts supercomputers, such as IBM’s Blue Gene and NCSA’s Blue Waters, with HPC clusters built from commodity servers. Supercomputers use a custom CPU/memory architecture and a high-speed interconnect to parallelize a single computation and appear as one large computer, while clusters excel at embarrassingly parallel batch jobs and MPI-enabled parallel computing, which suits many bioinformatics workloads. The talk also covers the cluster job submission stack and where HPC fits in turning raw data into knowledge.
Presentation Transcript
High Performance Computing & Bioinformatics Performance
Nick Lindberg
MCW Bioinformatics User Group
“If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens?” - Seymour Cray
HPC: Computing Platforms
• Supercomputer (Oxen)
  • Fast, expensive, custom CPU/memory architecture and high-speed interconnect facilitating parallelization of a single computation
  • Appears as a singular, large computer
  • IBM “Blue Gene”
  • NCSA “Blue Waters”
• HPC Cluster (Chickens)
  • A lot of cheaper, commodity servers/CPUs with loosely coupled, slower (but still fast) interconnect (Ethernet/InfiniBand)
  • Excels at ‘embarrassingly parallel’ batch computing as well as MPI-enabled parallel computing
  • Marquette’s “Pere”
  • MKEI’s “hpc01”
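To make "embarrassingly parallel" concrete, here is a minimal Python sketch. It assumes a batch of short DNA sequences whose GC content can be computed independently of one another. The function names (`gc_content`, `chunk_for_task`, `run_task`) and the sample sequences are illustrative assumptions, not from the talk. On a real cluster each task would run as its own scheduler job; the loop here only stands in for that.

```python
def gc_content(seq):
    """Fraction of G/C bases in one sequence (independent of all other sequences)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def chunk_for_task(items, n_tasks, task_id):
    """Return the share of items that task number task_id (0-based) should process."""
    return items[task_id::n_tasks]


def run_task(items, n_tasks, task_id):
    """Work done by a single task; it never talks to the other tasks."""
    return {seq: gc_content(seq) for seq in chunk_for_task(items, n_tasks, task_id)}


# Hypothetical input data and task count.
sequences = ["ACGT", "GGCC", "ATAT", "GCGA", "TTTT"]
n_tasks = 3

# Stand-in for the cluster: run every task, then merge the per-task results.
merged = {}
for task_id in range(n_tasks):
    merged.update(run_task(sequences, n_tasks, task_id))
```

Because no task depends on another task's output, adding more "chickens" (tasks) shortens the wall-clock time without changing the merged result.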
Matrix Multiplication
C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] + A[0,2]*B[2,0]
[Diagram: row 0 of A (A[0,0], A[0,1], A[0,2]) is combined with column 0 of B (B[0,0], B[1,0], B[2,0]) to produce C[0,0]]
for (k = 0; k < 3; k++) { C[i, j] += A[i, k] * B[k, j]; }
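The expansion of C[0,0] generalizes to every entry of C. The following is a minimal Python sketch of the full triple loop, using plain nested lists; the function name `matmul` and the sample matrices are illustrative assumptions, not from the slides.

```python
def matmul(A, B):
    """Multiply an n x m matrix A by an m x p matrix B (both lists of rows)."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):          # row of A / row of C
        for j in range(p):      # column of B / column of C
            for k in range(m):  # accumulate the dot product
                C[i][j] += A[i][k] * B[k][j]
    return C


# Hypothetical 3x3 example: multiplying by the identity returns A unchanged.
A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
C = matmul(A, B)
```

Each C[i][j] depends only on row i of A and column j of B, so the entries can be computed independently, which is what makes this computation a natural candidate for parallelization.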
HPC Cluster Job/Software Stack
• Internet (Remote Users)
• Head/Login Node [Job Submission]
• Scheduler (Torque/Moab)
• Software: R, Bowtie2, Matlab
• 1 GigE/InfiniBand Internode Communication
• Compute Nodes (/scratch)
• Shared Storage: Home Directories, Software, Scratch (/usr/home, /scratch)
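As a rough illustration of the job submission step, the Python sketch below assembles a Torque-style batch script that a user could submit from the head/login node with `qsub`. The helper name `make_pbs_script`, the job name, the resource values, and every path under /scratch/example are hypothetical; the `#PBS` directives and Bowtie2 options shown are standard ones, but the exact settings depend on how a given cluster is configured.

```python
def make_pbs_script(job_name, nodes, ppn, walltime, command):
    """Build the text of a Torque/PBS batch script for a single command."""
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",               # job name shown by the scheduler
        f"#PBS -l nodes={nodes}:ppn={ppn}",  # nodes and processors per node
        f"#PBS -l walltime={walltime}",      # maximum run time (HH:MM:SS)
        "cd $PBS_O_WORKDIR",                 # start in the submission directory
        command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical Bowtie2 alignment using 4 threads on one node.
script = make_pbs_script(
    job_name="align_example",
    nodes=1,
    ppn=4,
    walltime="01:00:00",
    command=(
        "bowtie2 -p 4 -x /scratch/example/index "
        "-U /scratch/example/reads.fq -S /scratch/example/aligned.sam"
    ),
)
# Saved as e.g. align.pbs, this would be submitted with: qsub align.pbs
```

The scheduler then queues the job and runs it on a compute node when the requested resources are free, so the user never logs in to the compute nodes directly.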
Data to Information: Where HPC Fits In
Stages: Raw Data → Target Data → Processed Data → Transformed Information → Patterns → Knowledge
Phases: Data Processing, Pattern Recognition, Interpretation
Activities along the way:
• Sampling on customer devices, servers, end points
• Feature extraction and content resolution, database construction/storing
• Dimension reduction, matrix/vector building, pre-process formatting
• Association classifications, algorithm application, simulation/modeling
• Visualization and validation, model reconstruction, feedback