High Performance Computing & Bioinformatics Performance
Nick Lindberg
MCW Bioinformatics User Group

“If you were plowing a field, which would you rather use: two strong oxen or 1024 chickens?”
- Seymour Cray

HPC: Computing Platforms
• Supercomputer (Oxen)
  • Fast, expensive, custom CPU/memory architecture and high-speed interconnect facilitating parallelization of a single computation
  • Appears as a singular, large computer
  • IBM “Blue Gene”
  • NCSA “Blue Waters”
• HPC Cluster (Chickens)
  • Many cheaper, commodity servers/CPUs with a loosely coupled, slower (but still fast) interconnect (Ethernet/InfiniBand)
  • Excels at ‘embarrassingly parallel’ batch computing as well as MPI-enabled parallel computing
  • Marquette’s “Pere”
  • MKEI’s “hpc01”

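Matrix Multiplication
C[0,0] = A[0,0]*B[0,0] + A[0,1]*B[1,0] + A[0,2]*B[2,0]
[Figure: row 0 of A (A[0,0], A[0,1], A[0,2]) and column 0 of B (B[0,0], B[1,0], B[2,0]) combine to give C[0,0]]
With C initialized to zero, every element of C is computed as:
for (i = 0; i < 3; i++)
  for (j = 0; j < 3; j++)
    for (k = 0; k < 3; k++)
      C[i,j] += A[i,k] * B[k,j]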
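The formula for C[0,0] uses only row 0 of A and column 0 of B, so every row of C can be computed without knowing any other row. The sketch below is one possible illustration of that decomposition in Python, not the code from the talk. The function names `dot_row`, `matmul`, and `matmul_by_rows` are hypothetical, and a thread pool stands in for the separate cores or nodes a real cluster would supply.

```python
from concurrent.futures import ThreadPoolExecutor


def dot_row(a_row, b):
    """Compute one row of C: C[i, j] = sum over k of A[i, k] * B[k, j]."""
    n_inner = len(b)       # number of rows in B (must equal len(a_row))
    n_cols = len(b[0])     # number of columns in B
    return [sum(a_row[k] * b[k][j] for k in range(n_inner)) for j in range(n_cols)]


def matmul(a, b):
    """Serial reference: compute the rows of C one after another."""
    return [dot_row(row, b) for row in a]


def matmul_by_rows(a, b, workers=3):
    """Hand each row of A to a worker as an independent task.

    Assumption: threads are used only to show the task decomposition.
    In CPython they do not speed up pure-Python arithmetic; on a cluster
    each task would run on its own core or node.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the rows of C come back in order.
        return list(pool.map(lambda row: dot_row(row, b), a))


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
    print(matmul_by_rows(A, B))
```

Because no task reads another task's output, the work can be split across as many workers as there are rows, which is the property that lets "chickens" compete with "oxen" on this kind of problem.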

HPC Cluster Job/Software Stack
• Internet (Remote Users)
• Head/Login Node [Job Submission]
• Scheduler (Torque/Moab)
• Software: R, Bowtie2, Matlab
• 1 GigE/InfiniBand Internode Communication
• Compute Nodes: /scratch
• Shared Storage: Home Directories, Software, Scratch (/usr/home, /scratch)
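To make the job-submission path concrete, here is a minimal sketch of how a user on the login node might prepare a Torque batch script for a Bowtie2 alignment. It is an illustration under assumptions, not a description of any particular cluster's setup: the helper name `build_torque_script`, the job name, and all file paths are hypothetical, and the resource requests would need to match local scheduler policy.

```python
def build_torque_script(job_name, index_prefix, reads_path, sam_path,
                        threads=4, walltime="01:00:00"):
    """Return the text of a Torque/PBS batch script for one Bowtie2 run.

    All paths are placeholders supplied by the caller (hypothetical here).
    """
    lines = [
        "#!/bin/bash",
        f"#PBS -N {job_name}",                 # job name shown in the queue
        f"#PBS -l nodes=1:ppn={threads}",      # one node, `threads` cores on it
        f"#PBS -l walltime={walltime}",        # maximum run time (HH:MM:SS)
        "cd $PBS_O_WORKDIR",                   # start in the submission directory
        # -p: threads, -x: index prefix, -U: unpaired reads, -S: SAM output
        f"bowtie2 -p {threads} -x {index_prefix} -U {reads_path} -S {sam_path}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical paths on the shared /scratch file system.
    print(build_torque_script("align_sample1",
                              "/scratch/index/genome",
                              "/scratch/reads/sample1.fq",
                              "/scratch/out/sample1.sam"))
```

The resulting text would be saved to a file and handed to the scheduler with `qsub`, which queues it until a compute node with the requested cores is free.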

Data to Information: Where HPC Fits In
Stages: Raw Data → Target Data → Processed Data → Transformed Information → Patterns → Knowledge
Phases: Pattern Recognition, Data Processing, Interpretation
Activities:
• Sampling on customer devices, servers, end points
• Feature extraction and content resolution, database construction/storing
• Dimension reduction, matrix/vector building, pre-process formatting
• Association classifications, algorithm application, simulation/modeling
• Visualization and validation, model reconstruction, feedback
