An Apples-to-Apples GPGPU Benchmark (…or at least an attempt at one)

An Apples-to-Apples GPGPU Benchmark (…or at least an attempt at one) • Peter S. Shenkin

Attachment-Based Core Hopping • What it does • The architecture • The benchmark

Attachment-Based Core Hopping • What it does • Find a replacement for the central portion of a molecule • … keeping the peripheral parts in place • … while making “chemical sense” • Why would you do such a thing? • Increase efficacy • Improve “ADMET” properties • (Absorption, Distribution, Metabolism, Excretion, Toxicity) • Find new IP • Designed as a fast interactive desktop application • The architecture • The benchmark

Define Core in a “Template” Molecule • Two ways shown, to emphasize user choice • “1kv1” core • “1kv1-smaller” core

Result: 1err: olap= 0.95, relgscore= -1.37 • Replaced C with N • Replaced S with C

Result: 1erb: olap= 0.80, relgscore= -0.96 • Spiro core!

Result: 1kv2: olap= 0.29, relgscore= -0.37 • Replaced O with N • Replaced N with C • Added an N • Huge shape difference!

Attachment-Based Core Hopping • What it does • The architecture • Workflow engine independent of application code • (… and APU technology) • Multithreaded using Qthreads; C++ • Application stages are essentially plug-ins • The benchmark

Architecture • [Diagram: input (I) and output (O), queue, Stages 1 to 5, scheduler] • Legend: non-thread-safe thread, thread-safe thread, CUDA thread
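The idea of a workflow engine that knows nothing about the application stages it runs can be sketched in a few lines. This is only an illustration under stated assumptions, not the author's engine: the real system is described as C++ with a scheduler and CUDA threads, whereas this sketch uses Python's standard `queue` and `threading` modules, one worker thread per stage, and no scheduler. The names `run_stage`, `run_pipeline`, `add_one`, and `double` are hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the input stream


def run_stage(stage_fn, inbox, outbox):
    """Pull items from inbox, apply the plug-in stage, push results to outbox."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            break
        outbox.put(stage_fn(item))


def run_pipeline(stages, items):
    """Chain the stages with queues; the engine never inspects stage logic."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(stages)
    ]
    for worker in workers:
        worker.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(_DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is _DONE:
            break
        results.append(out)
    for worker in workers:
        worker.join()
    return results


# Hypothetical stand-in stages; real stages would be application plug-ins.
def add_one(x):
    return x + 1


def double(x):
    return x * 2


if __name__ == "__main__":
    print(run_pipeline([add_one, double], [1, 2, 3]))
```

Because each stage here has exactly one worker and the queues are FIFO, results come out in input order. An engine that runs several threads per thread-safe stage, as the legend suggests, would need its own ordering policy.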

Attachment-Based Core Hopping • What it does • The architecture • The benchmark • A truism that goes without saying • Results slowly unveiled • The dilemma & its resolution • Did we “do the right thing”?

The Truism • There are lies… • … damn lies • … statistics • … benchmarks • … salesmen’s claims • … and the last two all too often interact

Results • Test system: • i7/930, 2.7 GHz processor • 4 physical cores, run hyperthreaded • 12 GB RAM • 8-lane PCIe motherboard • SSD drive

Results

Results • At constant CPU utilization: • With two GPGPUs: • Speedup = 1.07 / 0.3275 = 3.3 • With one GPGPU: • Speedup = 0.76 / 0.20 = 3.8
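The two speedup figures can be checked with a short calculation. The transcript does not say what the numerator and denominator measure or in what units, so the function below treats them as plain numbers; the name `speedup` and its parameter names are hypothetical.

```python
def speedup(numerator, denominator):
    """Ratio of the two measurements quoted on the slide."""
    return numerator / denominator


two_gpgpus = speedup(1.07, 0.3275)
one_gpgpu = speedup(0.76, 0.20)

if __name__ == "__main__":
    print(f"Two GPGPUs: {two_gpgpus:.1f}")
    print(f"One GPGPU:  {one_gpgpu:.1f}")
```

To one decimal place the ratios agree with the 3.3 and 3.8 shown on the slide.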

Closing Remarks • If we did our comparisons with a different number of threads, speedups would be different • If we worked on a machine with more or fewer processors, speedups would be different • If we used a 4-lane PCIe motherboard, or a different CPU, or a slower hard drive, speedups would be different • If our software architecture were different, speedups would be different • Conclusion from above: The world is a complicated place • Do you agree that our approach is fair?
