Concurrency Idea
  Challenge: print the primes from 1 to 10^10.
  Given: a ten-processor multiprocessor, with one thread per processor.
  Goal: a ten-fold speedup (or close to it).
Load Balancing
  Split the work evenly: each thread tests a range of 10^9 numbers.
  [Figure: the range 1 to 10^10 divided at 10^9, 2·10^9, … into ten blocks, assigned to threads P0, P1, …, P9.]
Procedure for Thread i
  void primePrint() {
      int i = ThreadID.get();              // thread IDs in {0..9}
      long block = 1_000_000_000L;         // 10^9 numbers per thread
      for (long j = i * block + 1; j <= (i + 1) * block; j++) {
          if (isPrime(j)) print(j);
      }
  }
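The static split can be tried out at a much smaller scale. The following is a minimal sketch, not the course's code: it assumes a range of 1 to 100,000 instead of 10^10, counts primes instead of printing them, and uses a hypothetical class name `StaticSplit` with a simple trial-division `isPrime`.

```java
class StaticSplit {
    // Trial division; adequate for the small range used in this sketch.
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Splits 1..limit into `threads` equal blocks, one per thread,
    // and returns the total number of primes found.
    static long countPrimes(long limit, int threads) {
        long block = limit / threads;        // assumes limit is divisible by threads
        long[] found = new long[threads];    // one slot per thread, never shared
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int i = t;
            workers[t] = new Thread(() -> {
                for (long j = i * block + 1; j <= (i + 1) * block; j++) {
                    if (isPrime(j)) found[i]++;
                }
            });
            workers[t].start();
        }
        long total = 0;
        for (int t = 0; t < threads; t++) {
            try {
                workers[t].join();           // wait for thread t before reading its slot
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            total += found[t];
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100_000, 10));
    }
}
```

Each thread writes only to its own array slot, so this version needs no synchronization beyond `join()`.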
Issues
  Higher ranges have fewer primes, yet larger numbers are harder to test.
  Thread workloads are uneven and hard to predict.
  The static split is rejected: we need dynamic load balancing.
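The unevenness is easy to observe at a small scale. This sketch assumes a range of 1 to 100,000 split into ten blocks, and uses the number of trial divisions as a rough stand-in for the cost of testing a number; the class and method names are hypothetical.

```java
class BlockWorkload {
    // Number of trial divisions needed to classify n: a rough cost measure.
    static long trialDivisions(long n) {
        long steps = 0;
        for (long d = 2; d * d <= n; d++) {
            steps++;
            if (n % d == 0) break;
        }
        return steps;
    }

    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Row 0: primes found in each block. Row 1: trial divisions spent in each block.
    static long[][] measure(long limit, int blocks) {
        long size = limit / blocks;          // assumes limit is divisible by blocks
        long[][] stats = new long[2][blocks];
        for (int b = 0; b < blocks; b++) {
            for (long j = b * size + 1; j <= (b + 1) * size; j++) {
                if (isPrime(j)) stats[0][b]++;
                stats[1][b] += trialDivisions(j);
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        long[][] stats = measure(100_000, 10);
        for (int b = 0; b < 10; b++) {
            System.out.println("block " + b + ": " + stats[0][b] + " primes, "
                    + stats[1][b] + " trial divisions");
        }
    }
}
```

With this cost measure the last block contains fewer primes than the first but takes more divisions to process, which is the imbalance the slide describes.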
Shared Counter
  Each thread takes a number.
  [Figure: a shared counter handing out successive values 17, 18, 19.]
Procedure for Thread i
  Counter counter = new Counter(1);        // shared counter object
  void primePrint() {
      long j = 0;
      while (j < 10_000_000_000L) {        // 10^10
          j = counter.getAndIncrement();
          if (isPrime(j)) print(j);
      }
  }
Where Things Reside
  [Figure: processors, each with its own cache, connected by a bus to shared memory. Labels: local variables, code, shared memory, one shared counter. The code shown is the primePrint procedure for thread i.]
Procedure for Thread i
  The loop stops once every value up to 10^10 has been taken.
  Each call to counter.getAndIncrement() increments the shared counter and hands the caller a new value to test.
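A runnable small-scale version of the shared-counter procedure might look as follows. This is a sketch under stated assumptions: `java.util.concurrent.atomic.AtomicLong` stands in for the slides' `Counter`, the range is 1 to 100,000, primes are counted rather than printed, and the class name is hypothetical. The loop fetches a value before checking the bound, so no thread tests a number above the limit.

```java
import java.util.concurrent.atomic.AtomicLong;

class SharedCounterPrimes {
    static boolean isPrime(long n) {
        if (n < 2) return false;
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    static long countPrimes(long limit, int threads) {
        AtomicLong counter = new AtomicLong(1);  // shared: next number to test
        AtomicLong found = new AtomicLong(0);    // shared: primes found so far
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                // Each thread keeps taking the next untested number until none remain.
                for (long j = counter.getAndIncrement(); j <= limit;
                        j = counter.getAndIncrement()) {
                    if (isPrime(j)) found.incrementAndGet();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return found.get();
    }

    public static void main(String[] args) {
        System.out.println(countPrimes(100_000, 10));
    }
}
```

A thread that draws cheap numbers simply comes back for more, so the work balances itself without any prediction of per-range cost.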
Counter Implementation
  public class Counter {
      private long value;
      public Counter(long initial) { value = initial; }
      public long getAndIncrement() {
          return value++;
      }
  }
  OK for a single thread, not for concurrent threads.
What It Means
  The statement "return value++;" in getAndIncrement() is shorthand for three steps:
      long temp = value;
      value = temp + 1;
      return temp;
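Not so good…
  [Figure: timeline of the counter value under two interleaved threads. One thread reads 1 and writes 2, then reads 2 and writes 3; the other reads 1 early and writes 2 late, so the value goes 1, 2, 3, 2.]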
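The bad interleaving can be replayed deterministically on a single thread by writing out each read and write in the order it happens. In this sketch the thread names A and B and the class name are labels chosen for illustration.

```java
class LostUpdateReplay {
    // Replays the reads and writes of two threads in the problematic order
    // and returns the final counter value.
    static long replay() {
        long value = 1;        // shared counter value

        long tempA = value;    // thread A reads 1
        long tempB = value;    // thread B reads 1
        value = tempB + 1;     // thread B writes 2
        tempB = value;         // thread B reads 2
        value = tempB + 1;     // thread B writes 3
        value = tempA + 1;     // thread A writes 2, wiping out B's updates

        return value;
    }

    public static void main(String[] args) {
        // Three increments starting from 1 should leave 4.
        System.out.println("final value: " + replay());
    }
}
```

Three calls completed, yet the counter ends at 2 instead of 4, and A and B were both handed the value 1.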
Is this problem inherent?
  [Figure: reads and writes by different threads interleaved on a timeline.]
  If we could only glue reads and writes together…
Challenge
  public class Counter {
      private long value;
      public long getAndIncrement() {
          long temp = value;
          value = temp + 1;
          return temp;
      }
  }
  Make the read and write steps atomic (indivisible).
Hardware Solution
  Replace the separate read and write in getAndIncrement() with a single ReadModifyWrite() instruction, which the hardware executes atomically.
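In Java, one way to reach such an instruction is `AtomicLong.compareAndSet`, which is typically compiled down to a hardware compare-and-swap. The sketch below is one possible illustration, not the slides' implementation; the class `CasCounter` and the helper `hammer` are hypothetical names. `AtomicLong` also offers `getAndIncrement()` directly, which does the same job in one call.

```java
import java.util.concurrent.atomic.AtomicLong;

class CasCounter {
    private final AtomicLong value;

    CasCounter(long initial) {
        value = new AtomicLong(initial);
    }

    long getAndIncrement() {
        while (true) {
            long temp = value.get();
            // Succeeds only if no other thread changed value since the read above.
            if (value.compareAndSet(temp, temp + 1)) return temp;
        }
    }

    long get() {
        return value.get();
    }

    // Starts `threads` threads that each increment a fresh counter `perThread`
    // times, then returns the final value.
    static long hammer(int threads, int perThread) {
        CasCounter counter = new CasCounter(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.getAndIncrement();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 10_000));
    }
}
```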
An Aside: Java™
  public class Counter {
      private long value;
      public long getAndIncrement() {
          long temp;
          synchronized (this) {            // synchronized block: mutual exclusion
              temp = value;
              value = temp + 1;
          }
          return temp;
      }
  }
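To see mutual exclusion at work, the synchronized counter can be exercised from several threads. This is a minimal sketch; the class name `SynchronizedCounter`, the `get()` accessor, and the `run` helper are additions for illustration.

```java
class SynchronizedCounter {
    private long value;

    SynchronizedCounter(long initial) {
        value = initial;
    }

    public long getAndIncrement() {
        long temp;
        synchronized (this) {    // only one thread at a time may run this block
            temp = value;
            value = temp + 1;
        }
        return temp;
    }

    public synchronized long get() {
        return value;
    }

    // Starts `threads` threads that each increment a fresh counter `perThread`
    // times, then returns the final value.
    static long run(int threads, int perThread) {
        SynchronizedCounter counter = new SynchronizedCounter(0);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) counter.getAndIncrement();
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 10_000));
    }
}
```

Because the read and the write happen inside one synchronized block, no increment is lost and the final value equals the total number of calls.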
Why do we care?
  We want as much of the code as possible to execute concurrently (in parallel).
  A larger sequential part implies reduced performance.
  Amdahl’s law: this relation is not linear…
Amdahl’s Law
  \[ \text{Speedup} = \frac{1}{(1 - p) + \dfrac{p}{n}} \]
  Speedup: of a computation given n CPUs instead of 1.
  p: parallel fraction; 1 - p: sequential fraction; n: number of processors.
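A short derivation may help here. It assumes that $T_1$ is the running time on one processor, that $p$ is the parallel fraction of that time, and that the parallel part divides perfectly across $n$ processors while the sequential part $1 - p$ does not speed up at all.

```latex
\begin{align*}
T_n &= (1 - p)\,T_1 + \frac{p\,T_1}{n} \\
\text{Speedup} &= \frac{T_1}{T_n} = \frac{1}{(1 - p) + \dfrac{p}{n}} \\
\lim_{n \to \infty} \text{Speedup} &= \frac{1}{1 - p}
\end{align*}
```

The last line shows why the sequential fraction dominates: however many processors are added, the speedup never exceeds $1/(1 - p)$.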
Example
  Ten processors; 60% concurrent, 40% sequential.
  How close to a 10-fold speedup?
  Speedup = 2.17 = 1 / (1 - 0.6 + 0.6/10)
Example
  Ten processors; 80% concurrent, 20% sequential.
  How close to a 10-fold speedup?
  Speedup = 3.57 = 1 / (1 - 0.8 + 0.8/10)
Example
  Ten processors; 90% concurrent, 10% sequential.
  How close to a 10-fold speedup?
  Speedup = 5.26 = 1 / (1 - 0.9 + 0.9/10)
Example
  Ten processors; 99% concurrent, 1% sequential.
  How close to a 10-fold speedup?
  Speedup = 9.17 = 1 / (1 - 0.99 + 0.99/10)
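The four examples can be recomputed with a few lines of Java. This sketch assumes the standard form of Amdahl's law, speedup = 1 / ((1 - p) + p/n), with p the concurrent fraction and n the number of processors; the class and method names are hypothetical.

```java
class Amdahl {
    // Speedup on n processors when a fraction p of the work is concurrent.
    static double speedup(double p, int n) {
        return 1.0 / ((1.0 - p) + p / n);
    }

    public static void main(String[] args) {
        double[] fractions = {0.60, 0.80, 0.90, 0.99};
        for (double p : fractions) {
            System.out.printf("%.0f%% concurrent on 10 processors: %.2f%n",
                    p * 100, speedup(p, 10));
        }
    }
}
```

Even at 99% concurrent the result stays below ten, because the remaining 1% runs at single-processor speed.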
Back to Real-World Multicore Scaling
  [Figure: speedup of user code on multicore when the sequential % of the code is not reduced: 1.8x, 2x, 2.9x.]
Fine-grained parallelism has a huge performance benefit
  The reason we get only a 2.9x speedup:
  [Figure: cores (c) accessing shared data structures, coarse grained versus fine grained; in both panels the code is 25% shared and 75% unshared.]
Multiprocessor Programming
  This is what this course is about: the % of the code that is not easy to make concurrent, yet may have a large impact on overall speedup.