1 / 99

Parallel-lazy performance Java 8 vs Scala vs GS Collections

Parallel-lazy performance Java 8 vs Scala vs GS Collections. Craig Motlin June 2014. Goals. Compare Java Streams, Scala parallel Collections, and GS Collections Convince you to use GS Collections Convince you to do your own performance testing Identify when to avoid parallel APIs

ada
Télécharger la présentation

Parallel-lazy performance Java 8 vs Scala vs GS Collections

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel-lazy performanceJava 8 vs Scala vs GS Collections Craig Motlin June 2014

  2. Goals • Compare Java Streams, Scala parallel Collections, and GS Collections • Convince you to use GS Collections • Convince you to do your own performance testing • Identify when to avoid parallel APIs • Identify performance pitfalls to avoid

  3. Goals • Compare Java Streams, Scala parallel Collections, and GS Collections • Convince you to use GS Collections • Convince you to do your own performance testing • Identify when to avoid parallel APIs • Identify performance pitfalls to avoid Lots of claims and opinions

  4. Goals • Compare Java Streams, Scala parallel Collections, and GS Collections • Convince you to use GS Collections • Convince you to do your own performance testing • Identify when to avoid parallel APIs • Identify performance pitfalls to avoid Lots of evidence

  5. Goals • Compare Java Streams, Scala parallel Collections, and GS Collections • Convince you to use GS Collections • Convince you to do your own performance testing • Identify when to avoid parallel APIs • Identify performance pitfalls to avoid

  6. Intro • Solve the same problem in all three libraries • Java (1.8.0_05) • GS Collections (5.1.0) • Scala (2.11.0) • Count how many even numbers are in a list of numbers • Then accomplish the same thing in parallel • Data-level parallelism • Batch the data • Use all the cores

  7. Performance Factors Tests that isolate individual performance factors • Count • Filter, Transform, Transform, Filter, convert to List • Aggregation • Market value stats aggregated by product or category

  8. Count: Serial long evens = arrayList.stream() .filter(each -> each % 2 == 0).count(); intevens = fastList.count(each -> each % 2 == 0); valevens = arrayBuffer.count(_ % 2 == 0)

  9. Count: Serial Lazy long evens = arrayList.stream() .filter(each -> each % 2 == 0).count(); intevens = fastList.asLazy() .count(each -> each % 2 == 0); valevens = arrayBuffer.view .count(_ % 2 == 0)

  10. Count: Parallel Lazy long evens = arrayList.parallelStream() .filter(each -> each % 2 == 0).count(); intevens = fastList .asParallel(executorService, BATCH_SIZE) .count(each -> each % 2 == 0); valevens = arrayBuffer.par.count(_ % 2 == 0)

  11. Parallel Lazy Batch Filter and Count Reduce

  12. Parallel Eager Batch Filter Count Reduce

  13. Goals • Compare Java Streams, Scala parallel Collections, and GS Collections • Convince you to use GS Collections • Convince you to do your own performance testing • Identify when to avoid parallel APIs • Identify performance pitfalls to avoid Time for some numbers!

  14. Serial Count ops/s (higher is better)

  15. Parallel Count ops/s (higher is better) Measured on an 8 core Linux VM Intel Xeon E5-2697 v2 8x

  16. Java Microbenchmark Harness “JMH is a Java harness for building, running, and analysingnano/micro/milli/macro benchmarks written in Java and other languages targetting the JVM.” • 5 forked JVMs per test • 100 warmup iterations per JVM • 50 measurement iterations per JVM • 1 second of looping per iteration http://openjdk.java.net/projects/code-tools/jmh/

  17. Java Microbenchmark Harness @GenerateMicroBenchmark public void parallel_lazy_jdk() { long evens = this.integersJDK .parallelStream() .filter(each -> each % 2 == 0) .count(); Assert.assertEquals(SIZE / 2, evens); }

  18. Java Microbenchmark Harness • @Setup includes megamorphicwarmup • More info on megamorphic in the appendix • This is something that JMH does not handle for you!

  19. Java Microbenchmark Harness • Throughput: higher is better • Enough warmup iterations so that standard deviation is low Benchmark Mode Samples Mean Mean error Units CountTest.parallel_eager_gscthrpt 250 629.961 8.305 ops/s CountTest.parallel_lazy_gscthrpt 250 595.023 7.153 ops/s CountTest.parallel_lazy_jdkthrpt 250 415.382 7.766 ops/s CountTest.parallel_lazy_scalathrpt 250 331.938 2.141 ops/s CountTest.serial_eager_gscthrpt 250 115.197 0.328 ops/s CountTest.serial_eager_scalathrpt 250 91.167 0.864 ops/s CountTest.serial_lazy_gscthrpt 250 73.625 3.619 ops/s CountTest.serial_lazy_jdkthrpt 250 58.182 0.477 ops/s CountTest.serial_lazy_scalathrpt 250 84.200 1.033 ops/s...

  20. Performance Factors Tests that isolate individual performance factors • Count • Filter, Transform, Transform, Filter, convert to List • Aggregation • Market value stats aggregated by product or category

  21. Java Microbenchmark Harness • Performance tests are open sourced • Read them and run them on your hardware https://github.com/goldmansachs/gs-collections/

  22. Performance Factors Factors that may affect performance • Underlying container implementation • Combine strategy • Fork-join vs batching (and batch size) • Push vs pull lazy evaluation • Collapse factor • Unknown unknowns

  23. Performance Factors Factors that may affect performance • Underlying container implementation • Combine strategy • Fork-join vs batching (and batch size) • Push vs pull lazy evaluation • Collapse factor • Unknown unknowns Let’s look at reasons for the differences in count() Isolated by using array-backed lists. ArrayList, FastList, and ArrayBuffer Isolated because combination of intermediate results is simple addition.

  24. Count: Java 8

  25. Count: Java 8 implementation @GenerateMicroBenchmark public void serial_lazy_jdk() { long evens = this.integersJDK .stream() .filter(each -> each % 2 == 0) .count(); Assert.assertEquals(SIZE / 2, evens); }

  26. Count: Java 8 implementation filter(Predicate) .count() Instead of count(Predicate) @GenerateMicroBenchmark public void serial_lazy_jdk() { long evens = this.integersJDK .stream() .filter(each -> each % 2 == 0) .count(); Assert.assertEquals(SIZE / 2, evens); }

  27. Count: Java 8 implementation filter(Predicate) .count() Instead of count(Predicate) @GenerateMicroBenchmark public void serial_lazy_jdk() { long evens = this.integersJDK .stream() .filter(each -> each % 2 == 0) .count(); Assert.assertEquals(SIZE / 2, evens); } Is count()just incrementing a counter?

  28. Count: Java 8 implementation public final long count() { return mapToLong(e -> 1L).sum(); } public final long sum() { return reduce(0, Long::sum); } /** @since 1.8 */ public static long sum(long a, long b) { return a + b; }

  29. Count: Java 8 implementation this.integersJDK .stream() .filter(each -> each % 2 == 0) .mapToLong(e -> 1L) .reduce(0, Long::sum); this.integersGSC .asLazy() .count(each -> each % 2 == 0);

  30. Count: Java 8 implementation this.integersJDK .stream() .filter(each -> each % 2 == 0) .mapToLong(e -> 1L) .reduce(0, Long::sum); this.integersGSC .asLazy() .count(each -> each % 2 == 0); Seems like extra work

  31. Count: GS Collections

  32. Count: GS Collections @GenerateMicroBenchmark public void serial_lazy_gsc() { intevens = this.integersGSC .asLazy() .count(each -> each % 2 == 0); Assert.assertEquals(SIZE / 2, evens); }

  33. Count: GS Collections AbstractLazyIterable.java public intcount(Predicate<? super T> predicate) { CountProcedure<T> procedure = new CountProcedure<T>(predicate); this.forEach(procedure); return procedure.getCount(); }

  34. Count: GS Collections FastList.java public void forEach(Procedure<? super T> procedure) { for (inti= 0; i< this.size; i++) { procedure.value(this.items[i]); } }

  35. Count: GS Collections public class CountProcedure<T> implements Procedure<T> { private final Predicate<? super T> predicate; private intcount; ... public void value(T object) { if (this.predicate.accept(object)) { this.count++; } } public intgetCount() { return this.count; } }

  36. Count: GS Collections public class CountProcedure<T> implements Procedure<T> { private final Predicate<? super T> predicate; private intcount; ... public void value(T object) { if (this.predicate.accept(object)) { this.count++; } } public intgetCount() { return this.count; } } Predicate from the test: each -> each % 2 == 0

  37. Count: Scala

  38. Count: Scala implementation TraversibleOnce.scala defcount(p: A => Boolean): Int= { varcnt= 0 for (x <- this) if (p(x)) cnt+= 1 cnt }

  39. Count: Scala implementation TraversibleOnce.scala defcount(p: A => Boolean): Int= { varcnt= 0 for (x <- this) if (p(x)) cnt+= 1 cnt } for-comprehension becomes call to foreach() lambda closes over cnt. Executes predicate and increments cnt, just like CountProcedure

  40. Count: Scala implementation public final java.lang.Object apply(java.lang.Object); 0: aload_0 1: aload_1 // Method scala/runtime/BoxesRunTime.unboxToInt:(Ljava/lang/Object;)I 2: invokestatic #32 // Method apply:(I)Z 5: invokevirtual #34 // Method scala/runtime/BoxesRunTime.boxToBoolean:(Z)Ljava/lang/Boolean; 8: invokestatic #38 11: areturn public final boolean apply(int); 0: aload_0 1: iload_1 // Method apply$mcZI$sp:(I)Z 2: invokevirtual #21 5: ireturn public booleanapply$mcZI$sp(int); 0: iload_1 1: iconst_2 2: irem 3: iconst_0 4: if_icmpne 11 7: iconst_1 8: goto 12 11: iconst_0 12: ireturn

  41. Count: Scala implementation Integer int Integer.intValue() Lambda: _ % 2 == 0 Bytecode: irem boolean Boolean Boolean.valueOf(boolean)

  42. Performance Factors Factors that may affect performance • Underlying container implementation • Combine strategy • Fork-join vs batching (and batch size) • Push vs pull lazy evaluation • Collapse factor • Unknown unknowns Java’s pull lazy evaluation Scala’s auto-boxing

  43. Performance Factors Tests that isolate individual performance factors • Count • Filter, Transform, Transform, Filter, convert to List • Aggregation • Market value stats aggregated by product or category

  44. Parallel / Lazy / JDK List<Integer> list = this.integersJDK .parallelStream() .filter(each -> each % 10_000 != 0) .map(String::valueOf) .map(Integer::valueOf) .filter(each -> (each + 1) % 10_000 != 0) .collect(Collectors.toList()); Verify.assertSize(999_800, list);

  45. Parallel / Lazy / GSC MutableList<Integer> list = this.integersGSC .asParallel(this.executorService, BATCH_SIZE) .select(each -> each % 10_000 != 0) .collect(String::valueOf) .collect(Integer::valueOf) .select(each -> (each + 1) % 10_000 != 0) .toList(); Verify.assertSize(999_800, list);

  46. Parallel / Lazy / Scala vallist = this.integers .par .filter(each => each % 10000 != 0) .map(String.valueOf) .map(Integer.valueOf) .filter(each => (each + 1) % 10000 != 0) .toBuffer Assert.assertEquals(999800, list.size)

  47. Stacked computation ops/s(higher is better) 8x

  48. Parallel / Lazy / JDK List<Integer> list = this.integersJDK .parallelStream() .filter(each -> each % 10_000 != 0) .map(String::valueOf) .map(Integer::valueOf) .filter(each -> (each + 1) % 10_000 != 0) .collect(Collectors.toList()); Verify.assertSize(999_800, list);

  49. Parallel / Lazy / JDK List<Integer> list = this.integersJDK .parallelStream() .filter(each -> each % 10_000 != 0) .map(String::valueOf) .map(Integer::valueOf) .filter(each -> (each + 1) % 10_000 != 0) .collect(Collectors.toList()); Verify.assertSize(999_800, list); ArrayList::new List::add (left, right) -> { left.addAll(right); return left; }

  50. Fork-Join Merge • Intermediate results are merged in a tree • Merging is O(n log n) work and garbage

More Related