
A Big Data Spreadsheet

Presentation Transcript


  1. A Big Data Spreadsheet Mihai Budiu – VMware Research Group (VRG) Universitatea Politehnica Bucuresti – July 31, 2018 Joint work with Parikshit Gopalan, Lalith Suresh, Udi Wieder, Marcos Aguilera – VMware Research; Han Kruiger – University of Groningen, intern at VRG

  2. About Myself • B.S., M.S. from Politehnica Bucuresti • Ph.D. from Carnegie Mellon • Researcher at Microsoft Research, Silicon Valley • Distributed systems, security, compilers, cloud platforms, machine learning, visualization • Software engineer at Barefoot Networks • Programmable networks (P4) • Researcher at VMware Research Group • Big data, programmable networks

  3. VMware & VRG • VMware: • ~20K employees • Founded 1998 by Stanford faculty; headquartered in Palo Alto, CA • Virtualization, networking, storage, security, cloud management • 7.92 billion USD annual revenue; valuation of 60 billion USD • VRG (VMware Research Group): • Founded 2014 • About 30 full-time researchers • Labs in Palo Alto and Herzliya, Israel • Distributed systems, networking, OS, formal methods, computer architecture, FPGAs, compilers, algorithms

  4. Browsing big data • Interactive visualization of billion row datasets • Slice and dice data with the click of a mouse • No coding required • http://github.com/vmware/hillview • Apache 2 license

  5. History

  6. Bandwidth hierarchy The channels between the stored data and the user's screen are lossy, so Hillview computes approximate data views with an error smaller than the channel error.

  7. Demo • Real-time video • Browsing a 129 million row dataset • 0.5M flights/month • All US flights in the last 21 years • Public data from the FAA website • Running on 20 small VMs (25GB of RAM, 4 CPU cores each) • https://1drv.ms/v/s!AlywK8G1COQ_jeRQatBqla3tvgk4FQ

  8. Outline • Motivation • Fundamental building blocks • System architecture • Visualization as sketches • Evaluation • Conclusion

  9. Hillview Building Blocks

  10. Monoids A set M with • An operation +: M × M → M • A distinguished zero element: a + 0 = 0 + a = a • Commutative if a + b = b + a
interface IMonoid<R> {
  R zero();
  R add(R left, R right);
}
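For concreteness, here is one possible instance of this interface (an illustrative example, not taken from the deck): a counting monoid whose zero is 0 and whose add is integer addition.
// Example monoid (illustration only): counts elements.
class CountMonoid implements IMonoid<Long> {
  @Override
  public Long zero() { return 0L; }

  @Override
  public Long add(Long left, Long right) { return left + right; }
}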

  11. Abstract Computational Model The input data is a sharded multi-set of N tuples. A streaming/sampling algorithm runs a sketch over each shard, producing a result R; the per-shard results are combined pairwise with add, and a post-processing step turns the final R into the output O. R must be "small": independent of N, dependent only on the screen size.
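A hypothetical sequential driver makes the model concrete (the names here are mine; in Hillview the reduction is a tree over the network and partial results stream back incrementally):
import java.util.List;
import java.util.function.Function;

final class SketchDriver {
  // Apply the sketch to every shard and fold the small results with the monoid's add.
  static <T, R> R run(List<T> shards, Function<T, R> sketch, IMonoid<R> monoid) {
    R result = monoid.zero();
    for (T shard : shards) {
      result = monoid.add(result, sketch.apply(shard));
    }
    return result;
  }
}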

  12. All Renderings in Hillview are sketches!

  13. System architecture

  14. Hillview System architecture A client web browser talks to the web front-end on the root node, which keeps remote table references and a redo log. Requests flow from the root through a network of aggregation nodes to the leaf nodes (cloud service workers); each leaf node holds an in-memory table loaded from storage with a parallel read, and results return to the client as a streaming response.

  15. Immutable Partitioned Objects The browser holds a handle to an IDataSet<T> exposed by the root node; the actual T objects are partitioned across the workers' address spaces and reached over the network.

  16. DataSet Core API
interface ISketch<T, R> extends IMonoid<R> {
  R sketch(T data);
}
class PR<T> {  // Partial result
  T data;
  double done;
}
interface IDataSet<T> {
  <R> Observable<PR<R>> sketch(ISketch<T, R> sk);
  … map(…);
  … zip(…);
}
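As an example of the ISketch contract (a sketch under the assumption that a shard is simply a List<Double> column; RangeSketch is a made-up name), a data-range sketch returns the pair [min, max], which is tiny and independent of the number of rows:
import java.util.List;

class RangeSketch implements ISketch<List<Double>, double[]> {
  @Override
  public double[] zero() {
    return new double[] { Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY };
  }

  @Override
  public double[] add(double[] left, double[] right) {
    // Merge two ranges: the wider of the two bounds wins.
    return new double[] { Math.min(left[0], right[0]), Math.max(left[1], right[1]) };
  }

  @Override
  public double[] sketch(List<Double> data) {
    double[] result = zero();
    for (double d : data) {
      result[0] = Math.min(result[0], d);
      result[1] = Math.max(result[1], d);
    }
    return result;
  }
}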

  17. Dataset objects • Implement the IDataSet<T> interface • Identical interfaces on top and bottom • Can be stacked arbitrarily • Modular construction of distributed systems

  18. LocalDataset<T> • Contains a reference to an object of type T in the same address space • Directly executes operations (map, sketch, zip) on the object T
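A deliberately simplified, synchronous stand-in shows where the work happens (the real IDataSet.sketch returns an Observable<PR<R>> and runs asynchronously; SimpleDataSet and SimpleLocalDataSet are illustrative names, not Hillview's):
interface SimpleDataSet<T> {
  <R> R runSketch(ISketch<T, R> sk);
}

// The local case: the shard T lives in this address space and the sketch
// is executed directly on it.
class SimpleLocalDataSet<T> implements SimpleDataSet<T> {
  private final T data;

  SimpleLocalDataSet(T data) { this.data = data; }

  @Override
  public <R> R runSketch(ISketch<T, R> sk) {
    return sk.sketch(this.data);
  }
}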

  19. ParallelDataset<T> • Has a number of children of type IDataSet<T> • Dispatches operations to all children • sketch combines the children's results with the monoid's add
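Continuing the simplified stand-ins above (illustration only; the real implementation runs children concurrently and streams partial results), the parallel case only adds the combining step:
import java.util.List;

class SimpleParallelDataSet<T> implements SimpleDataSet<T> {
  private final List<SimpleDataSet<T>> children;

  SimpleParallelDataSet(List<SimpleDataSet<T>> children) { this.children = children; }

  @Override
  public <R> R runSketch(ISketch<T, R> sk) {
    // Dispatch to every child and fold the results with the sketch's monoid add.
    R result = sk.zero();
    for (SimpleDataSet<T> child : children) {
      result = sk.add(result, child.runSketch(sk));
    }
    return result;
  }
}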

  20. RemoteDataset<T> • Has a reference to an IDataSet<T> in another address space • The only component that deals with the network • Built on top of gRPC: the client side holds a reference to the server-side dataset

  21. A distributed dataset At the root node, a Parallel dataset has one Remote child per worker (worker 0 … worker n, spread over racks 0 … r); each Remote reference crosses the network to a worker, where a Parallel dataset fans out to Local datasets, each wrapping one shard T.

  22. sketch(s)
interface ISketch<T, R> extends IMonoid<R> {
  R sketch(T data);
}
Each Local dataset invokes s.sketch on its T; Parallel and Remote datasets combine the resulting R values with s.add on the way back to the root.

  23. Memory management The in-memory tables on the leaf nodes, together with the remote table references and memoization cache at the root node's web front-end, are soft state (a cache); the root node also keeps a redo log. • Log = lineage of all datasets • Log = JSON messages received from client • Replaying the log reconstructs all soft state • Log can be replayed as needed

  24. Visualization as sketches

  25. Table views • Always sorted • NextK(startTuple, sortOrder, K) • Monoid operation is “merge sort”
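One possible shape for that merge-sort monoid (a simplification with placeholder names; the real NextK result carries more bookkeeping, such as row counts): each partial result is a sorted list of at most K rows, and add merges two sorted lists, keeping the first K rows.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class NextKMonoid<Row> implements IMonoid<List<Row>> {
  private final Comparator<Row> order;
  private final int k;

  NextKMonoid(Comparator<Row> order, int k) { this.order = order; this.k = k; }

  @Override
  public List<Row> zero() { return new ArrayList<>(); }

  @Override
  public List<Row> add(List<Row> left, List<Row> right) {
    // Standard two-way merge, truncated at K rows.
    List<Row> merged = new ArrayList<>();
    int i = 0, j = 0;
    while (merged.size() < this.k && (i < left.size() || j < right.size())) {
      if (j >= right.size() ||
          (i < left.size() && this.order.compare(left.get(i), right.get(j)) <= 0)) {
        merged.add(left.get(i++));
      } else {
        merged.add(right.get(j++));
      }
    }
    return merged;
  }
}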

  26. Scrolling Compute startTuple based on the scroll-bar position • Approximate quantile • Samples O(H²) rows; H = screen height in pixels
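To make the idea concrete, a hypothetical helper (names and signature are mine) could sort the sample and pick the element whose rank matches the scroll-bar position as the approximate startTuple; the sample size depends only on the screen height H, never on N:
import java.util.Collections;
import java.util.List;

final class ScrollHelper {
  static <Row extends Comparable<Row>> Row approximateStartTuple(List<Row> sample,
                                                                 double scrollFraction) {
    if (sample.isEmpty())
      throw new IllegalArgumentException("empty sample");
    Collections.sort(sample);                          // O(H²) sampled rows
    int rank = (int) Math.round(scrollFraction * (sample.size() - 1));
    return sample.get(rank);
  }
}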

  27. 1D Histograms (with CDF) • Histograms are monoids (vector addition) • CDFs are histograms (at the pixel level)
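To illustrate "histograms are monoids" (an example of mine, assuming the bucket boundaries were fixed beforehand by a data-range sketch): the partial result is a vector of bucket counts and add is per-bucket addition.
class HistogramMonoid implements IMonoid<long[]> {
  private final int buckets;

  HistogramMonoid(int buckets) { this.buckets = buckets; }

  @Override
  public long[] zero() { return new long[this.buckets]; }

  @Override
  public long[] add(long[] left, long[] right) {
    long[] sum = new long[this.buckets];
    for (int i = 0; i < this.buckets; i++)
      sum[i] = left[i] + right[i];
    return sum;
  }
}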

  28. Histograms based on sampling [figure: an exact histogram and a sampled approximate histogram with error μ < 1/2 pixel rows per bar; both are legal renderings of the actual data] Theorem: O((HB/μ)² log(1/δ)) samples are needed to compute an approximate histogram with probability 1 – δ. H = screen size in pixels, B = number of buckets (< screen width in pixels). No N in this formula!

  29. 2D Histograms

  30. Heatmaps Linear regression

  31. Trellis plots

  32. Evaluation

  33. Evaluation system The client reaches the web front-end over the LAN; the root aggregator and the workers share one rack behind a ToR switch. • 8 servers • Intel Xeon Gold 5120 2.2GHz (2 sockets × 14 cores × 2 hyperthreads) • 128GB RAM/machine • Ubuntu Server • 10Gbps Ethernet in rack • 2 SSDs/machine

  34. Comparison against database • Commercial database [can’t tell which one] • In-memory table • 100M rows • DB takes 5,830ms (Hillview is 527ms)

  35. Cluster-level weak scaling Histogram, 100M elements/shard, 64 shards/machine Computation gets faster as dataset size grows!

  36. Comparison against Spark • 5× the flights dataset (71B cells) • Spark times do not include UI rendering

  37. Scaling data ORC data files, includes I/O and rendering

  38. Conclusion

  39. Lessons learned • Always think asymptotics (data size/screen size) • Define “correct approximation” precisely • Small renderings make browser fast • Two kinds of visualizations: trends and outliers • Don’t forget about missing data! • Sampling is not free; full scans may be cheaper • Redo log => simple memory management

  40. Related work [a small sample] • Big data visualization • Databases and commercial products: Polaris/Tableau, IBM BigSheets, inMens, Nanocubes, Hashedcubes, DICE, Erma, Pangloss, Profiler, Foresight, iSAX, M4, Vizdom, PowerBI • Big data analytics for visualization • MPI Reduce, Neptune, Splunk, MapReduce, Dremel/BigQuery, FB Scuba, Drill, Spark, Druid, Naiad, Algebird, Scope, ScalarR • Sampling-based analytics • BlinkDB, VisReduce, Sample+Seek, G-OLA • Incremental visualization • Online aggregation, progressive analytics, ProgressiVis, Tempe, Stat, SwiftTuna, MapReduce online, EARL, Now!, PIVE, DimXplorer

  41. Backup slides

  42. Linear transformations (homomorphisms) • Linear functions between monoids: f: M → N, f(a + b) = f(a) + f(b) • "Map" and "reduce" are linear functions • Linear transformations are the essence of data parallelism • Many streaming algorithms are linear transformations
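A small made-up example of such a linear function: summing a histogram's buckets maps the histogram monoid (vector addition) to the count monoid (integer addition), and f(a + b) = f(a) + f(b) because addition commutes with summation.
final class Linearity {
  // f: histogram -> total count.
  static long totalCount(long[] histogram) {
    long total = 0;
    for (long bucket : histogram) total += bucket;
    return total;
  }
  // For histograms a and b of the same length:
  // totalCount(vectorAdd(a, b)) == totalCount(a) + totalCount(b)
}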

  43. Pixel-level quantization

  44. Reactive streams (RxJava)
interface Observable<T> {
  Subscription subscribe(Observer<T> observer);
}
interface Observer<T> {
  void onNext(T value);
  void onError(Throwable error);
  void onCompleted();
}
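Written against the interfaces above (not any particular RxJava release; RenderingObserver is an illustrative name), an observer of partial results might look like this; each onNext updates the rendering and the progress report:
class RenderingObserver<R> implements Observer<PR<R>> {
  @Override
  public void onNext(PR<R> partial) {
    System.out.printf("%.0f%% done%n", partial.done * 100);
    // render(partial.data);  // update the view with the partial result
  }

  @Override
  public void onError(Throwable error) {
    System.err.println("sketch failed: " + error);
  }

  @Override
  public void onCompleted() {
    System.out.println("rendering complete");
  }
}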

  45. Histogram execution timeline A user click initiates the histogram; the client sends the request to the web server, which first computes the data range with a full scan on workers 1 … n, then the histogram + CDF with a sampled scan. Progress reports stream back to the client, which renders partial results until the computation has completed.

  46. Observable roles (1) Streaming data Observable<R> partialResults;

  47. Observable roles (2) Distributed progress reporting
class PR<T> {  // A partial result
  double done;
  T data;
}
Observable<PR<R>> partialResults;

  48. Observable roles (3) Distributed cancellation
Sketch API (C#):
async R map(Func<T, R> map, CancellationToken t, Progress<double> rep)
CancellationToken ct = cancellationTokenSource.token;
Progress<double> reporter;
R result = data.map(f, ct, reporter);
…
cancellationTokenSource.cancel();
Hillview API:
Observable<PR<R>> map(Function<T, R> map);
Observable<PR<R>> o = data.map(f);
Subscription s = o.subscribe(observer);
…
s.unsubscribe();

  49. Observable roles (4) Concurrency management Observable<T> data; Observable<T> output = data.subscribeOn(scheduler);
