
THE INDUSTRIAL REVOLUTION OF DATA


banagher


Presentation Transcript


  1. THE INDUSTRIAL REVOLUTION OF DATA (image: revsorg, Flickr) • logs • sensors • cameras • gps • ...

  2. CRISIS (?) IN COMPUTING • Moore’s law, derailed • processor clock rates have stopped growing • storage, memory, network continue exponentiating • solution (?): multicore • many processors on a single chip • massive parallelism (for the masses)

  3. CRISIS AGAIN! • can’t clean up SW that’s already out • future does not look pretty either • few developers can produce good parallel code

  4. HEARD THIS BEFORE? • “dead parallel computer society” • Convex, Encore, Floating Point Systems, INMOS, Kendall Square Research, MasPar, nCUBE, Sequent, Thinking Machines... • shared-nothing databases • Gamma, Bubba, Teradata

  5. NASA vs. FEDEX

  6. DATAFLOW PARALLELISM • split up a large set of inputs, not the algorithm • SQL (IBM, 1974) • widely adopted in enterprises • MapReduce (Google, 2002) • widely adopted by hackers, students, algorithmicists • very, very similar • [Figure labels: master, QD process, QE process (slices 1-3), segment1, segment2, segment3]

  7. SQL OR MAP/REDUCE? • yes. • tradeoffs in programmability/usability • compatibility • cultural diversity • mix and match • see above

  8. FROM FLEETS TO SPACESHIPS • alas, parallel dataflow only works for data • wait ... nearly everything is data!

  9. RETHINKING PROGRAMMING • data-centric, declarative, distributed • networking, robotics, machine learning, NLP, games, cloud infrastructure...

  10. RESEARCH: LINCOLN & BOOM • lincoln • a data-centric language for parallel computing • the cloud goes BOOM! • Berkeley Orders of Magnitude • OOM bigger systems, OOM less code • distributed filesystem, parallel dataflow infrastructure • built out of dataflow! • evolution: incremental ‘ilities with minimal effort
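
Slides 6 and 7 claim that dataflow parallelism splits the inputs rather than the algorithm, and that SQL and MapReduce are "very, very similar". The sketch below is one way to see that; it is not taken from the talk. It computes the same per-source totals twice: once as a map/reduce over input slices, and once as a SQL GROUP BY using Python's built-in sqlite3 module. The record layout, the function names, and the choice of three slices are all assumptions made for illustration, and the slices are processed sequentially here where a real system would hand each one to a different core or node.

```python
import sqlite3
from collections import Counter
from functools import reduce

# Hypothetical input: (source, size) log records, invented for this sketch.
RECORDS = [
    ("gps", 10), ("camera", 200), ("sensor", 5),
    ("gps", 12), ("camera", 180), ("sensor", 7), ("gps", 9),
]


def split(records, n_slices):
    """Partition the input round-robin into n_slices lists."""
    return [records[i::n_slices] for i in range(n_slices)]


def map_slice(records):
    """Run the same small aggregation on one slice: total size per source."""
    partial = Counter()
    for source, size in records:
        partial[source] += size
    return partial


def merge(left, right):
    """Combine two partial aggregates by adding totals key by key."""
    combined = Counter(left)
    combined.update(right)
    return combined


def dataflow_totals(records, n_slices=3):
    """Map/reduce style: split the data, aggregate each slice, merge results."""
    partials = [map_slice(s) for s in split(records, n_slices)]
    return dict(reduce(merge, partials, Counter()))


def sql_totals(records):
    """SQL style: the same question asked declaratively with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE log (source TEXT, size INTEGER)")
    conn.executemany("INSERT INTO log VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT source, SUM(size) FROM log GROUP BY source"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(dataflow_totals(RECORDS))
    print(sql_totals(RECORDS))
```

Both functions return the same mapping regardless of how many slices the input is cut into, because the per-slice work and the merge step never depend on which slice a record landed in. That independence is what lets the data, rather than the algorithm, be divided.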
