The Industrial Revolution of Data Crisis: Multicore Solution & Parallelism Derailed

Explore the crises in computing: stalled processor clock rates; exponentially growing storage, memory, and network capacity; and the shortage of developers who can produce good parallel code. Discover the shift towards multicore solutions, shared-nothing databases, and the debate between SQL and MapReduce. From fleets to spaceships, rethink programming with a focus on data-centric, declarative, and distributed approaches. Delve into research initiatives like Lincoln and BOOM for parallel computing and scalable cloud infrastructure.

banagher

Presentation Transcript


  1. THE INDUSTRIAL REVOLUTION OF DATA (photo: revsorg, Flickr) • logs • sensors • cameras • gps • ...

  2. CRISIS (?) IN COMPUTING • Moore’s law, derailed • processor clock rates have stopped growing • storage, memory, network continue exponentiating • solution (?): multicore • many processors on a single chip • massive parallelism (for the masses)

  3. CRISIS AGAIN! • can’t clean up software that’s already out • future does not look pretty either • few developers can produce good parallel code

  4. HEARD THIS BEFORE? • “dead parallel computer society” • Convex, Encore, Floating Point Systems, INMOS, Kendall Square Research, MasPar, nCUBE, Sequent, Thinking Machines... • shared-nothing databases • Gamma, Bubba, Teradata

  5. NASA vs. FEDEX

  6. DATAFLOW PARALLELISM • split up a large set of inputs, not the algorithm • SQL (IBM, 1974) • widely adopted in enterprises • MapReduce (Google, 2004) • widely adopted by hackers, students, algorithmicists • very, very similar • [Figure: a master QD process dispatching slices 1, 2, and 3 to QE processes on segment1, segment2, and segment3]
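
The "very, very similar" point on slide 6 can be illustrated with a small sketch. It computes the same per-source total twice: once by splitting the input into segments and running a hand-rolled map and reduce, and once with a SQL `GROUP BY`. Everything here is an assumption made for illustration and does not come from the slides: the `RECORDS` data, the function names (`split`, `map_partition`, `reduce_counts`, `totals_mapreduce`, `totals_sql`), and the choice of three segments. A thread pool stands in for real worker nodes.

```python
import sqlite3
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical (source, bytes) records, invented for this example.
RECORDS = [
    ("sensor", 10), ("camera", 40), ("gps", 5),
    ("sensor", 20), ("camera", 60), ("logs", 7),
    ("gps", 15), ("logs", 3), ("sensor", 30),
]


def split(records, n):
    """Split the input (not the algorithm) into n round-robin segments."""
    return [records[i::n] for i in range(n)]


def map_partition(segment):
    """Per-segment work: partial sum of bytes for each source."""
    partial = Counter()
    for source, nbytes in segment:
        partial[source] += nbytes
    return partial


def reduce_counts(left, right):
    """Merge two partial results into one."""
    return left + right


def totals_mapreduce(records, n_segments=3):
    """MapReduce-style total: map each segment independently, then merge."""
    segments = split(records, n_segments)
    # Threads stand in for separate worker processes or machines.
    with ThreadPoolExecutor(max_workers=n_segments) as pool:
        partials = list(pool.map(map_partition, segments))
    return dict(reduce(reduce_counts, partials, Counter()))


def totals_sql(records):
    """The same aggregate, stated declaratively in SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (source TEXT, nbytes INTEGER)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", records)
    rows = conn.execute(
        "SELECT source, SUM(nbytes) FROM events GROUP BY source"
    ).fetchall()
    conn.close()
    return dict(rows)


if __name__ == "__main__":
    # Both formulations should agree on the per-source totals.
    print(totals_mapreduce(RECORDS) == totals_sql(RECORDS))
```

In both versions the per-segment work is independent of the other segments, which is what lets the input be partitioned across workers without changing the algorithm.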

  7. SQL OR MAP/REDUCE? • yes. • tradeoffs in programmability/usability • compatibility • cultural diversity • mix and match • see above

  8. FROM FLEETS TO SPACESHIPS • alas, parallel dataflow only works for data • wait ... nearly everything is data!

  9. RETHINKING PROGRAMMING • data-centric, declarative, distributed • networking, robotics, machine learning, NLP, games, cloud infrastructure...

  10. RESEARCH: LINCOLN & BOOM • lincoln • a data-centric language for parallel computing • the cloud goes BOOM! • Berkeley Orders of Magnitude • OOM bigger systems, OOM less code • distributed filesystem, parallel dataflow infrastructure • built out of dataflow! • evolution: incremental ‘ilities with minimal effort
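
To give a feel for what "data-centric, declarative" programming from slides 9 and 10 might look like, here is a minimal sketch. The slides do not show Lincoln's syntax or any BOOM code, so this is not either of those. It assumes a Datalog-style pair of rules for network reachability, evaluated by naive iteration to a fixpoint. The function name `reachable` and the `links` relation are hypothetical.

```python
def reachable(links):
    """Compute all reachable (source, destination) pairs.

    Naive fixpoint evaluation of two Datalog-style rules
    (an illustrative assumption, not Lincoln syntax):

        path(x, y) :- link(x, y).
        path(x, z) :- link(x, y), path(y, z).
    """
    links = set(links)
    path = set(links)  # first rule: every link is a path
    while True:
        # Second rule: join link(x, y) with path(y, z) on the shared y.
        derived = {
            (x, z)
            for (x, y) in links
            for (y2, z) in path
            if y == y2
        }
        new_facts = derived - path
        if not new_facts:
            return path  # fixpoint: no rule produces anything new
        path |= new_facts


if __name__ == "__main__":
    # Hypothetical three-hop network, invented for this example.
    links = {("a", "b"), ("b", "c"), ("c", "d")}
    for pair in sorted(reachable(links)):
        print(pair)
```

The program states what holds over a set of facts rather than how to traverse a graph. Because the join is a dataflow operation over relations, the same partitioning idea used for SQL and MapReduce could in principle apply to it.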