MapReduce and Parallel DBMSs: Friends or Foes?

Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, Alexander Rasin
Communications of the ACM, vol. 53, iss. 1, pp. 64-71, 2010.
Presentation and slides by Elisa Tvete, Jim Avery

Parallel DBMS architecture
• Multiple nodes running database software
• “Shared-nothing nodes” - separate CPU, memory, disks
• Data horizontally partitioned across all nodes
• Each node runs query on own data
• Results returned to central processing node
• Central node calculates final result
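The flow on this slide can be sketched in a few lines of Python. This is a toy, single-process illustration, not how any particular parallel DBMS is implemented: the function names, the round-robin partitioning, and the `(sourceIP, adRevenue)` rows are all assumptions made for the example. Real systems usually partition by hashing or ranges on a column and run the node queries concurrently on separate machines.

```python
from collections import Counter

def partition_round_robin(rows, num_nodes):
    """Horizontally partition rows: each node gets a disjoint subset."""
    partitions = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        partitions[i % num_nodes].append(row)
    return partitions

def node_query(partition):
    """Runs on one node, over its own partition only: partial SUM per sourceIP."""
    partial = Counter()
    for source_ip, ad_revenue in partition:
        partial[source_ip] += ad_revenue
    return partial

def central_merge(partials):
    """Runs on the central node: combine the partial sums into the final result."""
    final = Counter()
    for partial in partials:
        final.update(partial)  # adds values key by key
    return dict(final)

def parallel_sum(rows, num_nodes=3):
    partitions = partition_round_robin(rows, num_nodes)
    # On a real cluster these calls would run at the same time, one per node.
    partials = [node_query(p) for p in partitions]
    return central_merge(partials)

# Hypothetical toy data: (sourceIP, adRevenue) pairs.
rows = [("10.0.0.1", 5), ("10.0.0.2", 3), ("10.0.0.1", 7),
        ("10.0.0.3", 1), ("10.0.0.2", 4)]
print(parallel_sum(rows))
```

Because each node only sees its own partition, the central node still has to merge partial sums for any key that landed on more than one node.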

MapReduce architecture
• Several computing nodes used
• Data not pre-loaded
• Query has “Map” and “Reduce” components
• Key/value data is distributed to nodes
• Nodes perform “Map” step
• Results are returned to central processing node
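As a rough illustration of the Map and Reduce components, the sketch below simulates the model in a single Python process. It is not Hadoop's API: `run_mapreduce`, `grep_map`, `collect_reduce`, and the sample records are hypothetical names and data chosen for the example. The map function looks for the substring `XYZ` in each value, and the reduce function collects the keys of the matching records.

```python
from collections import defaultdict

def split_input(records, num_nodes):
    """Distribute key/value records across nodes (no pre-loading or indexing)."""
    return [records[i::num_nodes] for i in range(num_nodes)]

def run_map(chunk, map_fn):
    """Map step on one node: apply map_fn to every record in its chunk."""
    out = []
    for key, value in chunk:
        out.extend(map_fn(key, value))
    return out

def shuffle(intermediate_lists):
    """Group intermediate (key, value) pairs from all nodes by key."""
    groups = defaultdict(list)
    for pairs in intermediate_lists:
        for k, v in pairs:
            groups[k].append(v)
    return groups

def run_mapreduce(records, map_fn, reduce_fn, num_nodes=2):
    chunks = split_input(records, num_nodes)
    mapped = [run_map(c, map_fn) for c in chunks]  # concurrent on a real cluster
    groups = shuffle(mapped)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}

def grep_map(key, value):
    """Emit the record key for every value that contains the pattern."""
    if "XYZ" in value:
        yield ("XYZ", key)

def collect_reduce(key, values):
    """Combine all matches for one intermediate key."""
    return sorted(values)

# Hypothetical toy input: (record key, record text).
records = [("r1", "abcXYZdef"), ("r2", "nothing here"), ("r3", "XYZ at start")]
result = run_mapreduce(records, grep_map, collect_reduce)
print(result)
```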

Performance Trade-offs Demonstration
• Three systems:
  • Hadoop MR Framework
  • Vertica, a column-store relational database
  • DBMS-X, a row-based database
• Three tasks:
  • Original MR Grep task
    • SELECT * FROM Data WHERE field LIKE '%XYZ%';
  • Web log task
    • SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP;
  • Join task
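To make the two SQL tasks concrete, the sketch below runs them against a few made-up rows using Python's built-in `sqlite3` module. SQLite is only a convenient single-node stand-in here; it is not one of the three systems benchmarked. The table layouts (an `id` column on `Data`, the column types) and the sample rows are assumptions for illustration.

```python
import sqlite3

def run_demo_queries():
    conn = sqlite3.connect(":memory:")

    # Hypothetical schema and rows for the Grep task.
    conn.execute("CREATE TABLE Data (id INTEGER, field TEXT)")
    conn.executemany(
        "INSERT INTO Data VALUES (?, ?)",
        [(1, "aaXYZbb"), (2, "no match"), (3, "XYZ")],
    )

    # Hypothetical schema and rows for the web log task.
    conn.execute("CREATE TABLE UserVisits (sourceIP TEXT, adRevenue REAL)")
    conn.executemany(
        "INSERT INTO UserVisits VALUES (?, ?)",
        [("10.0.0.1", 1.5), ("10.0.0.2", 2.0), ("10.0.0.1", 2.5)],
    )

    grep_rows = conn.execute(
        "SELECT * FROM Data WHERE field LIKE '%XYZ%'"
    ).fetchall()
    revenue_rows = conn.execute(
        "SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP"
    ).fetchall()
    conn.close()

    # Sort in Python because the queries themselves do not specify an order.
    return sorted(grep_rows), sorted(revenue_rows)

grep_rows, revenue_rows = run_demo_queries()
print(grep_rows)
print(revenue_rows)
```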

Demonstration Results

MR Complements Parallel DBMS
• MR good at extract-transform-load queries
• Extract raw data, process it, load into DBMS
• Can perform complex analytics more easily
• Queries not suitable for single SQL query
• Can use data without strictly defined schema
• MR functions can enhance parallel DBMS!
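The extract-transform-load pattern might look like the following sketch: a map-style function parses loosely structured raw lines and drops the ones it cannot interpret, and the cleaned rows are then loaded into a database for SQL querying. The pipe-delimited line format, the function names, and the use of `sqlite3` as a stand-in for the DBMS are all assumptions for illustration.

```python
import sqlite3

def extract_transform(raw_lines):
    """Map-style parsing: pull (sourceIP, adRevenue) out of raw, schema-less lines."""
    for line in raw_lines:
        fields = line.strip().split("|")
        if len(fields) < 2:
            continue  # not enough fields: skip the malformed line
        try:
            yield (fields[0], float(fields[1]))
        except ValueError:
            continue  # revenue field is not a number: skip

def load(conn, rows):
    """Load step: insert the cleaned rows into a table with a fixed schema."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS UserVisits (sourceIP TEXT, adRevenue REAL)"
    )
    conn.executemany("INSERT INTO UserVisits VALUES (?, ?)", rows)

# Hypothetical raw log lines, some of them malformed.
raw = ["10.0.0.1|1.5|Mozilla", "garbage line", "10.0.0.2|oops", "10.0.0.1|2.5"]

conn = sqlite3.connect(":memory:")
load(conn, extract_transform(raw))
totals = conn.execute(
    "SELECT sourceIP, SUM(adRevenue) FROM UserVisits GROUP BY sourceIP"
).fetchall()
print(totals)
```

The parsing logic lives in ordinary code rather than in a table definition, which is why this style copes with input that has no strictly defined schema.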

Conclusion
• Architectural Differences
• Repetitive record parsing
• Compression
• Pipelining
• Scheduling
• Discussion
• Coexistence

Resources
• M. Stonebraker, D. Abadi, D. J. DeWitt, S. Madden, E. Paulson, A. Pavlo, and A. Rasin, "MapReduce and Parallel DBMSs: Friends or Foes?," Communications of the ACM, vol. 53, iss. 1, pp. 64-71, 2010.
• A. Pavlo, E. Paulson, A. Rasin, D. J. Abadi, D. J. DeWitt, S. Madden, and M. Stonebraker, "A Comparison of Approaches to Large-Scale Data Analysis," Brown University Data Management Research Group, 26 Feb. 2013. Web. 24 Aug 2011. <http://database.cs.brown.edu/projects/mapreduce-vs-dbms/>
