Big Data and the Database Community

Daniel Abadi, Yale University

Big Data

The Big Data phenomenon is the best thing that could have happened to the database community
• Despite other definitions related to ‘3 Vs’, Big Data means BIG Data
• Which means we need scalable database systems
• Still two main components of Big Data:
  • Performing data analysis at scale
  • Performing requests on data at scale

Performing Data Analysis at Scale

Database community has won the battle
• Some thought that MapReduce might replace traditional database technology as the primary means to perform analysis at scale
• Just about every MapReduce vendor has abandoned this goal
• Hadapt, Impala, Tez, and several others are in a race to see who can add the most traditional database execution technology to Hadoop fastest
• Everyone is going in the direction of cost-based optimizers, traditional database operators, and push-based query execution

Performing Requests on Data at Scale

The database community is losing the battle
• NoSQL systems still have very little traditional database technology inside (despite adding SQL interfaces)
• No race to add DB technology. Why?
• Don’t blame CAP: CAP is only relevant when there’s a network partition
• We never figured out how to do ACID and active replication at scale
• Many new proposals make simplifying assumptions in order to handle scale
• It’s been 30 years. Why can’t we build a distributed database that can handle distributed transactions over actively replicated data at scale?

