Data Science 101: A Love Story
Agenda
• Introduction to Data Science
• Who’s who in Data Science?
• That Data Science Life
• [Case Study] How Spotify manages their data
• [VM] The Data Science life at VaynerMedia
• Conclusions
“If you can measure it, you can hack it.” E -> A -> E
We’re generating (and tracking) exponentially more data online than ever before.
But what is a Data Scientist? Who are they, and how do they work with “Big Data”?
Angel has 2 mutual friends with Vikash. Tim has 20 mutual friends with Vikash. If John is friends with Vikash, he might know Tim and his mutual friends.
This increased platform usage, making the experience on LinkedIn more valuable.
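The mutual-friends reasoning above can be sketched in a few lines of Python. This is only an illustrative toy, not LinkedIn's actual implementation: the graph, the `friend_0` … `friend_19` placeholder names, and the functions `build_graph` and `suggest_connections` are all assumptions made for the example. It counts, for one user, how many friends they share with each person they are not yet connected to, then ranks the candidates.

```python
from collections import Counter


def build_graph(edges):
    """Build an undirected friendship graph as {person: set_of_friends}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def suggest_connections(graph, user):
    """Rank people who are not yet friends with `user` by mutual-friend count."""
    friends = graph.get(user, set())
    counts = Counter()
    for friend in friends:
        for candidate in graph.get(friend, set()):
            # Skip the user themselves and anyone already connected.
            if candidate != user and candidate not in friends:
                counts[candidate] += 1
    return counts.most_common()


# Hypothetical data mirroring the slide: Tim shares 20 friends with Vikash,
# Angel shares 2.
friends_of_vikash = [f"friend_{i}" for i in range(20)]
edges = [("Vikash", f) for f in friends_of_vikash]
edges += [("Tim", f) for f in friends_of_vikash]
edges += [("Angel", f) for f in friends_of_vikash[:2]]

graph = build_graph(edges)
ranking = suggest_connections(graph, "Vikash")
print(ranking)  # [('Tim', 20), ('Angel', 2)]
```

Tim ranks above Angel because he shares more friends with Vikash, which is the simple analysis the slide describes.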
Big Data. Real Business objective. Simple Analysis. Valuable Data-driven Product.
Google started downloading the entire internet in the late ’90s and early ’00s.
Google created a better way to process Big Data. They created MapReduce.
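To give a feel for the MapReduce idea, here is a minimal single-machine sketch of the classic word-count example in Python. It is not Google's or Hadoop's code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample documents are assumptions for illustration, and a real system would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence on a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input.
counts = word_count(["big data big ideas", "data science"])
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'science': 1}
```

Because each document is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes the approach suit very large datasets.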
Hadoop is an open-source framework for distributed storage and processing, built around a distributed file system (HDFS) and an implementation of MapReduce.
Querying this data also allows us to work on our data retrieval skills.
Less time cleaning data. Less time “fishing”. Fewer spreadsheets. BOOM.
• AWS EMR (Hadoop)
• Spotify Client
• Ad hoc MapReduce jobs
• Hive (data warehouse infrastructure; SQL-like syntax)
• PostgreSQL