Optimizing Big Data Value Extraction with Mesos

CS 294-42: Project Suggestions September 14, 2011 Ion Stoica (http://www.cs.berkeley.edu/~istoica/classes/cs294/11/)

Projects • This is a project-oriented class • Reading papers should be a means to a great project, not a goal in itself! • Strongly prefer groups of two • Perfectly fine to use the same project for CS262 • Today, I’ll present some suggestions • But you are free to come up with your own proposal • Main goal: just do a great project

Where I’m Coming From? • Key challenge: maximize economic value of data, i.e., • Extract value from data while reducing costs (e.g., storage, computation)

Where I’m Coming From? • Tools to extract value from big-data • Scalability • Response time • Accuracy • Provide high cluster utilization for heterogeneous workloads • Support diverse SLAs • Predictable performance • Isolation • Consistency

Caveats • Cloud computing is HOT, but there is a lot of NOISE! • It is not easy to • differentiate between narrow engineering solutions and fundamental tradeoffs • predict the importance of the problem you solve • Cloud computing is akin to a Gold Rush!

Background: Mesos • Rapid innovation in cloud computing • No single framework optimal for all applications • Running each framework on its dedicated cluster • Expensive • Hard to share data • Need to run multiple frameworks on same cluster [Figure: Dryad, Cassandra, Hypertable, Pregel]

Background: Mesos – Where We Want to Go • Today: static partitioning (uniprogramming) • Mesos: dynamic sharing (multiprogramming) [Figure: Hadoop, Pregel, MPI on a shared cluster]

Background: Mesos – Solution • Mesos is a common resource sharing layer over which diverse frameworks can run [Figure: Hadoop, MPI, … on top of Mesos, which spans the cluster nodes]

Background: Workload in Datacenters [Figure: axes Priority and Response, each from High to Low; workload classes: Interactive (low-latency), Batch]

Datacenter OS: Resource Management, Scheduling

Hierarchical Scheduler (for Mesos) • Allow administrators to organize into groups • Provide resource guarantees per group • Share available resources (fairly) across groups • Research questions • Abstraction (when using multiple resources)? • How to implement using resource offers? • What policies are compatible at different levels in the hierarchy?
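A minimal sketch of fair sharing across a group hierarchy, for a single resource only. It is not the Mesos mechanism and does not use resource offers: it applies weighted max-min (water-filling) at each level of the tree, so capacity a group cannot use flows to its siblings. The `Group` class, the weights, and the demands are all hypothetical, and per-group guarantees are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Group:
    """Hypothetical node in the administrator-defined hierarchy."""
    name: str
    weight: float = 1.0
    demand: float = 0.0  # only meaningful for leaves
    children: List["Group"] = field(default_factory=list)

    def total_demand(self) -> float:
        if not self.children:
            return self.demand
        return sum(c.total_demand() for c in self.children)


def weighted_max_min(capacity: float, groups: List[Group]) -> Dict[str, float]:
    """Split capacity by weight, never giving a group more than it demands."""
    alloc = {g.name: 0.0 for g in groups}
    active = [g for g in groups if g.total_demand() > 0]
    remaining = capacity
    while active and remaining > 1e-9:
        total_w = sum(g.weight for g in active)
        share = {g.name: remaining * g.weight / total_w for g in active}
        done = {g.name for g in active if g.total_demand() <= share[g.name] + 1e-9}
        if not done:
            # Everyone wants more than a weighted share: hand out the shares.
            for g in active:
                alloc[g.name] = share[g.name]
            break
        for g in active:
            if g.name in done:
                alloc[g.name] = g.total_demand()
                remaining -= g.total_demand()
        active = [g for g in active if g.name not in done]
    return alloc


def allocate(group: Group, capacity: float,
             out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Recursively push capacity down the tree; returns leaf allocations."""
    out = {} if out is None else out
    if not group.children:
        out[group.name] = capacity
        return out
    shares = weighted_max_min(capacity, group.children)
    for child in group.children:
        allocate(child, shares[child.name], out)
    return out


root = Group("root", children=[
    Group("ads", children=[Group("ads/etl", demand=10), Group("ads/query", demand=20)]),
    Group("research", demand=90),
])
leaf_alloc = allocate(root, 100.0)
print(leaf_alloc)
```

In this example the "ads" group demands less than its half of the cluster, so the unused part goes to "research". Extending the sketch to multiple resources is exactly the abstraction question raised on the slide.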

Cross Application Resource Management • An app uses many services (e.g., file systems, key-value storage, databases) • If an app has high priority and a service it uses does not, the app’s SLA (Service Level Agreement) might be violated • Research questions • Abstraction, e.g., resource delegation, priority propagation? • Clean-slate mechanisms vs. incremental deployability • This is also highly challenging in single node OSes!

Resource Management using VMs • Most cluster resource managers use Linux containers (e.g., Mesos) • Thus, schedulers assume no task migration • Research questions: • Develop scheduler for VM environments (e.g., extend DRF) • Tradeoffs between migration, delay, and preemption

Task Granularity Selection (Yanpei Chen) • Problem: number of tasks per stage in today’s MapReduce apps is (highly) sub-optimal • Research question: • Derive algorithms to pick the number of tasks to optimize various performance metrics, e.g., • utilization, response time, network traffic • subject to various constraints, e.g., • capacity, network

Resource Revocation • Which task should we revoke/preempt? • Two questions • Which slot has least impact on the giving framework? • Is the slot acceptable to receiving framework? • Research questions • Identify feasible slot for receiving framework with least impact on giving framework • Light-weight protocol design
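The two questions above can be read as a filter followed by a minimization. The sketch below assumes that the giving framework can report an impact number per slot (here a hypothetical `lost_work_s`, the work discarded if the task is killed) and that the receiving framework exposes an `acceptable` predicate. Neither is an existing Mesos API, and the protocol for exchanging this information is the open design problem.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Slot:
    """Hypothetical description of a slot currently held by the giving framework."""
    node: str
    cpus: float
    mem_gb: float
    lost_work_s: float  # assumed impact metric reported by the giving framework


def pick_slot_to_revoke(
    slots: Iterable[Slot],
    acceptable: Callable[[Slot], bool],
) -> Optional[Slot]:
    """Return the acceptable slot with the least impact, or None if none qualifies."""
    candidates = [s for s in slots if acceptable(s)]
    return min(candidates, key=lambda s: s.lost_work_s, default=None)


held = [
    Slot("node1", cpus=2, mem_gb=2, lost_work_s=5.0),
    Slot("node2", cpus=2, mem_gb=8, lost_work_s=120.0),
    Slot("node3", cpus=4, mem_gb=8, lost_work_s=30.0),
]


def needs_4gb(slot: Slot) -> bool:
    # The receiving framework needs at least 4 GB of memory per task.
    return slot.mem_gb >= 4


victim = pick_slot_to_revoke(held, needs_4gb)
print(victim)
```

The cheapest slot overall (node1) is skipped because the receiver cannot use it, which is why the two questions have to be answered together.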

Control Plane Consistency Model • What type of consistency is “good enough” for various control plane functions? • File system metadata (Hadoop) • Routing (Nicira) • Scheduling • Coordinated caching • … • Research question • What are the trade-offs between performance and consistency? • Develop generic framework for control plane

Decentralized vs. Centralized Scheduling • Decentralized schedulers • E.g., Mesos, Hadoop 2.0 • Delegate decisions to apps (i.e., frameworks, jobs) • Advantages: scale and separation of concerns (i.e., apps know best where and which tasks to run) • Centralized schedulers • Know all app requirements • Advantage: optimality • Research challenge: • Evaluate centralized vs. decentralized schedulers • Characterize class of workloads for which decentralized scheduler is good enough

Opportunistic Scheduling • Goal: schedule interactive jobs (e.g., <100ms latency) • Existing schedulers: high overhead (e.g., Mesos needs to decide on every offer) • Research challenge: • Tradeoff between utilization and response time • Evaluate hybrid approach

Background: Dominant Resource Fairness • Implement fair (proportional) allocation for multiple types of resources • Key properties • Strategy proof: users cannot get an advantage by lying about their demands • Sharing incentives: users are incentivized to share a cluster rather than partitioning it
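A minimal sketch of DRF as progressive filling: repeatedly give one more task to the user whose dominant share (the largest fraction of any single resource it holds) is smallest. The function name, the dictionary layout, and the example numbers are assumptions made for illustration. This variant drops a user once its next task no longer fits and keeps going with the others, which is a simplification and not necessarily how a production scheduler behaves.

```python
from typing import Dict


def drf_allocate(
    capacity: Dict[str, float],
    demands: Dict[str, Dict[str, float]],
) -> Dict[str, int]:
    """Return the number of tasks granted to each user under DRF."""
    resources = list(capacity)
    for user, d in demands.items():
        if not any(d[r] > 0 for r in resources):
            raise ValueError(f"user {user!r} must demand some resource")

    used = {r: 0.0 for r in resources}
    tasks = {u: 0 for u in demands}
    dominant_share = {u: 0.0 for u in demands}
    active = set(demands)

    while active:
        # Lowest dominant share goes first; sorting makes ties deterministic.
        user = min(sorted(active), key=lambda u: dominant_share[u])
        d = demands[user]
        if all(used[r] + d[r] <= capacity[r] for r in resources):
            for r in resources:
                used[r] += d[r]
            tasks[user] += 1
            dominant_share[user] = max(
                tasks[user] * d[r] / capacity[r] for r in resources
            )
        else:
            # Simplification: this user is saturated, stop considering it.
            active.remove(user)
    return tasks


cluster = {"cpu": 9, "mem_gb": 18}
per_task = {
    "A": {"cpu": 1, "mem_gb": 4},  # memory is A's dominant resource
    "B": {"cpu": 3, "mem_gb": 1},  # CPU is B's dominant resource
}
granted = drf_allocate(cluster, per_task)
print(granted)
```

With these numbers A ends up with 3 tasks and B with 2, so both hold a dominant share of 2/3: A has 12 of 18 GB and B has 6 of 9 CPUs.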

DRF for Non-linear Resources/Demands • DRF assumes resources & demands are additive • E.g., task 1 needs (1CPU, 1GB) and task 2 needs (1CPU, 3GB) ⇒ both tasks together need (2CPU, 4GB) • Sometimes demands are non-linear • E.g., shared memory • Sometimes resources are non-linear • E.g., disk throughput, caches • Research challenge: • DRF-like scheduler for non-linear resources & demands (could be two projects here!)

DRF for OSes • DRF designed for clusters using resource offer mechanism • Redesign DRF to support multi-core OSes • Research questions: • Is resource offer best abstraction? • How to best leverage preemption? (in Mesos tasks are not preempted by default) • How to support gang scheduling?

Storage & Data Processing

Resource Isolation for Storage Services • Share storage (e.g., key-value store) between • Frontend, e.g., web services • Backend, e.g., analytics on freshest data • Research challenge • Isolation mechanism: protect front-end performance from back-end workload

“Quicksilver” DB • Goal: interactive queries with bounded error on “unbounded” data • Trade between efficiency and accuracy • Query response time target: < 100ms • Approach: random pre-sampling across different dimensions (columns) • Research question: given a query and an error bound, find • Smallest sample to compute result • Sample minimizing disk (or memory) access times • (Talk with Sameer, if interested)
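One way to make "smallest sample for a given error bound" concrete is the textbook normal-approximation bound for estimating a mean. This is a sketch for a single AVG-style query only. It assumes the column's standard deviation is known or estimated in advance, that samples are uniform and independent, and that a few pre-built sample sizes exist. None of this comes from the Quicksilver design itself, and the function names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Iterable, Optional


def required_sample_size(stddev: float, max_error: float,
                         confidence: float = 0.95) -> int:
    """Rows needed so the sample mean is within max_error at the given confidence.

    Uses n >= (z * stddev / max_error)^2, a central-limit-theorem approximation.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.ceil((z * stddev / max_error) ** 2)


def pick_sample(prebuilt_sizes: Iterable[int], needed: int) -> Optional[int]:
    """Smallest pre-built sample that is large enough, or None (scan full data)."""
    fits = [n for n in prebuilt_sizes if n >= needed]
    return min(fits) if fits else None


needed_rows = required_sample_size(stddev=10.0, max_error=1.0, confidence=0.95)
chosen = pick_sample([100, 1_000, 10_000], needed_rows)
print(needed_rows, chosen)
```

The required size depends on the variance and the error bound, not on the size of the full data set, which is what makes bounded-error queries over "unbounded" data plausible. Choosing among samples by disk or memory access time is the second half of the research question and is not modeled here.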

Split-Privacy DB (1/2) [Figure: result, f_private, f_public, Public DB, Private DB] • Partition data & computation • Private • Public (stored on cloud) • Goal: use cloud without revealing the computation result • Example: • Operation f(x, y) = x + y, where • x: private • y: public • Pick random number a, and compute x’ = x + a • Compute f(x’, y) = r’ = x’ + y • Recover result: r = r’ – a = (x’ – a) + y = x + y
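The masking example above can be sketched in a few lines. The split into a private side (`mask`, `unmask`) and a cloud side (`cloud_add`) follows the slide. The function names and the use of arithmetic modulo 2^64 are assumptions added here, the latter so that the masked value is uniformly distributed and reveals nothing about x.

```python
import secrets
from typing import Tuple

MODULUS = 2 ** 64  # assumed value range; not specified in the slides


def mask(x: int) -> Tuple[int, int]:
    """Private side: hide x behind a fresh random offset a."""
    a = secrets.randbelow(MODULUS)
    return (x + a) % MODULUS, a


def cloud_add(x_masked: int, y: int) -> int:
    """Cloud side: computes f(x', y) = x' + y and never sees x or the true result."""
    return (x_masked + y) % MODULUS


def unmask(r_masked: int, a: int) -> int:
    """Private side: recover r = r' - a."""
    return (r_masked - a) % MODULUS


x_private, y_public = 42, 100
x_masked, a = mask(x_private)
r_masked = cloud_add(x_masked, y_public)
r = unmask(r_masked, a)
print(r)
```

Addition works because the mask commutes with the operation. Which other operations admit a similar split is the research question on the next slide.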

Split-Privacy DB (2/2) [Figure: result, f_private, f_public, Public DB, Private DB] • Partition data & computation • Private • Public (stored on cloud) • Example: patient data (private), public clinical and genomics data sets • Goal: use cloud without revealing the computation result • Research questions: • What types of computation can be implemented? • Is this any more powerful than privacy-preserving computation / data mining?

RDDs as an OS Abstraction • Resilient Distributed Datasets (RDDs) • Fault-tolerant (in-memory) parallel data structures • Allow Spark apps to efficiently reuse data • Design cross-application RDDs • Research questions • RDD reconstruction (track software and platform changes) • Enable users to share intermediate results of queries (identify when two apps compute same RDD) • RDD cluster-wide caching

Provenance-based Efficient Storage (Peter B and Patrick W) • Reduce storage by deleting data that can be recreated • Generalization of previous project • Research challenges: • Identify data that can be deterministically recreated and the code to do so • Use hints? • Tradeoff between re-creation and storage • May take into account access pattern, frequency, performance

Very-low Latency Streaming • Challenge: stragglers, failures • Approaches to reduce latency: • Redundant computations • Speculative execution • Research questions • Theoretical trade-off between response time and accuracy? • Achieve target latency and accuracy, while minimizing the overhead
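A small simulation can show why redundant computation helps against stragglers and what it costs. The latency model below is made up for illustration: each copy of a task takes 8 to 12 ms, and with 5% probability it straggles and takes ten times longer. A task finishes when its fastest replica finishes, and the overhead is the extra work of running every replica.

```python
import random
from typing import List


def task_latency_ms(rng: random.Random, straggler_prob: float = 0.05) -> float:
    """Assumed latency model: occasional 10x stragglers."""
    base = rng.uniform(8.0, 12.0)
    return base * 10.0 if rng.random() < straggler_prob else base


def replicated_latency_ms(rng: random.Random, replicas: int) -> float:
    """Run `replicas` copies and take the first one to finish."""
    return min(task_latency_ms(rng) for _ in range(replicas))


def p99(samples: List[float]) -> float:
    """Simple empirical 99th percentile."""
    ordered = sorted(samples)
    return ordered[int(0.99 * len(ordered))]


def simulate_p99(replicas: int, trials: int = 10_000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return p99([replicated_latency_ms(rng, replicas) for _ in range(trials)])


for k in (1, 2, 3):
    print(f"replicas={k} work_overhead={k}x p99={simulate_p99(k):.1f} ms")
```

Under this model a single copy has a tail dominated by stragglers, while two copies straggle together only about 0.25% of the time, so the 99th percentile falls back to the normal range at the price of doubling the work. Speculative execution aims for a similar tail with less overhead by launching the second copy only when the first looks slow.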
