
Optimistic Concurrency Control for Distributed Learning

Optimistic Concurrency Control for Distributed Learning. Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael I. Jordan.


Presentation Transcript


  1. Optimistic Concurrency Control for Distributed Learning. Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael I. Jordan

  2. Machine Learning Algorithm Model Parameters Data

  3. Distributed Machine Learning Model Parameters Data

  4. Distributed Machine Learning • Correctness: serial equivalence • Concurrency: more machines = less time [Diagram labels: Model Parameters, Data]

  5. Coordination-free Model Parameters Data

  6. Mutual Exclusion Model Parameters Data

  7. Mutual Exclusion Model Parameters Data

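  8. Mechanism for ensuring correctness [Chart: concurrency (low to high) against correctness (low to high)] • Coordination-free: high concurrency, low correctness • Mutual exclusion: low concurrency, high correctness • Optimistic Concurrency Control: ? (conflicts are rare)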

  9. Optimistic Concurrency Control • Optimistic updates • Validation: detect conflict • Resolution: fix conflict [Diagram labels: Model Parameters, Data, Concurrency, Correctness] Hsiang-Tsung Kung and John T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.
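The three steps on this slide (optimistic update, validation, resolution) can be illustrated with a minimal Python sketch. It is not taken from the talk: `VersionedStore` and `occ_update` are hypothetical names, conflicts are detected with a version counter, and resolution is assumed to be a plain retry, which is the classic database-style choice rather than the algorithm-specific resolution the talk goes on to describe.

```python
import threading


class VersionedStore:
    """Shared state with a version counter (hypothetical helper for this sketch)."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        # Held only for the short read and validate/commit steps,
        # never while the update itself is being computed.
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self.value, self.version

    def try_commit(self, new_value, read_version):
        """Validation: commit only if nobody else committed since our read."""
        with self._lock:
            if self.version != read_version:
                return False  # conflict detected
            self.value = new_value
            self.version += 1
            return True


def occ_update(store, fn, max_retries=1000):
    """Apply fn to the stored value using optimistic concurrency control."""
    for _ in range(max_retries):
        value, version = store.read()
        proposal = fn(value)  # optimistic work, done without holding the lock
        if store.try_commit(proposal, version):
            return proposal
        # Resolution (assumed here): discard the proposal and retry on fresh state.
    raise RuntimeError("too many conflicts")
```

When conflicts are rare, almost every call commits on its first attempt, so the lock is held only briefly and the expensive work in `fn` runs concurrently.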

  10. Optimistic Concurrency Control Application: Clustering • Natural domain for parallelization • K-means – popular algorithm • Fixed number of clusters – not fit for Big Data • Big Data solution: DP-means + OCC

  11. Example

  12. Example: K-means Bad!

  13. Example: DP-means Correct clusters. Sequential! Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.
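A rough sketch of why DP-means is sequential: each point is compared against the clusters created so far, and a point whose squared distance to every existing center exceeds a threshold opens a new cluster that later points can see. The function name `dp_means`, the parameter name `lam`, and the choice to start from the first data point are assumptions of this sketch, not details from the slides.

```python
import numpy as np


def dp_means(X, lam, n_iters=10):
    """Serial DP-means sketch: lam is a threshold on squared distance."""
    X = np.asarray(X, dtype=float)
    centers = [X[0].copy()]  # simplification: start from the first point
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Assignment pass: inherently sequential, because a cluster
        # created for point i changes the decision for point i + 1.
        for i, x in enumerate(X):
            d2 = [np.sum((x - c) ** 2) for c in centers]
            k = int(np.argmin(d2))
            if d2[k] > lam:
                centers.append(x.copy())  # open a new cluster at x
                k = len(centers) - 1
            assign[i] = k
        # Update pass: move each center to the mean of its members,
        # leaving a center in place if its cluster is currently empty.
        for k in range(len(centers)):
            members = X[assign == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return np.array(centers), assign
```

Unlike K-means, the number of clusters is not fixed in advance; it grows with the data and is controlled by `lam`.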

  14. OCC DP-means Validation Resolution
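One way to picture the validation and resolution steps for DP-means is the sketch below. It assumes that workers propose new centers against a stale snapshot of the shared centers, and that a master then checks the proposals serially, rejecting any that fall within the threshold of a center already accepted in the same epoch. The function names and the in-process simulation of workers are hypothetical; a real deployment would run `worker_pass` on separate machines.

```python
import numpy as np


def worker_pass(block, snapshot, lam):
    """Optimistic step: each worker scans its block against a stale snapshot."""
    proposals = []
    for x in block:
        if len(snapshot) == 0 or min(np.sum((x - c) ** 2) for c in snapshot) > lam:
            proposals.append(x)  # x looks like a new cluster from this worker's view
    return proposals


def validate(proposals, lam):
    """Serial validation and resolution at the master (assumed design)."""
    accepted, rejected = [], 0
    for x in proposals:
        if accepted and min(np.sum((x - c) ** 2) for c in accepted) <= lam:
            # Conflict: another proposal already opened this cluster.
            # Resolution: drop the proposal; x belongs to the accepted center.
            rejected += 1
        else:
            accepted.append(x)
    return accepted, rejected


def occ_dp_means_epoch(X, centers, lam, n_workers):
    """One epoch: parallelisable proposals, then serial validation."""
    blocks = np.array_split(np.asarray(X, dtype=float), n_workers)
    # Workers are simulated in a loop here; they share no state within the epoch.
    proposals = [p for b in blocks for p in worker_pass(b, centers, lam)]
    accepted, rejected = validate(proposals, lam)
    return list(centers) + accepted, rejected
```

Only the proposals reach the master, so the serial validation step stays cheap as long as new clusters are rare relative to the number of data points.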

  15. Evaluation: Amazon EC2 • ~140 million data points; 1, 2, 4, 8 machines • 2x #machines ≈ ½x runtime [Plot: OCC DP-means runtime against projected linear scaling]

  16. Optimistic Concurrency Control • High concurrency: • Conflicts rare • Validation easy • Resolution cheap • OCCified Algorithms • Online facility location • BP-means: feature modeling • Ongoing • Stochastic gradient descent • Collapsed Gibbs sampling

  17. Optimistic Concurrency Control What can OCC do for you? See us @ poster session! xinghao@eecs.berkeley.edu Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, and Michael I. Jordan. Optimistic concurrency control for distributed unsupervised learning. ArXiv e-prints, arXiv:1307.8049, 2013. Big Learning @ NIPS 2013 http://biglearn.org
