Optimistic Concurrency Control for Distributed Learning. Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, Michael I. Jordan
Machine Learning Algorithm (diagram: Model Parameters, Data)
Distributed Machine Learning (diagram: Model Parameters, Data)
Distributed Machine Learning (diagram: Model Parameters, Data)
• Correctness: serial equivalence
• Concurrency: more machines = less time
Coordination-free (diagram: Model Parameters, Data)
Mutual Exclusion (diagram: Model Parameters, Data)
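Mechanism for ensuring correctness (chart: concurrency vs. correctness)
• Coordination-free: high concurrency, low correctness
• Mutual exclusion: low concurrency, high correctness
• Optimistic Concurrency Control (?): high concurrency and high correctness, if conflicts are rare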
Optimistic Concurrency Control (diagram: Model Parameters, Data; chart: Concurrency, Correctness)
• Optimistic updates
• Validation: detect conflicts
• Resolution: fix conflicts
Hsiang-Tsung Kung and John T. Robinson. On optimistic methods for concurrency control. ACM Transactions on Database Systems (TODS), 6(2):213–226, 1981.
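The three steps can be sketched as a single-process simulation of one OCC epoch over shared parameters, in the spirit of Kung and Robinson's read/validate/write scheme. This is our illustration, not the authors' code: `VersionedStore`, `run_occ_epoch`, and `increment` are hypothetical names; concurrency is simulated by letting every update read the same state; validation compares per-key version counters; and resolution simply re-executes a conflicting update against the current state.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types: an update reads some keys and returns the values it wants to write.
UpdateFn = Callable[[Dict[str, float]], Dict[str, float]]
Txn = Tuple[List[str], UpdateFn]


class VersionedStore:
    """Shared model parameters, each tagged with a version counter."""

    def __init__(self, values: Dict[str, float]) -> None:
        self.values = dict(values)
        self.versions = {k: 0 for k in values}

    def read(self, keys: List[str]) -> Tuple[Dict[str, float], Dict[str, int]]:
        # Return the values read plus the versions they had at read time.
        return ({k: self.values[k] for k in keys},
                {k: self.versions[k] for k in keys})

    def is_valid(self, read_versions: Dict[str, int]) -> bool:
        # A conflict exists if any key we read has been overwritten since.
        return all(self.versions[k] == v for k, v in read_versions.items())

    def write(self, writes: Dict[str, float]) -> None:
        for k, v in writes.items():
            self.values[k] = v
            self.versions[k] = self.versions.get(k, 0) + 1


def run_occ_epoch(store: VersionedStore, txns: List[Txn]) -> int:
    """Run one OCC epoch and return the number of conflicts that needed resolution."""
    # 1. Optimistic updates: every transaction reads the same state, as if run in parallel.
    proposals = []
    for keys, fn in txns:
        values, versions = store.read(keys)
        proposals.append((keys, fn, versions, fn(values)))

    conflicts = 0
    for keys, fn, versions, writes in proposals:
        # 2. Validation: commit only if nothing we read has changed.
        if store.is_valid(versions):
            store.write(writes)
        else:
            # 3. Resolution: re-run the update against the current state.
            conflicts += 1
            values, _ = store.read(keys)
            store.write(fn(values))
    return conflicts


def increment(key: str) -> Txn:
    # Hypothetical transaction: add 1.0 to a single parameter.
    return ([key], lambda vals: {key: vals[key] + 1.0})


store = VersionedStore({"a": 0.0, "b": 0.0})
n_conflicts = run_occ_epoch(store, [increment("a"), increment("b"), increment("a")])
print(n_conflicts, store.values)  # 1 {'a': 2.0, 'b': 1.0}
```

The two updates to `a` conflict: validation catches the second one, and re-running it yields a = 2.0, the same result a serial execution would give. The update to `b` touches a different key and commits without coordination.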
Optimistic Concurrency Control Application: Clustering
• Natural domain for parallelization
• K-means: popular algorithm
• Fixed number of clusters: not a good fit for Big Data
• Big Data solution: DP-means + OCC
Example: K-means: bad clusters!
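The problem with a fixed number of clusters shows up in a few lines. Below is a minimal Lloyd's-style K-means sketch; it is our illustration, not code from the talk, and the function name `kmeans` and the naive first-k initialization are assumptions. On three well-separated pairs of points with K = 2, it cannot return three clusters, and with this initialization it converges to a split that cuts across all three groups.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: List[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    centers = [tuple(map(float, p)) for p in points[:k]]  # naive init: first k points
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its points (kept if empty).
        new_centers = []
        for j in range(k):
            members = [p for p, z in zip(points, labels) if z == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
centers, labels = kmeans(points, k=2)
print(centers, labels)  # [(10.0, 0.0), (10.0, 1.0)] [0, 1, 0, 1, 0, 1]
```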
Example: DP-means: correct clusters, but sequential!
Brian Kulis and Michael I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
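A minimal serial DP-means sketch, following our reading of Kulis and Jordan's description: start with one cluster at the global mean; during assignment, a point whose squared distance to every center exceeds λ (`lam`) opens a new cluster centered on itself; then each center moves to its cluster mean. The function name, the convergence test, and dropping emptied clusters are our assumptions. The inner loop is what makes the algorithm sequential: each point's decision depends on the centers opened by the points before it.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points: List[Point]) -> Point:
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def dp_means(points: List[Point], lam: float, max_iters: int = 50) -> Tuple[List[Point], List[int]]:
    centers: List[Point] = [mean(points)]  # one initial cluster at the global mean
    labels: List[int] = []
    for _ in range(max_iters):
        old_centers = list(centers)
        labels = []
        for p in points:  # sequential: later points see centers opened by earlier ones
            d = [sq_dist(p, c) for c in centers]
            j = min(range(len(centers)), key=d.__getitem__)
            if d[j] > lam:  # too far from every center: open a new cluster at p
                centers.append(tuple(map(float, p)))
                j = len(centers) - 1
            labels.append(j)
        # Update step: drop empty clusters, renumber labels, recompute means.
        used = sorted(set(labels))
        remap = {old: new for new, old in enumerate(used)}
        labels = [remap[z] for z in labels]
        centers = [mean([p for p, z in zip(points, labels) if z == j]) for j in range(len(used))]
        if centers == old_centers:
            break
    return centers, labels


points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
centers, labels = dp_means(points, lam=4.0)
print(centers, labels)  # [(10.0, 0.5), (0.0, 0.5), (20.0, 0.5)] [1, 1, 0, 0, 2, 2]
```

Unlike K-means, no cluster count is given: the threshold λ decides how many clusters appear, and here all three groups are recovered.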
OCC DP-means (diagram: Validation, Resolution)
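Validation and resolution for DP-means can be sketched as follows, assuming the scheme as we understand it from the accompanying arXiv paper: in each epoch, every worker compares its batch of points against a read-only snapshot of the centers and either assigns a point or proposes it as a new center; a master then validates proposals serially, accepting a proposal only if it is farther than λ from every center accepted earlier in the same epoch, and otherwise resolving the conflict by assigning the point to that accepted center. Workers are simulated sequentially here, only the assignment pass is shown (mean updates would follow as in serial DP-means), and `nearest`, `worker_step`, `validate`, `occ_dp_means_pass`, the round-robin sharding, and the `<=` tie rule are our assumptions.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centers: List[Point], ids: range) -> Tuple[int, float]:
    # Index and squared distance of the closest center among `ids` (-1, inf if none).
    best, best_d = -1, math.inf
    for j in ids:
        d = sq_dist(p, centers[j])
        if d < best_d:
            best, best_d = j, d
    return best, best_d


def worker_step(batch, snapshot, lam):
    """Optimistic phase on one worker, using a read-only snapshot of the centers."""
    assigned: Dict[int, int] = {}
    proposals: List[Tuple[int, Point]] = []
    for idx, p in batch:
        j, d = nearest(p, snapshot, range(len(snapshot)))
        if d <= lam:
            assigned[idx] = j
        else:
            proposals.append((idx, p))  # far from all known centers: propose a new one
    return assigned, proposals


def validate(proposals, centers, lam):
    """Serial validation on the master; appends accepted proposals to `centers`."""
    labels: Dict[int, int] = {}
    first_new = len(centers)
    for idx, p in proposals:
        # Proposals already passed the snapshot check, so compare only with
        # centers accepted earlier in this epoch.
        j, d = nearest(p, centers, range(first_new, len(centers)))
        if d <= lam:
            labels[idx] = j  # resolution: conflict, join the center accepted earlier
        else:
            centers.append(p)  # validation passed: accept the new center
            labels[idx] = len(centers) - 1
    return labels


def occ_dp_means_pass(points, lam, num_workers, batch_size):
    centers: List[Point] = []
    labels: Dict[int, int] = {}
    # Round-robin sharding of point indices across workers.
    shards = [[(i, p) for i, p in enumerate(points) if i % num_workers == w]
              for w in range(num_workers)]
    epochs = max(math.ceil(len(s) / batch_size) for s in shards)
    for e in range(epochs):
        snapshot = list(centers)
        proposals: List[Tuple[int, Point]] = []
        for shard in shards:  # conceptually runs in parallel
            batch = shard[e * batch_size:(e + 1) * batch_size]
            assigned, props = worker_step(batch, snapshot, lam)
            labels.update(assigned)
            proposals.extend(props)
        labels.update(validate(proposals, centers, lam))
    return centers, [labels[i] for i in range(len(points))]


points = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
centers, labels = occ_dp_means_pass(points, lam=4.0, num_workers=2, batch_size=1)
print(centers, labels)  # [(0, 0), (10, 0), (20, 0)] [0, 0, 1, 1, 2, 2]
```

In this run, each epoch yields two proposals from the same group; validation accepts the first and resolves the second as a conflict, so the resulting clusters match what a serial DP-means-style pass over the points in index order, starting from no centers, would produce.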
Evaluation: Amazon EC2
• ~140 million data points; 1, 2, 4, 8 machines
• Plot: OCC DP-means runtime vs. projected linear scaling
• 2x #machines ≈ ½x runtime
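The projected linear scaling curve can be written out explicitly, assuming the projection is anchored at the single-machine runtime T(1) (the slide does not state the anchor). With p machines:

```latex
T_{\mathrm{lin}}(p) = \frac{T(1)}{p}, \qquad
\frac{T_{\mathrm{lin}}(2p)}{T_{\mathrm{lin}}(p)} = \frac{T(1)/(2p)}{T(1)/p} = \frac{1}{2}
```

Each doubling of machines (1, 2, 4, 8) therefore halves the projected runtime, which is the "2x #machines ≈ ½x runtime" rule; the measured OCC DP-means runtime T(p) is compared against this curve.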
Optimistic Concurrency Control
• High concurrency when:
  • Conflicts are rare
  • Validation is easy
  • Resolution is cheap
• OCCified algorithms:
  • Online facility location
  • BP-means: feature modeling
• Ongoing:
  • Stochastic gradient descent
  • Collapsed Gibbs sampling
Optimistic Concurrency Control: What can OCC do for you? See us @ poster session!
xinghao@eecs.berkeley.edu
Xinghao Pan, Joseph E. Gonzalez, Stefanie Jegelka, Tamara Broderick, and Michael I. Jordan. Optimistic concurrency control for distributed unsupervised learning. arXiv e-prints, arXiv:1307.8049, 2013.
Big Learning @ NIPS 2013, http://biglearn.org