Accelerating Distributed Machine Learning by Smart Parameter Server • Jinkun Geng, Dan Li and Shuai Wang
Background • Distributed machine learning has become common practice because of: • 1. The explosive growth of data size
Background • Distributed machine learning has become common practice because of: • 2. The increasing complexity of training models • ImageNet competition, network depth in layers: <10 (Hinton, 2012), 22 (Google, 2014), 152 (Microsoft, 2015), 1207 (SenseTime, 2016)
Background • The Parameter Server (PS) based architecture is widely supported by mainstream distributed machine learning (DML) systems.
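To make the PS-based architecture concrete, here is a minimal single-process sketch, assuming plain synchronous SGD on a least-squares model. The class `ParameterServer`, its `pull`/`push` methods and the helper `local_gradient` are hypothetical names for illustration, not the API of any particular DML system: the server distributes parameters, workers compute gradients on their own data shards, and the server aggregates them.

```python
from typing import List, Tuple

class ParameterServer:
    """Toy single-process parameter server; all names are hypothetical."""

    def __init__(self, dim: int, lr: float = 0.1) -> None:
        self.weights = [0.0] * dim
        self.lr = lr

    def pull(self) -> List[float]:
        # Parameter distribution: workers fetch a copy of the global parameters.
        return list(self.weights)

    def push(self, grads: List[List[float]]) -> None:
        # Parameter aggregation: average the workers' gradients, take one SGD step.
        n = len(grads)
        for j in range(len(self.weights)):
            avg = sum(g[j] for g in grads) / n
            self.weights[j] -= self.lr * avg


def local_gradient(w: List[float], shard: List[Tuple[Tuple[float, ...], float]]) -> List[float]:
    # Gradient of 0.5 * (w.x - y)^2, averaged over one worker's data shard.
    grad = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(shard)
    return grad


# Two workers, each holding a shard of data generated by y = 2*x0 + 1*x1.
shards = [
    [((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0)],
    [((1.0, 1.0), 3.0), ((2.0, 0.0), 4.0)],
]
ps = ParameterServer(dim=2, lr=0.2)
for _ in range(200):
    w = ps.pull()
    grads = [local_gradient(w, s) for s in shards]  # computed on the workers
    ps.push(grads)
print([round(v, 3) for v in ps.weights])  # [2.0, 1.0]
```

Real systems shard the parameters across several server processes and communicate over the network; the sketch keeps everything in one process so that only the pull/compute/push cycle remains visible.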
Background • However, the power of the PS architecture has not been fully exploited: • 1. Communication redundancy • 2. The straggler problem
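The straggler problem is easiest to see under a synchronous barrier, where every iteration lasts as long as the slowest worker. The following sketch assumes BSP-style synchronization and uses hypothetical per-worker compute times; it only illustrates the straggler cost, not the communication-redundancy issue.

```python
from typing import List

def bsp_iteration_time(compute_times: List[float]) -> float:
    # With a barrier after every iteration, all workers wait for the slowest one.
    return max(compute_times)

def barrier_idle_fraction(compute_times: List[float]) -> float:
    # Share of total worker time spent idle at the barrier.
    t = bsp_iteration_time(compute_times)
    idle = sum(t - c for c in compute_times)
    return idle / (t * len(compute_times))

# Hypothetical per-iteration compute times (seconds): one straggler among four workers.
times = [1.0, 1.0, 1.0, 3.0]
print(bsp_iteration_time(times))     # 3.0
print(barrier_idle_fraction(times))  # 0.5
```

A single worker that is three times slower leaves the cluster idle half of the time in this toy setting, which is why straggler handling matters for PS-based training.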
Background • A deeper insight… • 1. Worker-centric design is less efficient • 2. The PS can be more intelligent (i.e., SmartPS)
Background • To make the PS more intelligent… • Dependency-aware • Straggler-assistant
Design Strategies • To make the PS more intelligent… • 1. Selective update • 2. Proactive push • 3. Prioritized transmission • 4. Unnecessary push blockage
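The slides name the strategies without giving their mechanics, so the following is only one plausible reading, assumed here for illustration and not the authors' algorithm. The PS knows which parameter keys each worker's next iteration depends on (dependency-aware), sends each worker only the updated keys it needs (selective update), drops pushes that would carry nothing useful (unnecessary push blockage), and serves the most-lagging worker first (prioritized, straggler-assistant transmission). The function `plan_pushes` and its inputs `deps` and `progress` are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple

def plan_pushes(
    updated_keys: Set[str],
    deps: Dict[str, Set[str]],   # hypothetical: worker -> keys its next iteration reads
    progress: Dict[str, int],    # hypothetical: worker -> completed iterations
) -> List[Tuple[str, List[str]]]:
    """Return (worker, keys) push messages, most-lagging worker first."""
    heap: List[Tuple[int, str, List[str]]] = []
    for worker, needed in deps.items():
        keys = sorted(updated_keys & needed)  # send only keys this worker depends on
        if not keys:
            continue  # block a push that would carry nothing the worker needs
        # Fewer completed iterations means higher priority (smaller heap entry).
        heapq.heappush(heap, (progress[worker], worker, keys))
    ordered = []
    while heap:
        _, worker, keys = heapq.heappop(heap)
        ordered.append((worker, keys))
    return ordered

deps = {"A": {"w1", "w4"}, "B": {"w2", "w3"}, "C": {"w5"}}
progress = {"A": 10, "B": 7, "C": 9}
print(plan_pushes({"w1", "w2", "w3"}, deps, progress))
# [('B', ['w2', 'w3']), ('A', ['w1'])]
```

Worker C receives nothing because none of its dependencies changed, and worker B, the furthest behind, is served first.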
Evaluation • Experiment Setting: • 17 nodes with different performance configurations: 1 PS + 16 workers • 2 Benchmarks: • Matrix Factorization (MF) and PageRank (PR) • 5 Baselines: • BSP, ASP, SSP (slack=1), SSP (slack=2), SSP (slack=3)
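The baselines differ only in how far a fast worker may run ahead of the slowest one. Under one common convention for Stale Synchronous Parallel (SSP), a worker may start its next iteration only if it is at most `slack` iterations ahead of the slowest worker; slack 0 behaves like BSP and unbounded slack like ASP. The function `can_proceed` and the clock values below are hypothetical; exact off-by-one conventions vary between systems.

```python
from typing import List, Optional

def can_proceed(clock: int, all_clocks: List[int], slack: Optional[int]) -> bool:
    """SSP gate: a worker that has finished `clock` iterations may continue
    only if it is at most `slack` iterations ahead of the slowest worker.
    slack=0 gives BSP-style lockstep; slack=None (unbounded) gives ASP."""
    if slack is None:
        return True
    return clock - min(all_clocks) <= slack

clocks = [5, 4, 3]  # hypothetical iteration counts of three workers
for name, s in [("BSP", 0), ("SSP(slack=1)", 1), ("SSP(slack=2)", 2),
                ("SSP(slack=3)", 3), ("ASP", None)]:
    print(name, can_proceed(5, clocks, s))
# BSP False, SSP(slack=1) False, SSP(slack=2) True, SSP(slack=3) True, ASP True
```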
Evaluation • MF benchmark: With a common threshold, SmartPS reduces the training time by 68.1% to 90.3% compared with the baselines.
Evaluation • PR benchmark: With a common threshold, SmartPS reduces the training time by 65.7% to 84.9% compared with the baselines.
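A training-time reduction can also be read as a speedup: cutting r% of the baseline time leaves (100 - r)% of it, so the speedup is 100 / (100 - r). The helper below is a hypothetical name and only performs that arithmetic on the reported ranges.

```python
def speedup_from_reduction(reduction_pct: float) -> float:
    # A reduction of r% leaves (100 - r)% of the baseline time.
    return 100.0 / (100.0 - reduction_pct)

for bench, lo, hi in [("MF", 68.1, 90.3), ("PR", 65.7, 84.9)]:
    print(f"{bench}: {speedup_from_reduction(lo):.2f}x to {speedup_from_reduction(hi):.2f}x")
# MF: 3.13x to 10.31x
# PR: 2.92x to 6.62x
```

So the reported reductions correspond to roughly 3.1x to 10.3x faster training on MF and 2.9x to 6.6x on PR.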
Further Discussion • Comparison to some recent works: • 1. Both leverage knowledge of parameter dependency • 2. Both leverage prioritized transmission for DML acceleration
Further Discussion • Comparison to some recent works:
Ongoing Work • A deeper insight into the PS-based architecture… • Functions of the PS: • 1. Parameter Distribution -> Data Access Control • 2. Parameter Aggregation -> Data Operation • Function of the Worker: • 1. Parameter Refinement -> Data Operation
Ongoing Work • Parameter Distribution • Parameter Aggregation • Parameter Refinement
Ongoing Work • Data Access Control • Data Operation • Data Operation
Ongoing Work • Data Access Control • Token • Token • Token • Data Operation
Next Generation of SmartPS • Parameter Server -> Token Server • 1. Decouple data (access) control from data operation • 2. A light-weight, smart Token Server instead of a Parameter Server
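The idea of separating data access control from data operation can be illustrated with a small sketch. Everything here is an assumption: the classes `TokenServer` and `DataStore`, the exclusive one-holder-per-key policy, and the integer tokens are hypothetical and are not taken from the slides. The token server only decides who may touch which parameter key (control plane) and never stores values; the data store holds values and applies operations, but only for requests carrying a currently valid token (data plane).

```python
import itertools
from typing import Dict, Optional, Tuple

class TokenServer:
    """Control plane only: grants access tokens, never holds parameter values."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._holder: Dict[str, Tuple[str, int]] = {}  # key -> (worker, token)

    def acquire(self, worker: str, key: str) -> Optional[int]:
        # Assumed policy: exclusive access, one holder per key at a time.
        holder = self._holder.get(key)
        if holder is not None and holder[0] != worker:
            return None
        token = next(self._ids)
        self._holder[key] = (worker, token)
        return token

    def release(self, worker: str, key: str) -> None:
        holder = self._holder.get(key)
        if holder is not None and holder[0] == worker:
            del self._holder[key]

    def is_valid(self, key: str, token: Optional[int]) -> bool:
        holder = self._holder.get(key)
        return holder is not None and holder[1] == token


class DataStore:
    """Data plane: stores values and applies operations guarded by tokens."""

    def __init__(self, tokens: TokenServer) -> None:
        self.tokens = tokens
        self.values: Dict[str, float] = {}

    def add(self, key: str, delta: float, token: Optional[int]) -> bool:
        if not self.tokens.is_valid(key, token):
            return False  # rejected: no valid access token for this key
        self.values[key] = self.values.get(key, 0.0) + delta
        return True


ts = TokenServer()
store = DataStore(ts)
t1 = ts.acquire("worker-1", "layer0")   # 1: granted
t2 = ts.acquire("worker-2", "layer0")   # None: key held by worker-1
ok1 = store.add("layer0", 0.5, t1)      # True
ts.release("worker-1", "layer0")
ok2 = store.add("layer0", 0.5, t1)      # False: token was released
t3 = ts.acquire("worker-2", "layer0")   # 2: granted
store.add("layer0", 1.0, t3)
print(store.values)  # {'layer0': 1.5}
```

Because the token server only keeps a small key-to-holder table, it stays light-weight regardless of model size, while the heavy data movement happens in the data stores.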
Thanks! NASPResearchGroup https://nasp.cs.tsinghua.edu.cn/ https://www.gengjinkun.com/