Advancements in Distributed Learning: Challenges and Solutions for Neural Networks
This document outlines course logistics and lab assignments while addressing key concepts in deep learning, including probabilistic models, Hebbian learning, and distributed training systems. It highlights the challenges of training large-scale neural networks, such as data distribution, processing synchronization, and parameter updates, and discusses an ongoing project on asynchronous learning and optimal data-allocation strategies, inviting volunteers for an open-source initiative.
READINGS IN DEEP LEARNING 4 Sep 2013
ADMINISTRIVIA • New course numbers (11-785/786) are assigned • Should be up on the hub shortly • Lab assignment 1 up • Due date: 2 weeks from today • Google group: is everyone on? • Website issues.. • WordPress not yet an option (CMU CS setup) • Piazza?
Poll for next 2 classes • Monday, Sep 9 • The perceptron: A probabilistic model for information storage and organization in the brain • Rosenblatt • Not really about the logistic perceptron, more about the probabilistic interpretation of learning in connectionist networks • Organization of behavior • Donald Hebb • About the Hebbian learning rule
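As a refresher before the Hebb reading, the basic Hebbian rule ("cells that fire together wire together") can be sketched in a few lines. This is a minimal illustration, not material from the slides; the function name and values are illustrative.

```python
import numpy as np

def hebbian_update(w, x, lr=0.01):
    """Basic Hebbian rule: the weight change is proportional to the
    product of the pre-synaptic input x and post-synaptic output y."""
    y = w @ x                # linear post-synaptic activation
    return w + lr * y * x    # dw = lr * y * x

w0 = np.array([0.1, -0.2, 0.05])
x = np.array([1.0, 0.5, -1.0])
w1 = hebbian_update(w0, x)
```

Note that the plain rule has no error signal and lets weights grow without bound, which is part of what the later readings (Sanger, Widrow-Hoff) address.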
Poll for next 2 classes • Wed, Sep 11 • Optimal unsupervised learning in a single-layer linear feedforward neural network. • Terence Sanger • Generalized Hebbian learning rule • The Widrow-Hoff learning rule • Widrow and Hoff • Will be presented by Pallavi Baljekar
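For context on the Sanger reading, his generalized Hebbian algorithm extends the Hebbian rule so that successive output units extract successive principal components. A minimal sketch (illustrative names, not from the paper's own code):

```python
import numpy as np

def sanger_update(W, x, lr=0.01):
    """Sanger's generalized Hebbian rule: output unit i learns the
    i-th principal component by subtracting from its update the parts
    of the input already explained by units 1..i."""
    y = W @ x                       # outputs, shape (m,)
    lt = np.tril(np.outer(y, y))    # lower-triangular part of y y^T
    return W + lr * (np.outer(y, x) - lt @ W)
```

With only the diagonal of `lt` kept, this reduces to Oja's single-unit rule; the lower-triangular coupling is what orders the components.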
Notices • Success of the course depends on good presentations • Please send in your slides 1-2 days before the presentation • So that we can ensure they are OK • You are encouraged to discuss your papers with us/your classmates while preparing them • Use the Google group for discussion
A new project • Distributed large-scale training of NNs.. • Looking for volunteers
The Problem: Distributed data • Training enormous networks • Billions of units • from large amounts of data • Billions or Trillions of instances • Data may be localized.. • Or distributed
The problem: Distributed computing • A single computer will not suffice • Need many processors • Tens or hundreds or thousands of computers • Of possibly varying types and capacity
Challenge • Getting the data to the computers • Tons of data to many computers • Bandwidth problems • Timing issues • Synchronizing the learning
Logistic Challenges • How to transfer vast amounts of data to processors • Which processor gets how much data.. • Not all processors equally fast • Not all data take equal amounts of time to process • .. and which data • Data locality
Learning Challenges • How to transfer parameters to processors • Networks are large, billions or trillions of parameters • Each processor must have the latest copy of parameters • How to receive updates from processors • Each processor learns on local data • Updates from all processors must be pooled
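The pooling step on this slide, in its simplest synchronous form, averages the gradients each worker computed on its local shard before taking a step. A minimal sketch (function and variable names are illustrative):

```python
import numpy as np

def pool_updates(params, worker_grads, lr=0.01):
    """Synchronous pooling: average the gradients computed by all
    workers on their local data shards, then take one gradient step."""
    avg = sum(worker_grads) / len(worker_grads)
    return params - lr * avg

params = np.zeros(2)
grads = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
new_params = pool_updates(params, grads)
```

This is the baseline the next slide's asynchronous schemes deviate from: here every worker must finish before any parameter moves.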
Learning Challenges • Synchronizing processor updates • Some processors slower than others • Inefficient to wait for slower ones • In order to update parameters at all processors • Requires asynchronous updates • Each processor updates when done • Problem: Different processors now have different set of parameters • Other processors may have updated parameters already • Requires algorithmic changes • How to update asynchronously • Which updates to trust
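One way the asynchronous-update and "which updates to trust" problems above are often handled is a parameter server that tags parameters with a version, and down-weights gradients that were computed against stale versions. This is a sketch of that general idea, not a specific algorithm from the talk; all names are illustrative.

```python
import numpy as np

class AsyncParameterServer:
    """Sketch of asynchronous updates: each worker pulls parameters
    with a version number, and pushes gradients tagged with that
    version; stale gradients are down-weighted, not discarded."""

    def __init__(self, params, lr=0.01):
        self.params = params
        self.version = 0
        self.lr = lr

    def pull(self):
        # a worker fetches the latest parameters and their version
        return self.params.copy(), self.version

    def push(self, grad, version):
        # trust decays with staleness: older gradients count for less
        staleness = self.version - version
        trust = 1.0 / (1.0 + staleness)
        self.params -= self.lr * trust * grad
        self.version += 1

ps = AsyncParameterServer(np.array([1.0]), lr=0.1)
p, v = ps.pull()
ps.push(np.array([1.0]), v)   # fresh gradient, full trust
ps.push(np.array([1.0]), 0)   # one version stale, half trust
```

Fast workers therefore never wait for slow ones; the cost is that each push may be computed against out-of-date parameters, which is exactly the algorithmic change the slide refers to.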
Current Solutions • Faster processors • GPUs • GPU programming required • Large simple clusters • Simple distributed programming • Large heterogeneous clusters • Techniques for asynchronous learning
Current Solutions • Still assume data distribution is not a major problem • Assume relatively fast connectivity • Gigabit Ethernet • Fundamentally cluster-computing based • Local area network
New project • Distributed learning • Wide area network • Computers distributed across the world
New project • Supervisor/Worker architecture • One or more supervisors • May be a hierarchy • A large number of workers • Supervisors in charge of resource and task allocation, gathering and redistributing updates, synchronization
New project • Challenges • Data allocation • Optimal policy for data distribution • Minimal latency • Maximum locality
New project • Challenges • Computation allocation • Optimal policy for learning • Compute load proportional to compute capacity • Reallocation of data/task as required
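"Compute load proportional to compute capacity" can be made concrete with a simple proportional split. A hypothetical sketch (the function and capacity units are assumptions, not part of the project spec):

```python
def allocate_shards(n_instances, capacities):
    """Split n_instances across workers in proportion to measured
    compute capacity (e.g. instances processed per second).
    Integer rounding remainders are handed to the first worker."""
    total = sum(capacities)
    shares = [int(n_instances * c) // total for c in capacities]
    shares[0] += n_instances - sum(shares)
    return shares

# a worker twice as fast gets twice the data
split = allocate_shards(100, [1, 1, 2])
```

In practice capacities drift (load, hardware differences), which is why the slide also lists reallocation as required.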
New project • Challenges • Parameter allocation • Do we have to distribute all parameters • Can learning be local
New project • Challenges • Trustable updates • Different processors/LANs have different speeds • How do we trust their updates • Do we incorporate or reject?
New project • Optimal resynchronization: how much do we transmit • Should not have to retransmit everything • Entropy coding? • Bit-level optimization?
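One simple illustration of "should not have to retransmit everything" is sending only the parameters whose change exceeds a threshold, as (index, value) pairs. This is a sketch of the idea, not the project's actual protocol; entropy coding of the transmitted values would be a further step.

```python
import numpy as np

def sparse_delta(old, new, threshold=1e-3):
    """Encode a parameter update as (index, value) pairs, keeping
    only entries whose change exceeds the threshold."""
    diff = new - old
    idx = np.nonzero(np.abs(diff) > threshold)[0]
    return idx, diff[idx]

def apply_delta(params, idx, vals):
    """Apply a sparse delta to a parameter vector."""
    out = params.copy()
    out[idx] += vals
    return out

old = np.array([0.0, 1.0, 2.0])
new = np.array([0.0, 1.5, 2.0])
idx, vals = sparse_delta(old, new)
```

The trade-off: thresholding is lossy, so workers' copies slowly drift and must occasionally be fully resynchronized.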
Possibilities • Massively parallel learning • Never ending learning • Multimodal learning • GAIA..
Asking for Volunteers • Will be an open source project • Write to Anders
Today • Bain’s theory: Lars Mahler • Linguist, mathematician, philosopher • One of the earliest people to propose a connectionist architecture • Anticipated many modern ideas • McCulloch and Pitts: Kartik Goyal • Early model of the neuron: Threshold gates • Earliest model to consider excitation and inhibition
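The McCulloch-Pitts threshold gate mentioned above is small enough to state directly: the unit fires iff no inhibitory input is active (absolute inhibition) and the count of active excitatory inputs meets the threshold. A minimal sketch with illustrative names:

```python
def mcp_unit(excitatory, inhibitory, threshold):
    """McCulloch-Pitts unit: any active inhibitory input vetoes
    firing; otherwise fire iff enough excitatory inputs are active."""
    if any(inhibitory):          # absolute inhibition
        return 0
    return 1 if sum(excitatory) >= threshold else 0

# AND gate: threshold equal to the number of inputs
and_out = mcp_unit([1, 1], [], 2)
# inhibition overrides excitation
vetoed = mcp_unit([1, 1], [1], 2)
```

Setting the threshold to 1 instead yields OR; with inhibition, such gates compose into any Boolean function, which is the paper's central point.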