This research explores advanced techniques for training deep autoencoders, focusing on alternate layer sparsity and intermediate fine-tuning. By leveraging these strategies, we tackle challenges in deep learning, such as non-convex optimization and the vanishing gradient problem. Our findings demonstrate that combining multiple training approaches leads to improved performance in dimensionality reduction tasks. We employ various autoencoder models, including contractive and sparse autoencoders, utilizing datasets like MNIST to validate our methods. Preliminary results suggest significant efficiency and accuracy improvements in feature extraction.
ALTERNATE LAYER SPARSITY & INTERMEDIATE FINE-TUNING FOR DEEP AUTOENCODERS Submitted by: Ankit Bhutani (Y9227094) Supervised by: Prof. Amitabha Mukerjee, Prof. K S Venkatesh
AUTOENCODERS • AUTO-ASSOCIATIVE NEURAL NETWORKS • OUTPUT TRAINED TO REPRODUCE THE INPUT
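A minimal sketch of such an auto-associative network in NumPy, added for illustration (not the authors' code): one sigmoid hidden layer with tied weights, so that the output attempts to reproduce the input. Layer sizes and the mini-batch are placeholders.

```python
# A single-hidden-layer auto-associative network: output reproduces input.
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.RandomState(0)
n_visible, n_hidden = 784, 256           # illustrative, MNIST-like sizes
W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
b_hid = np.zeros(n_hidden)
b_vis = np.zeros(n_visible)

def encode(x):
    return sigmoid(x @ W + b_hid)        # hidden-layer code

def decode(h):
    return sigmoid(h @ W.T + b_vis)      # tied weights for reconstruction

x = rng.rand(10, n_visible)              # dummy mini-batch
z = decode(encode(x))                    # should resemble x after training
reconstruction_error = np.mean((x - z) ** 2)
```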
DIMENSIONALITY REDUCTION • BOTTLENECK CONSTRAINT • LINEAR ACTIVATION – PCA [Baldi et al., 1989] • NON-LINEAR PCA [Kramer, 1991] – 5 layered network • ALTERNATE SIGMOID AND LINEAR ACTIVATION • EXTRACTS NON-LINEAR FACTORS
ADVANTAGES OF NETWORKS WITH MULTIPLE LAYERS • ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS • TACKLE THE NON-LINEAR STRUCTURE OF THE UNDERLYING DATA • HIERARCHICAL REPRESENTATION • RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYERED NETWORK WOULD NEED AN EXPONENTIALLY LARGE NUMBER OF HIDDEN UNITS
PROBLEMS WITH DEEP NETWORKS • DIFFICULTY IN TRAINING DEEP NETWORKS • NON-CONVEX NATURE OF THE OPTIMIZATION • GETS STUCK IN LOCAL MINIMA • VANISHING GRADIENTS DURING BACKPROPAGATION • SOLUTION • "INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" – [Hinton et al., 2006] • GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING
HOW TO TRAIN DEEP NETWORKS? • PRE-TRAINING • INCREMENTAL LAYER-WISE TRAINING • EACH LAYER ONLY TRIES TO REPRODUCE THE HIDDEN-LAYER ACTIVATIONS OF THE PREVIOUS LAYER
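A sketch of this greedy layer-wise procedure, under the assumption that each shallow autoencoder is trained separately and its hidden codes are fed to the next layer; `train_shallow_autoencoder` is a hypothetical helper and the layer sizes are illustrative.

```python
# Greedy layer-wise pre-training: each shallow autoencoder reproduces the
# activations of the layer below; its codes become the next layer's input.
layer_sizes = [784, 1000, 500, 250, 30]       # illustrative architecture

def pretrain(data, layer_sizes, train_shallow_autoencoder):
    weights, current_input = [], data
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        # hypothetical helper: trains one shallow autoencoder and returns
        # its parameters plus the hidden codes of the training data
        W, b, codes = train_shallow_autoencoder(current_input, n_in, n_out)
        weights.append((W, b))
        current_input = codes                 # feed codes to the next layer
    return weights
```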
FINE-TUNING • INITIALIZE THE AUTOENCODER WITH THE WEIGHTS LEARNT BY PRE-TRAINING • PERFORM BACKPROPAGATION AS USUAL
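A sketch of the fine-tuning stage, assuming the pre-trained weights are unrolled into a deep autoencoder whose target is its own input; `build_deep_autoencoder` and `backprop_train` are hypothetical helpers standing in for the actual training loop.

```python
# Fine-tuning: unroll the pre-trained weights into a deep autoencoder
# (decoder mirrors the encoder) and run ordinary backpropagation with
# the input itself as the target.
def fine_tune(data, pretrained_weights, build_deep_autoencoder, backprop_train):
    net = build_deep_autoencoder(encoder_weights=pretrained_weights)
    return backprop_train(net, inputs=data, targets=data)
```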
MODELS USED FOR PRE-TRAINING • STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs) • HIDDEN-LAYER ACTIVATIONS IN (0, 1) USED AS PROBABILITIES FOR SAMPLING BINARY STATES (0 OR 1) • MODEL LEARNS THE JOINT DISTRIBUTION OF 2 SETS OF BINARY VARIABLES – 1 IN THE INPUT LAYER AND THE OTHER IN THE HIDDEN LAYER • EXACT METHODS – COMPUTATIONALLY INTRACTABLE • NUMERICAL APPROXIMATION – CONTRASTIVE DIVERGENCE
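For reference, a sketch of one CD-1 update for an RBM with binary visible and hidden units (standard contrastive divergence, not necessarily the exact variant used in this work); it reuses the `sigmoid` helper and NumPy import from the earlier sketch.

```python
def cd1_update(v0, W, b_h, b_v, lr=0.1, rng=np.random):
    # positive phase: hidden probabilities and sampled binary states
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.rand(*p_h0.shape) < p_h0).astype(float)
    # negative phase: one step of Gibbs sampling
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # approximate gradient of the log-likelihood (CD-1)
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / v0.shape[0]
    b_h += lr * (p_h0 - p_h1).mean(axis=0)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    return W, b_h, b_v
```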
MODELS USED FOR PRE-TRAINING • DETERMINISTIC – SHALLOW AUTOENCODERS • HIDDEN-LAYER ACTIVATIONS IN (0, 1) ARE USED DIRECTLY AS INPUT TO THE NEXT LAYER • TRAINED BY BACKPROPAGATION • DENOISING AUTOENCODERS • CONTRACTIVE AUTOENCODERS • SPARSE AUTOENCODERS
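The slide lists the regularized shallow autoencoders but not their objectives; the standard forms from the literature are given below for reference, with f the encoder, g the decoder, L the reconstruction loss, and λ, β, ρ hyper-parameters (the exact penalties used in this work are not stated on the slide).

```latex
\begin{align*}
\text{Denoising:}   &\quad \min_\theta \; L\big(x,\, g(f(\tilde{x}))\big), \quad \tilde{x} \text{ a corrupted copy of } x \\
\text{Contractive:} &\quad \min_\theta \; L\big(x,\, g(f(x))\big) + \lambda \,\Big\lVert \tfrac{\partial f(x)}{\partial x} \Big\rVert_F^2 \\
\text{Sparse:}      &\quad \min_\theta \; L\big(x,\, g(f(x))\big) + \beta \sum_j \mathrm{KL}\!\big(\rho \,\big\|\, \hat{\rho}_j\big)
\end{align*}
```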
DATASETS • MNIST • Big and Small Digits
DATASETS • Square & Room • 2d Robot Arm • 3d Robot Arm
Libraries used • Numpy, Scipy • Theano – takes care of parallelization • GPU Specifications • Memory – 256 MB • Frequency – 33 MHz • Number of Cores – 240 • Tesla C1060
MEASURE FOR PERFORMANCE • REVERSE CROSS-ENTROPY • X – Original input • Z – Output • Θ – Parameters (Weights and Biases)
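The formula itself did not survive extraction; the standard reconstruction cross-entropy between input X and output Z, which is assumed to be the quantity meant here, is:

```latex
L(X, Z; \theta) \;=\; -\sum_{k} \Big[ x_k \log z_k \;+\; (1 - x_k) \log (1 - z_k) \Big]
```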
BRIDGING THE GAP • RESULTS FROM PRELIMINARY EXPERIMENTS
PRELIMINARY EXPERIMENTS • TIME TAKEN FOR TRAINING • CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN
SPARSITY FOR DIMENSIONALITY REDUCTION • EXPERIMENT USING SPARSE REPRESENTATIONS • STRATEGY A – BOTTLENECK • STRATEGY B – SPARSITY + BOTTLENECK • STRATEGY C – NO CONSTRAINT + BOTTLENECK
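A sketch (assumed, not the authors' code) of the kind of sparsity penalty Strategy B adds on top of the bottleneck reconstruction loss: a KL-divergence term pushing each hidden unit's mean activation towards a small target ρ. The hyper-parameters `rho` and `beta` are illustrative.

```python
import numpy as np

def kl_sparsity_penalty(hidden_activations, rho=0.05, beta=3.0):
    # mean activation of each hidden unit over the mini-batch
    rho_hat = hidden_activations.mean(axis=0)
    # KL divergence between the target sparsity rho and the observed rho_hat
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * kl.sum()   # added to the reconstruction loss
```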
OTHER IMPROVEMENTS • MOMENTUM • INCORPORATING THE PREVIOUS UPDATE • CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS – PREVENTS OSCILLATION • ADDS UP COMPONENTS IN SAME DIRECTION – SPEEDS UP TRAINING • WEIGHT DECAY • REGULARIZATION • PREVENTS OVER-FITTING
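A sketch of a gradient-descent update combining both improvements, momentum (reusing the previous update) and weight decay (L2 regularization); the learning rate, momentum coefficient, and decay strength are illustrative, not the values used in the experiments.

```python
def sgd_update(W, grad_W, velocity, lr=0.1, mu=0.9, decay=1e-4):
    # weight decay adds decay * W to the gradient (L2 regularization);
    # momentum accumulates a velocity: opposing components cancel out
    # (prevents oscillation), consistent components add up (speeds up training)
    velocity = mu * velocity - lr * (grad_W + decay * W)
    W = W + velocity
    return W, velocity
```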
COMBINING ALL • USING ALTERNATE LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS BEST RESULTS
INTERMEDIATE FINE-TUNING • MOTIVATION