
Intrinsic dimension of data representations in deep neural networks


Presentation Transcript


  1. Intrinsic dimension of data representations in deep neural networks Alessandro Laio

  2. Intrinsic dimension of data representations in deep neural networks • Jakob Macke (TUM), Alessio Ansuini (SISSA), Davide Zoccolan (SISSA), Alessandro Laio (SISSA) • neuroscience, physics, machine learning

  3. Deep neural nets for image classification Training with a very large set of labelled images (> 1 million) INPUT Adapted from Yamins & DiCarlo et al (2016)

  4. Deep neural nets for image classification Adapted from LeCun et al (2015) Testing images OUTPUT INPUT Adapted from Yamins & DiCarlo et al (2016)

  5. Deep neural nets for image classification Adapted from LeCun et al (2015) Testing images

  6. Machine vs. human vision Artificial vision Human vision What do these processes have in common?

  7. The ventral stream: an object-processing path Adapted from Yamins & DiCarlo et al (2016) Deep neural nets for image classification

  8. The ventral stream: an object-processing path Adapted from Yamins & DiCarlo et al (2016) retina, thalamus, primary and secondary visual cortex (V1 & V2), area V4, inferotemporal cortex (IT)

  9. The ventral stream: an object-processing path Adapted from Yamins & DiCarlo et al (2016) Retinal space Sam manifold Joe manifold Adapted from DiCarlo & Cox (2007)

  10. The ventral stream: an object-processing path Adapted from Yamins & DiCarlo et al (2016) V1 space Retinal space Sam manifold Sam manifold Joe manifold Joe manifold Adapted from DiCarlo & Cox (2007)

  11. The ventral stream: an object-processing path Adapted from Yamins & DiCarlo et al (2016) IT space V1 space Retinal space Sam manifold Sam manifold Sam manifold Joe manifold Joe manifold Joe manifold Adapted from DiCarlo & Cox (2007)

  12. The untangling hypothesis untangling of object manifolds Adapted from Yamins & DiCarlo et al (2016) IT space Retinal space Sam manifold Sam manifold Joe manifold Joe manifold Adapted from DiCarlo & Cox (2007)

  13. The untangling hypothesis untangling of object manifolds IT space Retinal space Sam manifold Sam manifold Joe manifold Joe manifold Adapted from DiCarlo & Cox (2007)

  14. The untangling hypothesis untangling of object manifolds Last hidden layer space Pixel space Sam manifold • Object manifolds: • Less tangled • Flattened • Lower dimensionality Sam manifold Joe manifold ✓ Joe manifold Adapted from DiCarlo & Cox (2007)

  15. The untangling hypothesis untangling of object manifolds Last hidden layer space Pixel space Sam manifold • Object manifolds: • Less tangled • Flattened • Lower dimensionality Sam manifold Joe manifold ✓ ? Joe manifold Adapted from DiCarlo & Cox (2007)

  16. The untangling hypothesis untangling of object manifolds Last hidden layer space Pixel space Sam manifold • Object manifolds: • Less tangled • Flattened • Lower dimensionality Sam manifold Joe manifold ✓ ? Joe manifold ? Adapted from DiCarlo & Cox (2007)

  17. The goal: to track the intrinsic dimension (ID) of image representations in deep nets • We tested various state-of-the-art deep nets • AlexNet, VGG, VGG-bn & ResNet • We estimated the ID in a subset of layers • Input & output layers, pooling layers & fully connected layers VGG-16 Simonyan & Zisserman (2015) Source: https://blog.heuritech.com

  18. The goal: to track the intrinsic dimension (ID) of image representations in deep nets • We tested various state-of-the-art deep nets • AlexNet, VGG, VGG-bn & ResNet • We estimated the ID in a subset of layers • Input & output layers, pooling layers & fully connected layers • Empirical questions • How does the ID vary across the layers of a deep net? • How linear (i.e., flat) are the data manifolds? • Is there any relationship between the ID in the last hidden layer and the classification performance of the network?
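The slides do not include code; as a rough illustration of how per-layer representations could be collected for ID estimation, here is a minimal PyTorch/torchvision sketch that hooks a few layers of a pretrained VGG-16. The choice of layers and the indices below are assumptions (checked against the printed model), not something specified in the talk.

```python
# A minimal sketch (not from the talk): grab activations from a few layers of a
# pretrained VGG-16 with forward hooks, so they can be fed to an ID estimator.
# The layer indices below are assumptions checked against print(model); newer
# torchvision versions use the `weights=` argument instead of `pretrained=True`.
import torch
import torchvision.models as models

model = models.vgg16(pretrained=True).eval()

# Max-pooling layers inside model.features (VGG-16 has five of them).
pool_idx = [4, 9, 16, 23, 30]

activations = {}

def make_hook(name):
    def hook(module, inputs, output):
        # Flatten each sample's activation map into a single vector.
        activations[name] = output.detach().flatten(start_dim=1)
    return hook

for i in pool_idx:
    model.features[i].register_forward_hook(make_hook(f"pool_{i}"))
# Output of the last hidden fully connected layer (after its ReLU).
model.classifier[4].register_forward_hook(make_hook("last_hidden"))

with torch.no_grad():
    images = torch.randn(8, 3, 224, 224)  # stand-in for a preprocessed image batch
    model(images)

for name, act in activations.items():
    print(name, act.shape)  # (n_images, embedding_dimension)
```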

  19. Estimation of the intrinsic dimension • Intrinsic dimension of a data representation: the minimal number of coordinates that are necessary to describe its points without significant information loss • The linear case: Principal Component Analysis (PCA) [Figure: 2D embedding space; axes Activation x1, Activation x2; principal components P1, P2]

  20. Estimation of the intrinsic dimension • Intrinsic dimension of a data representation: the minimal number of coordinates that are necessary to describe its points without significant information loss • The linear case: Principal Component Analysis (PCA) [Figure: 2D embedding space with a 1D linear subspace; axes Activation x1, Activation x2; principal components P1, P2]

  21. Estimation of the intrinsic dimension • The general (non-linear) case: TwoNN (Facco et al, 2017) • 1) For each data point i, compute the distances to its first and second neighbours (r_i,1 and r_i,2) • 2) For each i, compute μ_i = r_i,2 / r_i,1. The probability distribution of μ is P(μ) = d μ^(-(d+1)), where d is the ID, independently of the local density of points • 3) Infer d from the empirical probability distribution of all the μ_i [Figure: 2D embedding space containing a 1D manifold; point i with its first- and second-neighbour distances r_i,1, r_i,2; axes Activation x1, Activation x2]
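As a hedged illustration of the TwoNN recipe summarised on this slide (not the authors' reference implementation), the following NumPy/scikit-learn sketch computes the two nearest-neighbour distances, the ratios μ_i, and fits d from their empirical distribution. The fraction of largest ratios discarded is a common robustness choice assumed here, not taken from the slide.

```python
# Sketch of the TwoNN estimator (Facco et al., 2017) in NumPy/scikit-learn.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def twonn_id(X, discard_fraction=0.1):
    """Estimate the intrinsic dimension of the rows of X, shape (n_points, n_features)."""
    # Distances to the two nearest neighbours (k=3 because the query point itself
    # is returned with distance 0).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]          # mu_i = r_i,2 / r_i,1

    # Under the model F(mu) = 1 - mu^(-d), so -log(1 - F) = d * log(mu):
    # fit the slope d through the origin on the empirical CDF, discarding
    # the largest ratios for robustness.
    mu = np.sort(mu)
    n = len(mu)
    keep = int(n * (1.0 - discard_fraction))
    F = np.arange(1, n + 1) / n
    x = np.log(mu[:keep])
    y = -np.log(1.0 - F[:keep])
    return float(np.sum(x * y) / np.sum(x * x))

# Quick check: points on a 2-dimensional plane embedded in 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 2)) @ rng.normal(size=(2, 50))
print(twonn_id(X))   # should be close to 2
```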

  22. Results Evolution of ID across the layers of a deep net 1 pre-trained on ImageNet VGG-16 Simonyan & Zisserman (2015) Source: https://blog.heuritech.com

  23. Results Evolution of ID across the layers of a deep net 1 • ID evolution across layers has a hunchback shape • In each layer: ID << ED (embedding dimension) • In the last hidden layers: • the ID decreases monotonically • the ID reaches very small values (~10 or lower)

  24. Results Evolution of ID in state-of-the-art deep nets 2 • Four families of state-of-the-art deep net architectures: • AlexNet • VGG • VGG-bn • ResNet • Pre-trained on ImageNet • ID computed for the 7 most populated categories (500 images each)

  25. Results Evolution of ID in state-of-the-art deep nets 2 • ID evolution across layers has a hunchback shape • In each layer: ID << ED (embedding dimension) • Considerable overlap of the ID profile as a function of relative layer depth: •  After an initial growth of the ID, deep nets perform a progressive dimensionality reduction of the object manifolds • Any relationship with classification accuracy?

  26. Results Performance vs. ID in last hidden layer 3 • Four families of state-of-the-art deep net architectures: • AlexNet • VGG • VGG-bn • ResNet • Pre-trained on ImageNet • ID computed over a random mixture of 2000 training images • Classification performance measured as the top-5 error

  27. Results Performance vs. ID in last hidden layer 3 • r = 0.94 (training set), r = 0.99 (test set) • ID on the training set is a strong predictor of performance on the test set • ID = proxy for the generalization ability of a deep network

  28. The untangling hypothesis untangling of object manifolds Last hidden layer space Pixel space Sam manifold • Object manifolds: • Less tangled • Flattened • Lower dimensionality Sam manifold Joe manifold ✓ ? Joe manifold ? Adapted from DiCarlo & Cox (2007)

  29. The untangling hypothesis untangling of object manifolds Last hidden layer space Pixel space Sam manifold • Object manifolds: • Less tangled • Flattened • Lower dimensionality Sam manifold Joe manifold ✓ ? ✓ Joe manifold Adapted from DiCarlo & Cox (2007)

  30. The untangling hypothesis untangling of object manifolds Last hidden layer space Pixel space Sam manifold • Object manifolds: • Less tangled • Flattened • Lower dimensionality Sam manifold Joe manifold ✓ ? ✓ Joe manifold Adapted from DiCarlo & Cox (2007)

  31. Results Linear vs. non-linear ID estimates 4 [Figure, two panels: left, a 1D manifold coinciding with a 1D linear subspace of a 2D embedding space, so ID-PCA ≈ ID-TwoNN; right, a curved 1D manifold in a 2D embedding space, where ID-PCA >> ID-TwoNN; axes Activation x1, Activation x2]

  32. Results Linear vs. non-linear ID estimates 4 • No gap in the eigenvalue spectrum yielded by PCA → data manifolds are not linear • Yet we define ID-PCA = # of PCs that account for 90% of the variance in the data
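For comparison with the TwoNN sketch above, here is a small sketch of the linear ID-PCA estimate defined on this slide (the number of principal components accounting for 90% of the variance), applied to a toy curved manifold where the linear estimate exceeds the non-linear one. The toy example is an assumption for illustration, not data from the talk.

```python
# Sketch of the ID-PCA estimate: number of principal components needed to
# account for 90% of the variance.
import numpy as np
from sklearn.decomposition import PCA

def pca_id(X, variance_threshold=0.90):
    pca = PCA().fit(X)
    cumulative = np.cumsum(pca.explained_variance_ratio_)
    # Smallest number of components whose cumulative explained variance
    # reaches the threshold.
    return int(np.searchsorted(cumulative, variance_threshold) + 1)

# A circle is a curved 1D manifold: PCA needs both coordinates (ID-PCA = 2),
# while a non-linear estimator such as TwoNN sees a 1-dimensional manifold;
# the gap between the two estimates signals non-linearity.
theta = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta)], axis=1)
print(pca_id(circle))   # 2
```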

  33. Results Linear vs. non-linear ID estimates 4 • No gap in the eigenvalue spectrum yielded by PCA • ID-PCA >> ID-TwoNN

  34. Results Linear vs. non-linear ID estimates 4 • No gap in the eigenvalue spectrum yielded by PCA • ID-PCA >> ID-TwoNN • ID-PCA not much different for trained and randomly initialized networks

  35. Results Linear vs. non-linear ID estimates 4 • No gap in the eigenvalue spectrum yielded by PCA • ID-PCA >> ID-TwoNN • ID-PCA not much different for trained and randomly initialized networks • ID-TwoNN is flat for randomly initialized networks (orthogonal transformations) → data manifolds are not linear

  36. The untangling hypothesis untangling of object manifolds Last hidden layer space Pixel space Sam manifold • Object manifolds: • Less tangled • Flattened • Lower dimensionality Sam manifold Joe manifold ✓ ✗ ✓ Joe manifold Adapted from DiCarlo & Cox (2007)

  37. Results Looking into the origin of ID initial expansion 5 VGG-16 Source: https://blog.heuritech.com

  38. Results Looking into the origin of ID initial expansion 5 MNIST

  39. Results Looking into the origin of ID initial expansion 5 MNIST MNIST

  40. Results Looking into the origin of ID initial expansion 5

  41. Results Looking into the origin of ID initial expansion 5

  42. Results Looking into the origin of ID initial expansion 5

  43. Results Looking into the origin of ID initial expansion 5 • In a trained network, the initial ID expansion reflects the pruning of low-level, highly correlated visual features that carry no information about the correct labeling

  44. Summary • Hunchback shape of ID vs. layer depth • Correlation of ID with performance • Low ID but nonlinear manifolds • ID expansion = pruning of low-level info

  45. Summary • FIRST LAYERS: remove irrelevant features; the ID grows • LAST LAYERS: dimensionality reduction; the ID shrinks along the layers; the more gradual, the better (deep networks win)

  46. Acknowledgments Alessio Ansuini (SISSA), Jakob Macke (TUM), Davide Zoccolan (SISSA) • 2019, Vancouver
