Comparing the performance of the Edge TPU with CPU and GPU for neutrino flavour identification, using ResNet-50 v2 and DUNE FD simulation. Covers results, future work, and Q&A.
BENCHMARKING Edge TPU FOR NEUTRINO FLAVOUR IDENTIFICATION SEPTEMBER 16TH, 2019 Stefano Vergani HEP COMPUTATIONAL R&D - FERMILAB
OVERVIEW • Professional details and introduction • CPU vs GPU vs TPU vs Edge TPU • ResNet-50 v2 • MNIST and DUNE FD simulation: performance on CPU, GPU, and Edge TPU • Results and comments • Future work • Q & A STEFANO VERGANI – HEP COMPUTATIONAL R&D
PROFESSIONAL DETAILS • PhD student in High Energy Physics at the University of Cambridge, UK • Member of the Pandora team -> software development for Pandora, data analysis of the pion–argon cross section in ProtoDUNE-SP • Supervisors: Leigh Whitehead, Melissa Uchida • BSc in Physics (Milan, Italy), MSc in Physics (ETH Zurich, Switzerland) • Currently working at FNAL under Mike Wang, HEP Computational R&D • email: sv408@hep.phy.cam.ac.uk
GOALS OF THE PROJECT • Find one (or more) convolutional neural network (CNN) that can perform image recognition on DUNE Far Detector (FD) simulation for flavour identification and run on the Edge TPU • Benchmark the Coral Edge TPU against CPU, GPU, and TPU -> speed, accuracy, cost
CPU vs GPU vs TPU Central Processing Unit Graphics Processing Unit Tensor Processing Unit
CPU vs GPU vs TPU Central Processing Unit • Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz • TDP: 65 W • price: ~200 USD Graphics Processing Unit • Google Colab • NVIDIA Tesla K80 • TDP: 300 W • price: ~5,000 USD Thermal Design Power (TDP) represents the average power, in watts, the processor dissipates when operating at Base Frequency with all cores active under an Intel-defined, high-complexity workload.
CORAL EDGE TPU • 74.99 USD • 65 mm x 30 mm • 4 trillion operations (tera-operations) per second (4 TOPS), at 0.5 W per TOPS (2 TOPS per watt) -> maximum consumption 2 W Why Edge? Edge computing is the practice of processing data near the edge of your network, where the data is being generated, instead of in a centralized data-processing warehouse.
CREATING THE RIGHT MODEL If the model was not trained with quantization-aware training, use full-integer post-training quantization. https://coral.withgoogle.com/docs/edgetpu/models-intro/
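Full-integer quantization maps each float tensor onto int8 via an affine mapping r ≈ S·(q − Z), with scale S and zero-point Z derived from the value range seen in calibration data. A minimal sketch of that mapping in plain Python (the helper names are illustrative, not part of the Coral toolchain):

```python
def quant_params(r_min, r_max, q_min=-128, q_max=127):
    """Derive scale and zero-point for int8 affine quantization.

    The real range [r_min, r_max] (from calibration data) is mapped
    onto the integer range [q_min, q_max]; zero must be representable.
    """
    r_min, r_max = min(r_min, 0.0), max(r_max, 0.0)  # force zero into the range
    scale = (r_max - r_min) / (q_max - q_min)
    zero_point = round(q_min - r_min / scale)
    return scale, zero_point

def quantize(r, scale, zero_point, q_min=-128, q_max=127):
    q = round(r / scale) + zero_point
    return max(q_min, min(q_max, q))          # clamp into int8 range

def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)

# Example: activations calibrated to the range [-1.0, 3.0]
s, z = quant_params(-1.0, 3.0)
q = quantize(0.5, s, z)
r = dequantize(q, s, z)   # recovered value, within one quantization step
```

The round trip loses at most one quantization step (the scale S), which is why a representative calibration range matters: a range that is too wide inflates S and with it the rounding error.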
ResNet-50 v2 • Residual Networks (ResNets) are used for their simplicity and scalability with image size and network depth; their shortcut connections mitigate the vanishing/exploding gradient problem. • The CMS experiment uses it. • ResNet-50 (50 layers) was preferred to ResNet-152 for simplicity. • Evidence shows that the best ImageNet models using convolutional and fully-connected layers typically contain between 16 and 30 layers.
Residual Networks Deep Residual Learning for Image Recognition from He et al. [2] (Microsoft Research): https://arxiv.org/pdf/1512.03385.pdf
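The core idea of He et al. is that a block learns a residual F(x) and outputs F(x) + x through an identity shortcut, so gradients flow through the addition even when the learned weights are small. A toy sketch with fully-connected layers standing in for the convolutional stack (shapes and weights are illustrative only):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def residual_block(x, w1, w2):
    """Simplified residual block: y = relu(F(x) + x),
    where F(x) = W2 @ relu(W1 @ x) stands in for the conv layers."""
    f = w2 @ relu(w1 @ x)     # the learned residual F(x)
    return relu(f + x)        # identity shortcut: add the input back

dim = 4
x = np.array([1.0, -2.0, 3.0, 0.5])
w1 = np.zeros((dim, dim))     # zero weights => F(x) = 0
w2 = np.zeros((dim, dim))
y = residual_block(x, w1, w2)
# With F(x) = 0 the block reduces to relu(x), so the identity
# mapping is trivially representable -- the property that lets
# very deep ResNets train where plain networks degrade.
```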
PERFORMANCE BENCHMARKS ResNet-50 v2 trained on the ImageNet dataset (1,000 classes) https://coral.withgoogle.com/docs/edgetpu/benchmarks/ Image size: 299x299. Desktop CPU: 557 ms/inference; Edge TPU: 50 ms/inference. Desktop CPU: 64-bit Intel(R) Xeon(R) E5-1650 v4 @ 3.60GHz
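The implied speedup and throughput follow directly from those two latencies (a quick sanity check, not part of the original benchmark):

```python
# Latencies in ms/inference, from the Coral benchmark page
cpu_ms, edgetpu_ms = 557.0, 50.0

speedup = cpu_ms / edgetpu_ms          # Edge TPU is ~11.1x faster
cpu_ips = 1000.0 / cpu_ms              # ~1.8 inferences/second on the CPU
edgetpu_ips = 1000.0 / edgetpu_ms      # 20 inferences/second on the Edge TPU
```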
MNIST DATABASE 60k training images – 10k test images of handwritten digits; well known, commonly used as a first train/test dataset
DUNE FD SIMULATION • three sets: νe, νμ, ντ • made inside LArSoft • three images from the three wire planes • 500x500x3 • size of a single image: 5 to 30 KB • in total ~3* events, pictures Figure: νμ event 555270
DUNE FD SIMULATION
PERFORMANCE (PRELIMINARY) MNIST DATABASE: 60k training / 10k test (picture size initially 28x28, then 56x56) DUNE SIMULATION ντ SUBSET: 200 training / 50 test (picture size 500x500)
PERFORMANCE BENCHMARKS cost/inference = (time/inference) × TDP × (USD per W·s) = K × (USD per W·s), where the K factor = (time/inference) × TDP has units of W·s/inference. Caveat: due to memory issues, the GPU could only be run with a batch size of 2, while speed increases are typically seen from batch size 64 upward. The capabilities of the Nvidia Tesla K80 are therefore underestimated in this study.
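As a worked example of that K factor, using the 557 ms CPU latency from the Coral benchmark page, the 65 W TDP from the CPU slide, and an assumed electricity price of 0.12 USD/kWh (the price is illustrative, not from the talk):

```python
def cost_per_inference(seconds_per_inference, tdp_watts, usd_per_kwh):
    """cost/inference = K * (USD per W*s), with K = time/inference * TDP."""
    k = seconds_per_inference * tdp_watts   # K factor: W*s (joules) per inference
    usd_per_ws = usd_per_kwh / 3.6e6        # 1 kWh = 3.6e6 W*s
    return k * usd_per_ws

# CPU example: 557 ms/inference at 65 W TDP, 0.12 USD/kWh assumed
cpu_cost = cost_per_inference(0.557, 65.0, 0.12)   # ~1.2e-6 USD per inference
```

Note this counts energy cost only; amortized hardware price (200 USD CPU vs 74.99 USD Edge TPU) would add a second term.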
RESULTS • In terms of raw speed, the GPU is by far the fastest piece of hardware. • The Edge TPU performs better with bigger images: with 56x56 pixels it is 2.5 times slower than the CPU and 23 times slower than the GPU, whereas with 500x500x3 images it is 4 times faster than the CPU and only 5 times slower than the GPU. • The Edge TPU showed the smallest cost per inference; the CPU showed the largest.
FUTURE WORK • Run tests with more training / test samples from DUNE FD • Process the three wire planes separately instead of having one image of depth 3 • Test Google Cloud with GPU and TPU -> costs and performance • Add an FPGA to the comparison • Add another network to the comparison
SINCE I MENTIONED PANDORA….
Q & A Thank you for your attention!