Dilated Neural Networks for Time Series Forecasting

Presentation Transcript


  1. Dilated Neural Networks for Time Series Forecasting Chenhui Hu, Data Scientist, Microsoft Cloud & Enterprise

  2. Agenda Overview of Time Series Forecasting Methods Foundations of Dilated Convolutional Neural Networks Application to Retail Demand Forecasting

  3. Agenda Overview of Time Series Forecasting Methods Foundations of Dilated Convolutional Neural Networks Application to Retail Demand Forecasting

  4. Overview Time Series Forecasting Applications web traffic forecasting financial forecasting retail sales forecasting

  5. Representative Methods Statistical methods: Exponential Smoothing, ARIMA, GARCH, Bayesian approaches. Classic machine learning: Support Vector Regression, Random Forest, Boosted Decision Trees, Lasso Regression. Deep learning: LSTM, Seq2Seq LSTM, GRU, Dilated CNN. (figure: timeline from the 1950s to 2016 showing a recent surge of DNN methods) J. Gooijer and R. Hyndman. 25 Years of Time Series Forecasting.

  6. Recurrent NN RNN models the order of the data explicitly. Long Short-Term Memory (LSTM) networks model long-range dependencies. (figure: a multi-layer perceptron next to an unrolled Recurrent Neural Network with inputs x0…xt and outputs y0…yt) http://colah.github.io/posts/2015-08-Understanding-LSTMs/

  7. Convolutional NN Success of CNN in CV and recently in NLP: outperforms humans in many computer vision tasks; achieves state-of-the-art accuracy in audio synthesis & machine translation. An MLP with convolution layers. (figure: 2D convolution as a linear combination of pixel values in a neighborhood, producing a convolved feature from an image)

  8. RNN vs. CNN RNN: has been the default DL choice for sequence modeling; can learn long-range dependencies effectively; but training is usually slower than for CNNs. CNN: lower model complexity; easy to parallelize the computation; Dilated CNNs can outperform RNNs in sequence modeling.

  9. Agenda Overview of Time Series Forecasting Methods Foundations of Dilated Convolutional Neural Networks Application to Retail Demand Forecasting

  10. Dilated CNN Overview CNN with dilated causal conv. layers Basic setup: dilated conv. layers + output layer Advanced: skip connections, dropout, batch normalization, gated activation units, … Examples: WaveNet for audio synthesis; Temporal CNN for sequence modeling. S. Bai, J. Kolter, and V. Koltun. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling.

  11. 1-D Convolution for Time Series (figure: a filter of kernel size 3 sliding along the series)

  13. 1-D Convolution with Multiple Filters (figure: several filters applied to the same series, each producing one output channel)
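A minimal numpy sketch of 1-D convolution with multiple filters, as on the slides above; the filter values are illustrative, not from the deck:

```python
import numpy as np

def conv1d(x, kernels):
    """Valid 1-D convolution (cross-correlation) of x with each kernel row."""
    k = kernels.shape[1]
    out_len = len(x) - k + 1
    # Each output value is a linear combination of k neighboring inputs.
    return np.array([[np.dot(x[t:t + k], w) for t in range(out_len)]
                     for w in kernels])

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
filters = np.array([[1.0, 0.0, -1.0],        # difference filter
                    [1/3, 1/3, 1/3]])        # moving-average filter
print(conv1d(x, filters))   # one output row per filter, length len(x) - 2
```

With kernel size 3 and no padding, each of the two filters turns the length-5 series into a length-3 output channel.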

  14. Causal Convolution Only existing (no future) data is used in each neuron. (figure: a stack of causal conv. layers) Oord et al. WaveNet: A Generative Model for Raw Audio.
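Causality can be obtained by left-padding the input so that each output depends only on the current and past values; a numpy sketch with an illustrative identity kernel:

```python
import numpy as np

def causal_conv1d(x, w):
    """Causal 1-D convolution: y[t] depends only on x[t], x[t-1], ..."""
    k = len(w)
    xp = np.concatenate([np.zeros(k - 1), x])  # left-pad so no future leaks in
    return np.array([np.dot(xp[t:t + k], w) for t in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.0, 0.0, 1.0])   # identity kernel: y[t] = x[t]
print(causal_conv1d(x, w))      # [1. 2. 3. 4.]
```

The output has the same length as the input, and position t never sees x[t+1] or beyond.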

  15. Dilated Convolution With dilation width d, a convolution window of size k starting at location t covers the inputs x(t), x(t−d), …, x(t−(k−1)d), so the window spans (k−1)d + 1 time steps. (figure: a stack of dilated causal convolutional layers) Oord et al. WaveNet: A Generative Model for Raw Audio.
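A dilated causal convolution skips d − 1 time steps between kernel taps; a numpy sketch with an illustrative kernel and dilation:

```python
import numpy as np

def dilated_causal_conv1d(x, w, d):
    """y[t] = sum_i w[i] * x[t - i*d], with zero padding where t - i*d < 0."""
    k = len(w)
    y = np.zeros(len(x))
    for t in range(len(x)):
        for i in range(k):
            j = t - i * d
            if j >= 0:
                y[t] += w[i] * x[j]
    return y

x = np.arange(1.0, 9.0)                      # [1, 2, ..., 8]
print(dilated_causal_conv1d(x, np.array([1.0, 1.0]), d=4))
# with kernel [1, 1] and d = 4: y[t] = x[t] + x[t-4]
```

Setting d = 1 recovers the ordinary causal convolution; larger d lets the same two-tap kernel reach further into the past.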

  16. Dilated-CNN Advantages Increases the receptive field The receptive field is the part of the data visible to a neuron Exponentially increasing the dilation factor results in exponential receptive field growth Captures a global view of the input with fewer parameters Handles temporal flow with a causal structure K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition.
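The receptive-field growth claimed above can be checked with a few lines of Python: a stack of causal layers with kernel size k and dilations d_1, …, d_L sees 1 + (k − 1)·Σ d_i time steps, so doubling the dilation per layer gives exponential growth in depth.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated causal conv layers."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf

# Doubling the dilation each layer doubles the reachable history:
print(receptive_field(2, [1, 2, 4, 8]))       # 16
print(receptive_field(2, [1, 2, 4, 8, 16]))   # 32
```

Each extra layer adds only one kernel's worth of parameters while doubling the history the top neuron can see.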

  17. Dilated-CNN Design ReLU activation: ReLU(x) = max(0, x). Dropout for regularization Use WeightNorm or BatchNorm to speed up convergence Use skip connections to avoid degradation of model performance when adding more layers Concatenate building blocks to increase model capacity S. Bai, J. Kolter, and V. Koltun. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling.
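A minimal residual-block sketch of these design elements in Keras, assuming TensorFlow 2.x; the layer sizes and dropout rate are illustrative placeholders, not values from the deck:

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, n_filters, kernel_size, dilation_rate, dropout_rate=0.1):
    # Dilated causal convolution with ReLU, then dropout for regularization.
    y = layers.Conv1D(n_filters, kernel_size, dilation_rate=dilation_rate,
                      padding="causal", activation="relu")(x)
    y = layers.Dropout(dropout_rate)(y)
    # 1x1 conv matches channel counts so the skip connection can be added.
    shortcut = layers.Conv1D(n_filters, 1, padding="same")(x)
    return layers.add([shortcut, y])

inp = layers.Input(shape=(15, 7))
out = residual_block(inp, n_filters=32, kernel_size=2, dilation_rate=2)
model = tf.keras.Model(inp, out)
print(model.output_shape)   # (None, 15, 32)
```

The skip connection lets the block fall back to (a projection of) its input, which is what guards against degradation as layers are stacked.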

  18. Dilated-RNNs (figure: example of a three-layer Dilated-RNN with dilations 1, 2, and 4) Chang et al. Dilated Recurrent Neural Networks.

  19. Agenda Overview of Time Series Forecasting Methods Foundations of Dilated Convolutional Neural Networks Application to Retail Demand Forecasting

  20. Application to Retail Demand Forecasting OrangeJuice dataset in the bayesm R package: 83 stores, 11 brands, 121 weeks. https://cran.r-project.org/web/packages/bayesm/bayesm.pdf

  21. Sample Data

  22. Feature Engineering move – historical unit sales; deal, feat – whether the brand is on sale or advertised; price – price of the current brand; price_ratio – price relative to other brands; month – month number; week_of_month – week number within the month
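A hedged pandas sketch of the calendar and price features listed above; the column names follow the slide, but the exact formulas (especially week_of_month and the reference price for price_ratio) are assumptions:

```python
import pandas as pd

df = pd.DataFrame({
    "week_start": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-01-29"]),
    "price": [2.5, 2.0, 2.5],
    "avg_price_other_brands": [2.0, 2.0, 2.0],  # hypothetical reference price
})
df["month"] = df["week_start"].dt.month
# Assumed definition: week 1 = days 1-7, week 2 = days 8-14, and so on.
df["week_of_month"] = (df["week_start"].dt.day - 1) // 7 + 1
df["price_ratio"] = df["price"] / df["avg_price_other_brands"]
print(df[["month", "week_of_month", "price_ratio"]])
```

The binary deal/feat flags typically come straight from the dataset, so they need no derivation here.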

  23. Data Transformation Sliding windows collect the features created at times t, t+1, … into lagged inputs: each sample stacks lag 1, lag 2, lag 3, … of feature 1 through feature n, giving a window of shape (# of lags, # of features); stacking the windows gives a tensor of shape (# of samples, # of lags, # of features). (figure: lagged feature windows shifting one step at a time)
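The sliding-window transformation can be sketched in numpy; the series and window length are illustrative:

```python
import numpy as np

def make_windows(features, n_lags):
    """Stack sliding windows: (T, n_features) -> (T - n_lags + 1, n_lags, n_features)."""
    T = features.shape[0]
    return np.stack([features[t:t + n_lags] for t in range(T - n_lags + 1)])

series = np.arange(20.0).reshape(10, 2)   # 10 time steps, 2 features
X = make_windows(series, n_lags=4)
print(X.shape)   # (7, 4, 2): 7 samples, 4 lags, 2 features
```

Each window is just the previous one shifted forward by one time step, which is exactly the t versus t+1 picture on the slide.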

  24. Network Structure (figure: two-branch network. Input Layer 1 carries move, deal, feat, price, price_ratio, month, week_of_month through stacked Conv1D layers, Dropout, and Flatten; Input Layer 2 carries store id and brand id through Lambda, Embedding, and Flatten blocks into Dense layers; the two branches are concatenated. Tensor shapes [#samples, 15, 3] and [#samples, 15, 7] appear in the diagram.)

  25. Model Definition Embedding: encode categorical features into binary sequences Conv1D: n_filters, kernel_size, dilation_rate, padding, activation function Skip connections Combine with categorical features and pass to a dense layer
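A sketch of this two-branch definition in Keras, assuming TensorFlow 2.x; all sizes (filters, embedding dimension, vocabulary size, dense widths) are illustrative placeholders, not the tuned values from the talk:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sequence branch: dilated causal Conv1D stack with a skip-style concat.
seq_in = layers.Input(shape=(15, 7), name="sequence_features")
c1 = layers.Conv1D(16, 2, dilation_rate=1, padding="causal",
                   activation="relu")(seq_in)
c2 = layers.Conv1D(16, 2, dilation_rate=2, padding="causal",
                   activation="relu")(c1)
skip = layers.concatenate([c1, c2])          # skip connection around c2
seq_out = layers.Flatten()(layers.Dropout(0.1)(skip))

# Categorical branch: embed store id and brand id.
cat_in = layers.Input(shape=(2,), dtype="int32", name="store_brand_ids")
emb = layers.Flatten()(layers.Embedding(input_dim=100, output_dim=4)(cat_in))

# Combine both branches and pass through dense layers.
merged = layers.concatenate([seq_out, emb])
out = layers.Dense(1)(layers.Dense(32, activation="relu")(merged))
model = tf.keras.Model([seq_in, cat_in], out)
print(model.output_shape)   # (None, 1)
```

The single output unit predicts one step of unit sales; multi-step forecasts would repeat this with additional output units or rolling predictions.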


  29. Hyperparameter Tuning HyperDrive via Azure Machine Learning SDK Automates the hyperparameter sweeps on an elastic cluster Supports Bayesian sampling and early termination https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters

  30. Results Forecast unit sales between week 138 and 160 with model retrained every 2 weeks Azure Ubuntu Linux VM with one-half Tesla K80 GPU, 56 GB memory

  31. Results Mean absolute percentage error (MAPE); reported numbers are the median over 5 runs

  32. Thank you!

  33. Rate today’s session Session page on conference website O’Reilly Events App
