
Deep Learning with Consumer Transaction Data


Presentation Transcript


  1. Deep Learning with Consumer Transaction Data • Eric Greene • Principal Consultant

  2. A little about me… • Currently in transition from Wells Fargo AI Labs • Architect & engineer focused on integrating machine learning and real-time analytics into modern applications • Four years working with Hadoop and Spark • Last year working with Keras and TensorFlow • Soon starting a new role with Think Big Analytics in Melbourne, Australia. Legal Disclaimer: The views and opinions expressed in this presentation are those of the author and do not necessarily reflect the official policy or position of any other entity. Examples of analysis performed within this presentation are only examples; they should not be used in real-world analytic products, as they are based only on very limited and dated open-source information. Assumptions made within the analysis are not reflective of the position of any entity.

  3. Deep learning is awesome! What about my use case? Most deep learning literature on time series is about text, image or sound data.

  4. What’s different about consumer transaction data? • Consumer transaction data includes bank account, credit card and merchant billing histories • Features include a semi-structured description, date, amount, balance and payment account • Lots of data across all customers, but not a lot for a single customer • Multivariate time series with irregular intervals • Data is often not organized by customer, so lots of sorting is required

  5. What are the use cases? A short list of useful insights: • Predict future transactions • Predict future spending or income within specific categories • Categorize transactions • Detect anomalies for fraud or overspending • Segmentation: identify groups of transactions during shopping, travel and life events

  6. Traditional ML requires a model per insight. For each type of prediction, a new model is designed and trained on aggregations of data and statistical frequencies of the desired events. When is the next time the customer will buy a burrito? How much will they spend on utilities next month? Will the customer’s account balance go below zero next week?

  7. That’s a lot of work. Every insight requires separate data science and engineering efforts. [Diagram: one data scientist producing many separate models]

  8. Apply deep learning to answer many questions simultaneously. Given a history of disparate transaction types, predict the future transactions for an individual customer. • Correlation between different transaction types is not lost • Reduces modeling design to a single problem • Uses raw data, with minor feature engineering • Can be used as the source for other downstream predictions

  9. Structure the data … 90% of your time. Transaction data usually comes as large sets of date/time-ordered (but not customer-ordered) transactions. Group the data by customer to support training customer models. • Use HDFS, Parquet and Spark to store, organize and process files into datasets • Partition the data into groups that correlate to a training batch size (e.g. customers 1…1,000 per partition, up through customers 999,001…1,000,000) • Use a generator to feed your favorite DL framework directly (see the sketch below) • Start small and ramp up the dataset size when everything is working. [Diagram: a year of transactions (2016.01.01 … 2016.12.31) partitioned into Parquet files grouped by customer]
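A minimal sketch of this structuring step, assuming PySpark and pandas; the paths, schema (customer_id, txn_date) and partition count are illustrative assumptions, not the actual pipeline:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("txn-prep").getOrCreate()

# Raw transactions arrive date/time-ordered, not customer-ordered.
raw = spark.read.parquet("hdfs:///data/transactions/2016/")  # hypothetical path

# Bucket customers into partitions that correlate to a training batch size,
# sorting each customer's history by date within its partition.
(raw.repartition(1000, "customer_id")
    .sortWithinPartitions("customer_id", "txn_date")
    .write.mode("overwrite")
    .parquet("hdfs:///data/txn_by_customer/"))

def customer_generator(parquet_files):
    """Yield one customer's date-ordered history at a time to the DL framework."""
    import pandas as pd
    for path in parquet_files:
        df = pd.read_parquet(path)
        for _, history in df.groupby("customer_id", sort=False):
            yield history
```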

  10. Use existing model architectures for word prediction. Transactions aren’t Shakespeare, but let’s treat them like it! • Word prediction models using LSTMs and 1D convolutions have been successful at predicting the next words in a sentence • Transactions are more repetitive and structured, so they should work even better • Companies have millions of rows of transactions for their customers. Plenty of data!

  11. Encode the data … almost ready for training! Adapt transaction data to work with a deep learning text prediction model. A list of time-ordered transactions can be used to create a list of financial sentences that describe a customer’s financial story: • Categorize date fields, amount ranges, accounts and products • Create a dictionary from the entire corpus of description words and categories • Encode transactions into word ids (see the sketch below)
  RAW: 06/01/16 $502 CRDT CHECKING XXXXX887 CHECK BAL $201 / 06/30/16 $10 DBIT CRDT CRD XXXXX65 BURRTO BOB BAL $165 …
  CATEGORIZE: On Friday, the 1st of June, a deposit of around 500 dollars was made to a checking account, bringing the balance to 200 dollars. On Tuesday, the 30th of June, a 10-dollar purchase was made, bringing the balance to 165 dollars. On …
  DICTIONARY (entire corpus): $100-200 : 500, CREDIT : 545, Tuesday : 45, CHECKING ACCOUNT : 67, BURRITO : 9933, END SENTENCE : 0
  ENCODE: 29993 27772 7773 7662 222 664 8883 9999 0 888 999 345 344 533 555 666 777 888 999 4444 333 222 4444 666 888 990 888 0 …
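A minimal sketch of the categorize/dictionary/encode steps above; the bucketing rules and field names are illustrative assumptions:

```python
def categorize(txn):
    """Turn one raw transaction (a dict) into a list of category tokens ('words')."""
    lo = int(abs(txn["amount"]) // 100) * 100
    tokens = [
        txn["weekday"],               # e.g. "Friday"
        txn["txn_type"],              # e.g. "CREDIT"
        txn["account_type"],          # e.g. "CHECKING ACCOUNT"
        f"${lo}-{lo + 100}",          # amount range, e.g. "$100-200"
    ]
    tokens += txn["description"].upper().split()  # description words, e.g. "BURRITO"
    tokens.append("END_SENTENCE")
    return tokens

def build_vocab(transactions):
    """Dictionary built from the entire corpus of description words and categories."""
    vocab = {"END_SENTENCE": 0}
    for txn in transactions:
        for word in categorize(txn):
            vocab.setdefault(word, len(vocab))
    return vocab

def encode(transactions, vocab):
    """Encode a customer's time-ordered transactions into a flat list of word ids."""
    return [vocab[word] for txn in transactions for word in categorize(txn)]
```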

  12. Train the models. 3 months later … the train has arrived. • Create word embeddings • Modify a well-known deep CNN architecture to work with 1D text data • Use an RNN variant such as LSTM or GRU • Or use the next best deep learning text model you discovered at the O’Reilly AI Conference…

  13. CNN design - stateless. Pipeline: a list of word ids (the historical transactions) feeds an embedding layer that creates a list of word vectors, followed by several blocks of 1D convolutions, then a flatten and some fully connected layers, with an output list of one-hot encoded words (the predicted transactions). [Diagram: WORD IDS → EMBEDDING → CONV BLOCKS → FULLY CONNECTED → ONE-HOT OUTPUT] A Keras sketch follows.
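A minimal Keras sketch of the stateless CNN above; vocabulary size, sequence lengths and layer sizes are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, hist_len, pred_len = 10_000, 200, 20  # assumed sizes

model = keras.Sequential([
    layers.Input(shape=(hist_len,)),                           # list of word ids
    layers.Embedding(vocab_size, 128),                         # word ids -> word vectors
    layers.Conv1D(256, 5, padding="same", activation="relu"),  # conv block 1
    layers.MaxPooling1D(2),
    layers.Conv1D(256, 5, padding="same", activation="relu"),  # conv block 2
    layers.MaxPooling1D(2),
    layers.Flatten(),                                          # flatten
    layers.Dense(1024, activation="relu"),                     # fully connected
    layers.Dense(pred_len * vocab_size),                       # logits for each output word
    layers.Reshape((pred_len, vocab_size)),
    layers.Softmax(),                                          # one-hot distribution per word
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
```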

  14. RNN design – stateful. Pipeline: transaction words are fed one by one as word ids into an embedding layer that creates word vectors, followed by several blocks of RNN cells (e.g. LSTM), then fully connected layers, with a one-hot encoded output for the next transaction word. Feed the previous prediction back in to get the next one. [Diagram: WORD IDS → EMBEDDING → LSTM → FULLY CONNECTED → ONE-HOT OUTPUT] A Keras sketch follows.
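A minimal Keras sketch of the stateful RNN above, predicting one word at a time; batch size and layer sizes are assumptions:

```python
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, batch_size = 10_000, 32  # assumed sizes

model = keras.Sequential([
    layers.Input(shape=(1,), batch_size=batch_size),  # one word id per step
    layers.Embedding(vocab_size, 128),                # word id -> word vector
    layers.LSTM(512, stateful=True),                  # state persists between calls
    layers.Dense(vocab_size, activation="softmax"),   # distribution over the next word
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# At prediction time, sample a word from the output, feed it back in to get
# the next word, and reset the LSTM state between customers.
```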

  15. Problems in training • LSTM converges slowly, but is more flexible with a variable number of historical and predicted transactions • CNN requires a fixed number of historical and predicted transactions, which is less flexible, and converges slowly when the number of input and output transactions gets large • Data prep for large datasets is time-consuming. Start small • Aggregating smaller random transactions into simple categories can give better results, though this may not work for all use cases (see the sketch below)
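A sketch of the aggregation trick from the last bullet, assuming pandas and illustrative column names (amount, category, txn_date): small, noisy transactions are rolled up into one daily total per category before encoding.

```python
import pandas as pd

def aggregate_small(txns: pd.DataFrame, threshold: float = 20.0) -> pd.DataFrame:
    """Roll transactions under `threshold` up into one daily total per category."""
    small = txns[txns["amount"].abs() < threshold]
    large = txns[txns["amount"].abs() >= threshold]
    rolled = (small.groupby(["customer_id", "txn_date", "category"], as_index=False)
                   .agg(amount=("amount", "sum")))
    rolled["description"] = "DAILY " + rolled["category"]  # e.g. "DAILY COFFEE"
    return pd.concat([large, rolled], ignore_index=True)
```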

  16. What would one expect to work? Everything. Eventually. In most cases. Should work. • Word embeddings will place correlated words close to each other in vector space • Transactions that have a well-defined frequency will be easy to predict • Transactions without a well-defined frequency produce scrambled descriptions in the future • More data … more time … better convergence • Amounts are close to actuals. [Diagram: embedding space with related tokens clustered, e.g. BURRITO / Friday / $0-$10 together, Payment / ~$2000 / Credit together, Coffee / Monday together]

  17. But also expect the strange... • There is a lot of randomness in historical daily transactions, so the model will show randomness in the predicted transactions • The cost function gives equal importance to getting every word correct • More data … better convergence • Amounts are close to actuals within categories, but some descriptions will be jumbled

  18. What’s next? • Use the word embeddings for other models (see the sketch below) • Reuse top layers of the model for transfer learning in other problem areas; may work well across data sets • Better models with more data • Apply different model architectures as they become better understood • Deploy. Does your model scale to millions of customers?
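A sketch of the first bullet: reusing the learned embedding layer in a downstream model (here a hypothetical transaction categorizer); the saved-model name and sizes are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

base = keras.models.load_model("txn_predictor.keras")  # hypothetical saved model
embedding = next(l for l in base.layers
                 if isinstance(l, layers.Embedding))   # the learned word vectors
embedding.trainable = False                            # freeze for the new task

n_categories, txn_len = 20, 12                         # assumed sizes
classifier = keras.Sequential([
    layers.Input(shape=(txn_len,)),                    # one encoded transaction
    embedding,
    layers.GlobalAveragePooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_categories, activation="softmax"),  # e.g. spending categories
])
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```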

  19. Questions?
