Getting State of the Art Results with less than ideal Data or Timescales

StoryStream is an automotive content platform trusted by leading car brands. It helps brands provide a more engaging customer experience, reduce content creation costs, and understand customers better. The platform states it can boost customer engagement and conversions by up to 25% and reduce content costs by up to 60%. In this talk, Dr Janet Bastiman describes how the team delivered a vehicle classification project with limited data and tight timelines.

Presentation Transcript


  1. Getting State of the Art Results with less than ideal Data or Timescales Dr Janet Bastiman @yssybyl

  2. About StoryStream: The world’s leading automotive content platform
     StoryStream is a dedicated automotive content platform, trusted by some of the world’s leading car brands. It was created specifically to help automotive brands provide a more relevant, engaging customer experience, fuelled by authentic content and designed for efficiently scaling content operations across global teams.
     The Core StoryStream Benefits
     • Grow customer engagement and conversions by up to 25%
     • Reduce content creation and management costs by up to 60%
     • Provide a more authentic customer experience
     • Understand your customer in a deeper way

  3. “NINES don’t matter if USERS aren’t HAPPY” Charity Majors @mipsytipsy

  4. “[Client] needs this to go live at the end of the month, I promised them we could deliver...” Every salesperson ever

  5. Project timings
     • 35 models = 1050 days (one person, linear)
     • ~5 years for one person working Mon-Fri who is allowed holidays :)
     • 250 days with parallelisation of tasks and data upfront
     • 150 days on worksheet, balanced by an increase in ongoing license
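
The figures on this slide can be checked with a few lines of arithmetic. This is only a sketch: the 30 days per model is implied by 1050 / 35, and the working-days-per-year value is an assumption (the slide says only "Mon-Fri" with holidays).

```python
# Worked arithmetic for the project-timing estimates.
MODELS = 35
TOTAL_DAYS = 1050                         # from the slide
days_per_model = TOTAL_DAYS / MODELS      # implied effort per model

# Assumption: 52 weeks x 5 days, minus 25 days leave and 8 public holidays.
WORKING_DAYS_PER_YEAR = 52 * 5 - 25 - 8

years_one_person = TOTAL_DAYS / WORKING_DAYS_PER_YEAR

def speedup(days):
    """How many times faster than the one-person linear estimate."""
    return TOTAL_DAYS / days

print(f"{days_per_model:.0f} days per model")
print(f"{years_one_person:.1f} years for one person")
print(f"250-day plan: {speedup(250):.1f}x, 150-day plan: {speedup(150):.1f}x")
```

Under these assumptions the one-person estimate comes out a little under five years, consistent with the "~5 years" on the slide.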

  6. Can you guess what happened next?

  7. What would it take to get it done in that time? [Image: The Core (2003), Paramount Pictures]

  8. “They don’t have any data to give us”

  9. Project scope
     • Visual classification of images to determine the car detail down to variant level
     • Must be able to distinguish between 250 different vehicles in natural environments, working equally well on professional and social media images
     • Differences were visually subtle and not always visible from the angle
     • Must be able to replace humans for accuracy and process at scale
     • Demo deadline in 3 months
     • Ready deadline in 6 months

  10. Problems
      • Data collection was going to take time
      • Test set creation was going to need care
      • No time for researching architecture
      • Large number of classes
      • Subtle differences would need a deep network with attention
      • There was no clear use case for the output, so we did not know what precision/recall balance to aim for
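
To illustrate why the precision/recall balance depends on the use case, here is a minimal sketch that scores one class at two thresholds. The scores and labels are made-up toy values, not project data, and returning 1.0 when a denominator is zero is a convention chosen for this example.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall for a single class at a given score threshold."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    # Convention for this sketch: an empty denominator counts as perfect.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return precision, recall

# Toy classifier scores and ground-truth labels (illustrative only).
scores = [0.95, 0.90, 0.80, 0.60, 0.55, 0.40, 0.30, 0.10]
labels = [True, True, False, True, True, False, True, False]

strict = precision_recall(scores, labels, threshold=0.85)
lenient = precision_recall(scores, labels, threshold=0.35)
print("strict :", strict)
print("lenient:", lenient)
```

The strict threshold makes no false claims but misses most positives; the lenient one finds most positives at the cost of some false alarms. Which trade-off is right depends on what the output is used for.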

  11. Resources
      • 3 deep learning researchers
      • Experienced with limited visual data
      • Tenacity ++
      • 10 people from the CTO’s team (to be planned in around other commitments) for engineering, productionisation, and design
      • 3 good GPU laptops and 10 single-GPU servers
      • 30 people with smartphones
      • Existing models for finding cars and this client’s make
      • Permission to scrape data from the client’s own used car site

  12. “We haven’t the money, so we’ve got to think” Ernest Rutherford. Quoted in Bulletin of the Institute of Physics (1962), 13, No.4, 102.

  13. If you are dealing with any critical inferencing, do not take shortcuts. Do it properly, do it rigorously, and stand up to the company and say no. Make sure it’s clear that the timelines will be longer to get it right.

  14. Has someone else solved the problem?
      • Google, AWS, Azure, IBM, FAIR, Clarifai etc
      • Algorithmia
      • Arxiv and GitHub
      • Many industry-specific small companies who want use cases
      • Free, PAYG, license
      • Free resources and a bit of clever logic might solve the problem
      • 3rd party brings risk

  15. Get more data
      • Legal public sources
      • https://github.com/awesomedata/awesome-public-datasets
      • https://www.kaggle.com/datasets
      • Take your own pictures/videos
      • access/permission?
      • And label it… experts or crowd?
      [Image: https://xkcd.com/1897/]

  16. Go old school (https://xkcd.com/2059/)
      Reduce the dimensionality of the problem and use a Bayesian approach, KNN or SVM.
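
As a rough illustration of the "old school" route, the sketch below reduces each image to a handful of block averages and classifies with k-nearest neighbours. Everything here is an assumption made for the example: the block-averaging step stands in for whatever dimensionality reduction suits the problem, and the toy 4x4 "images" and labels are invented.

```python
from collections import Counter
import math

def downsample(image, block):
    """Average non-overlapping block x block tiles (crude dimensionality reduction)."""
    height, width = len(image), len(image[0])
    features = []
    for r in range(0, height - block + 1, block):
        for c in range(0, width - block + 1, block):
            tile = [image[r + i][c + j] for i in range(block) for j in range(block)]
            features.append(sum(tile) / len(tile))
    return features

def knn_predict(train, query, k=3):
    """train is a list of (feature_vector, label); majority vote of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def toy_image(top, bottom):
    """4x4 greyscale image: two rows of `top`, then two rows of `bottom`."""
    return [[top] * 4 for _ in range(2)] + [[bottom] * 4 for _ in range(2)]

train = [
    (downsample(toy_image(0.9, 0.1), 2), "bright_top"),
    (downsample(toy_image(0.8, 0.2), 2), "bright_top"),
    (downsample(toy_image(1.0, 0.0), 2), "bright_top"),
    (downsample(toy_image(0.1, 0.9), 2), "bright_bottom"),
    (downsample(toy_image(0.2, 0.8), 2), "bright_bottom"),
    (downsample(toy_image(0.0, 1.0), 2), "bright_bottom"),
]

query = downsample(toy_image(0.7, 0.3), 2)   # 16 pixels reduced to 4 features
prediction = knn_predict(train, query)
print(len(query), prediction)
```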

  17. Simplify the problem
      [Diagram: two pipelines. Image → Specific Vehicle; Image → Car? → Make? → Specific Vehicle]
      • Removal of camera artefacts in eye images to make detection easier - Jeffrey De Fauw http://blog.kaggle.com/2015/08/10/detecting-diabetic-retinopathy-in-eye-images/
      • Removal of Doppler effect on moving source using fractional octave band shifting, F Mobley https://asa.scitation.org/doi/pdf/10.1121/2.0000578?class=pdf
        $\Delta n = -r\left[\log_2\left(1 - M\cos\theta\sin\varphi\right)\right]$
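
One way to read the staged pipeline on this slide is as a cascade: cheap, general questions first, and the fine-grained classifier only for images that pass them. The sketch below is an assumption-laden illustration; the stage functions are stubs operating on toy dictionaries, not real models.

```python
def classify(image, is_car, make_of, vehicle_of, client_make):
    """Cascade: Car? -> Make? -> Specific vehicle, exiting early when a stage fails."""
    if not is_car(image):
        return "not a car"
    if make_of(image) != client_make:
        return "other make"
    return vehicle_of(image)

# Stub stages standing in for trained models (hypothetical, for illustration).
def is_car(image):
    return image["kind"] == "car"

def make_of(image):
    return image["make"]

def vehicle_of(image):
    return image["vehicle"]

images = [
    {"kind": "dog"},
    {"kind": "car", "make": "Other Make", "vehicle": "X"},
    {"kind": "car", "make": "Client A", "vehicle": "Model A"},
]

results = [classify(img, is_car, make_of, vehicle_of, "Client A") for img in images]
print(results)
```

Each stage solves an easier problem than going from raw image to specific vehicle in one step, and the later stages never see inputs they were not trained for.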

  18. Get every last drop from what you have
      Have a toolkit of augmentation approaches but choose what’s relevant to your needs...
      [Image: Statistical anatomical modelling for efficient and personalised spine biomechanical models - I Castro Mateos, PhD thesis]

  19. Augmentation - detail
      • Flip L/R, U/D
      • Rotations
      • Reduce or enlarge bounding box coordinates by N%
      • Add occlusions https://www.umbc.edu/rssipl/people/aplaza/Papers/Journals/2019.GRSL.Occlusion.pdf
      • Change hue, saturation and value of colours in the image https://arxiv.org/pdf/1902.06543.pdf
      • Copypairing - https://arxiv.org/abs/1909.00390#
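
A minimal, dependency-free sketch of several of the augmentations listed above. It assumes images are nested lists of pixels, RGB values are floats in [0, 1], and boxes are (x0, y0, x1, y1); all function names are illustrative rather than taken from the talk. Note that geometric transforms also require the bounding box to be transformed, which is omitted here.

```python
import colorsys

def flip_lr(img):
    """Mirror left/right."""
    return [row[::-1] for row in img]

def flip_ud(img):
    """Mirror up/down."""
    return [list(row) for row in img[::-1]]

def rotate90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def scale_box(box, pct, width, height):
    """Enlarge (pct > 0) or reduce (pct < 0) a box about its centre, clipped to the image."""
    x0, y0, x1, y1 = box
    dx = (x1 - x0) * pct / 100 / 2
    dy = (y1 - y0) * pct / 100 / 2
    return (max(0, x0 - dx), max(0, y0 - dy), min(width, x1 + dx), min(height, y1 + dy))

def occlude(img, top, left, size, fill=(0.0, 0.0, 0.0)):
    """Paint a size x size square of `fill` over a copy of the image."""
    out = [list(row) for row in img]
    for r in range(top, min(top + size, len(out))):
        for c in range(left, min(left + size, len(out[0]))):
            out[r][c] = fill
    return out

def shift_hsv(pixel, dh=0.0, ds=0.0, dv=0.0):
    """Shift hue (wrapping) and saturation/value (clamped) of one RGB pixel."""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    h = (h + dh) % 1.0
    s = min(1.0, max(0.0, s + ds))
    v = min(1.0, max(0.0, v + dv))
    return colorsys.hsv_to_rgb(h, s, v)

grid = [[1, 2], [3, 4]]
print(flip_lr(grid), flip_ud(grid), rotate90(grid))
print(scale_box((10, 10, 30, 20), 10, 100, 100))
```

In line with the previous slide, not every transform suits every problem: an upside-down car, for example, is unlikely to appear in real inputs.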

  20. Architecture
      • For some problems CNNs are robust to noisy labels: up to 20x as many noisy labels as real labels can still give business-level accuracy https://arxiv.org/pdf/1705.10694.pdf
      • Find the right architecture and stick to it, and add noisy data to your training set. http://www.asimovinstitute.org/neural-network-zoo/
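
A small sketch of adding noisy data to a training set while capping the ratio of noisy to clean examples. The function name, file names, and labels are hypothetical; the cap of 20 noisy examples per clean one simply mirrors the figure on the slide and should be validated for any real problem.

```python
import random

def mix_training_set(clean, noisy, noisy_per_clean, seed=0):
    """Return shuffled (example, label, source) tuples with at most
    `noisy_per_clean` noisy examples for every clean one."""
    rng = random.Random(seed)
    n_noisy = min(len(noisy), int(len(clean) * noisy_per_clean))
    mixed = [(x, y, "clean") for x, y in clean]
    mixed += [(x, y, "noisy") for x, y in rng.sample(noisy, n_noisy)]
    rng.shuffle(mixed)
    return mixed

# Toy data: 10 hand-checked examples and 500 scraped, unchecked ones.
clean = [(f"img_{i}.jpg", "model_a") for i in range(10)]
noisy = [(f"scraped_{i}.jpg", "model_a") for i in range(500)]

mixed = mix_training_set(clean, noisy, noisy_per_clean=20)
print(len(mixed), sum(1 for _, _, src in mixed if src == "noisy"))
```

Keeping the source tag on each example makes it easy to hold out only clean data for the test set.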

  21. Architecture
      • Use transfer learning - fix most of the weights of a good network and adapt the last few layers
      • Fast and easy retraining and works with smaller data sets in a variety of fields
      • (image) https://arxiv.org/abs/1903.02196
      • (series) https://arxiv.org/abs/1907.01332
      • (audio) https://arxiv.org/abs/1909.07526
      [Image: Deep Learning for Vision Systems, Mohamed Elgendy]
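
The idea of fixing most weights and adapting only the last layer can be shown in miniature without any framework. In this sketch the "backbone" is a tiny fixed layer standing in for a pretrained network (an assumption for illustration), and only the logistic-regression head is trained. In practice the same pattern is applied to a real pretrained model in a deep learning framework.

```python
import math

# Stand-in for a pretrained backbone: these weights are frozen (never updated).
BACKBONE_W = [[1.0, -1.0], [-1.0, 1.0]]

def backbone(x):
    """Fixed feature extractor: one linear layer followed by ReLU."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in BACKBONE_W]

def head(features, w, b):
    """Trainable last layer: logistic regression on the backbone features."""
    z = sum(wi * fi for wi, fi in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(data, w, b):
    total = 0.0
    for x, y in data:
        p = head(backbone(x), w, b)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(data)

def train_head(data, steps=200, lr=0.5):
    """Full-batch gradient descent that updates the head only."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            feats = backbone(x)
            err = head(feats, w, b) - y      # gradient of the loss w.r.t. the logit
            grad_w = [g + err * f for g, f in zip(grad_w, feats)]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

# Toy task: label is 1 when the first input is larger than the second.
data = [([0.9, 0.1], 1), ([0.7, 0.2], 1), ([0.6, 0.4], 1),
        ([0.1, 0.9], 0), ([0.2, 0.7], 0), ([0.4, 0.6], 0)]

frozen_before = [row[:] for row in BACKBONE_W]
loss_before = log_loss(data, [0.0, 0.0], 0.0)
w, b = train_head(data)
loss_after = log_loss(data, w, b)
print(f"loss {loss_before:.3f} -> {loss_after:.3f}")
```

Because only three head parameters are trained, a handful of examples is enough here, which is the property that makes transfer learning attractive for small data sets.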

  22. Things to avoid
      • One-shot/few-shot learning - accuracy is not suitable for business needs https://towardsdatascience.com/few-shot-learning-in-cvpr19-6c6892fc8c5
      • Capsule networks - really cool but only implemented on toy data sets - would need research to implement - https://arxiv.org/pdf/1906.02829v1.pdf (NLP) https://arxiv.org/pdf/1907.02957.pdf (images)
      • Designing an architecture from scratch
      • Simulated data - unless it does include the features you need and has already been created by someone else.

  23. Back to the use case

  24. A Demo is controllable
      • Expected inputs only
      • Not expected to go all the way to variant
      • Existing Make classifier
      • Existing binary classifier for a different model for that make
      • Existing demo front end
      • MVP: demonstrate we can identify make, model and era

  25. What we did – for the demo
      • Update output key of our Model classifier – change “Other” to be the model of interest
      • Demo of Make and Model (as long as you didn’t show it a picture of a different model…)
      • How to get era? 3rd party that could return make and year
      • Decision tree ;)

  26. What we did – for the demo
      [Diagram: Image → Car detector → Make (Client A or other) → Model (A or Other)]

  27. What we did – for the demo
      [Diagram: Image → Car detector, then two branches: 3rd Party (Make, Model, Year), and Make → Model A/B; both feed into Combine]
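
The demo approach can be sketched as relabelling the existing classifier's output key and then combining in-house and third-party answers with a hand-written decision tree. This is a guess at the shape of the logic, not the actual implementation: "Client A", "Model A" and "Model B" are the diagram's placeholder labels, and treating era as the decade of the third party's year is purely an illustrative assumption.

```python
CLIENT_MAKE = "Client A"   # placeholder label from the diagram

# The existing binary classifier distinguished one model from everything else.
ORIGINAL_KEY = {0: "Model A", 1: "Other"}
# For the demo (expected inputs only), "Other" is relabelled as the model of interest.
DEMO_KEY = {**ORIGINAL_KEY, 1: "Model B"}

def era_of(year):
    """Map a year to a coarse era (assumed here to be the decade)."""
    return "unknown" if year is None else f"{year // 10 * 10}s"

def combine(own_make, own_model_index, third_party):
    """Hand-written decision tree over in-house and third-party outputs."""
    era = era_of(third_party.get("year"))
    if own_make == CLIENT_MAKE:
        # Trust the in-house classifiers for make and model.
        return {"make": CLIENT_MAKE, "model": DEMO_KEY[own_model_index], "era": era}
    # Otherwise fall back to the third party's answer.
    return {"make": third_party.get("make"), "model": third_party.get("model"), "era": era}

result = combine("Client A", 1, {"make": "Client A", "model": None, "year": 2014})
print(result)
```

The relabelling only holds while inputs are controlled, which is why the slide notes that the demo worked as long as no picture of a different model was shown.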

  28. What we did – Data
      • 95% of effort went on data gathering/cleaning
      • Wrote a web scraper for client used car site
      • Data store with mapping for different vehicles
      • Added “not clean” flag and pushed through mechanical turk
      [Figure labels: Image Content, Image Quality]

  29. What we did – 3 months in
      • Demo ready as required
      • Pipeline for data, continuously updating
      • Minimised effort on internal experts
      • No eyeballing of the data other than initial sanity check
      • Lots of scripts that were prime to be automated.

  30. Demo
      • Pretty well actually
      • Some really difficult images
      • Only expected images were given
      • Where it was wrong it was (mostly) sensibly wrong

  31. What we did – Phase 2
      [Diagram: Image → Car detector → Make → Model → Submodel A, B or C → Variant (several per submodel), with a 3rd Party (Make, Model, Year) branch and a Combine step]

  32. Thank You [Image: https://xkcd.com/2191/]