
Fun and Innovative Applications of MapReduce in Data Science

Join Teradata Data Scientist Chris Hillman as he explores entertaining and practical applications of MapReduce, from face detection and character recognition to speech-to-text and text mining. Dive into step-by-step instructions for key tasks, like recognizing number plates and counting words in audio files. Discover the challenges and benefits of building your own physical or virtual computing cluster, including performance testing and configuration experimentation. Learn why tackling these projects is not only educational but also a lot of fun!


Presentation Transcript


  1. 7 Fun Things to do with MapReduce. Chris Hillman – Teradata Data Scientist. Christopher.hillman@teradata.com, @chillax7

  2. Agenda. Map Tasks: Face Detection, Character Recognition, Speech to Text. Shuffling: Mass Spectrometer processing. Reducers: Text Mining, Actual Mining. Cluster Building.

  3. Face Detection in Images. Step 1. Get a good Open Source Library. Step 2. Check the Example Code.

  4. Character Recognition. More Complex Task than Face Detection. SELECT * FROM RecognizeNumberPlate(ON anpr.vehiclelogs imagecol('recognizedobject'));

  5. Speech to Text. Fed up with word count examples? How about counting words in a recorded wav file?

  6. Proteomics. Mass Spectrometers create a lot of data… in XML format… it’s nasty to work with.

  7. Text Mining. First phases are map tasks: Text Extraction and Parsing.

  8. Actual Mining. Comparing seismic surveys taken at different points in time?

  9. Cluster Building. Physical or Virtual? Physical: more fun, looks impressive; harder to build, maintain, use; cost of power. Virtual: performance? Easier to test, try different versions, configurations. Why build your own cluster? It’s fun. You learn lots. It gets you invited to parties.

  10. Thank you. Chris Hillman, Christopher.hillman@teradata.com, @chillax7, www.bigdatablog.co.uk
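
Slide 5 suggests counting words in a recorded wav file rather than in plain text. The transcript gives no implementation, so the following is only a minimal Python sketch of how such a job might be laid out as map, shuffle and reduce phases. The `transcribe` hook is hypothetical: it stands in for whatever speech-to-text library is actually used, and here it is replaced by a fake lookup table so the example runs on its own. All function and file names are assumptions made for illustration.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(wav_path: str, transcribe: Callable[[str], str]) -> Iterator[Tuple[str, int]]:
    """Map task: transcribe one recording and emit (word, 1) pairs."""
    text = transcribe(wav_path)  # hypothetical speech-to-text hook (assumption)
    for word in re.findall(r"[a-z']+", text.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (the word)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce task: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(wav_paths: Iterable[str], transcribe: Callable[[str], str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence, in a single process."""
    pairs = (pair for path in wav_paths for pair in map_words(path, transcribe))
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


# Stand-in transcriber so the sketch is runnable without audio files
# or a speech-to-text library. File names and text are made up.
FAKE_TRANSCRIPTS = {
    "a.wav": "Fed up with word count",
    "b.wav": "word count again",
}


def fake_transcribe(path: str) -> str:
    return FAKE_TRANSCRIPTS[path]


counts = word_count(["a.wav", "b.wav"], fake_transcribe)
print(counts)
```

In this layout the expensive step (transcription) sits inside the map task, so each recording can be processed independently; only the small (word, 1) pairs need to be shuffled to the reducers.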
