Join Joe Hummel, PhD, as he explores the integration of Hadoop with Azure to enhance big data processing through Map-Reduce. Learn the history and functionality of Map-Reduce, its application in analyzing Chicago crime data, and how to utilize Hadoop's rich ecosystem including tools like Pig and Hive. Discover how to process data easily and visualize results using Excel's PowerPivot. This presentation is a valuable resource for data enthusiasts looking to leverage Hadoop's capabilities on cloud platforms.
Introducing Hadoop on Azure: “hello Map-Reduce!” • Joe Hummel, PhD • Visiting Researcher: U. of California, Irvine • Adjunct Professor: U. of Illinois, Chicago & Loyola U., Chicago • Materials: http://www.joehummel.net/downloads.html • Email: joe@joehummel.net
A little history…
• Map-Reduce is from functional programming

    // function returns 1 if i is prime, 0 if not:
    let isPrime(i) = ...

    // sums 2 numbers:
    let sum(x, y) = return x + y

    // count the number of primes in 1..N:
    let countPrimes(N) =
      let L = [ 1 .. N ]          // [ 1, 2, 3, 4, 5, 6, ... ]
      let T = map isPrime L       // [ 0, 1, 1, 0, 1, 0, ... ]
      let count = reduce sum T    // 42
      return count
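The pseudocode on this slide can be tried out directly in JavaScript, whose arrays have built-in `map` and `reduce` methods. This is a minimal sketch, not code from the talk: the trial-division body of `isPrime` is an assumption, since the slide leaves it elided.

```javascript
// Returns 1 if i is prime, 0 if not.
// Assumption: simple trial division; the slide does not show this body.
function isPrime(i) {
  if (i < 2) return 0;
  for (let d = 2; d * d <= i; d++) {
    if (i % d === 0) return 0;
  }
  return 1;
}

// Sums 2 numbers.
function sum(x, y) {
  return x + y;
}

// Counts the primes in 1..N: map isPrime over the list, then reduce with sum.
function countPrimes(N) {
  const L = Array.from({ length: N }, (_, k) => k + 1); // [ 1, 2, 3, ..., N ]
  const T = L.map((i) => isPrime(i));                   // [ 0, 1, 1, 0, 1, 0, ... ]
  return T.reduce(sum, 0);
}

console.log(countPrimes(10)); // 4 (the primes 2, 3, 5, 7)
```

The map step is independent per element, which is what makes it easy to spread across many machines; only the reduce step has to combine results.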
A little more history… • Hadoop: • Created by […] to drive internet search • Parallelism • Data partitioning • Fault tolerance [Diagram: BIG data, page hits]
Hadoop today • Freely-available framework for big data • http://hadoop.apache.org/ • Based on concept of Map-Reduce [Diagram: BIG data is split across parallel Map tasks that apply the map function; their intermediate results go to Reduce, which produces the result R]
Workflow
Data → Map, Map, Map: [ <key1,value>, <key4,value>, <key2,value>, … ]
Sort, Sort, Sort: [ <key1,value>, <key1,value>, … ]
Merge: [ <key1, [value,value,…]>, <key2, [value,value,…]>, … ]
Reduce → R: [ <key1, value>, <key2, value>, … ]
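The four stages of this workflow can be imitated in a single process to see what each one produces. The sketch below is plain JavaScript and only an illustration of the data flow; the function names (`mapPhase`, `sortPhase`, `mergePhase`, `reducePhase`, `runMapReduce`) are hypothetical and are not part of Hadoop.

```javascript
// Map: each partition is processed independently into [key, value] pairs.
function mapPhase(partitions, mapFn) {
  return partitions.map((records) => records.flatMap(mapFn));
}

// Sort: each mapper's output is sorted by key.
function sortPhase(mapped) {
  const byKey = (a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0);
  return mapped.map((pairs) => [...pairs].sort(byKey));
}

// Merge: values from all sorted outputs are grouped under their key.
function mergePhase(sorted) {
  const groups = new Map();
  for (const pairs of sorted) {
    for (const [key, value] of pairs) {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    }
  }
  // Emit the groups in key order: [ [key1, [v, v, ...]], [key2, [...]], ... ]
  return [...groups.entries()].sort((a, b) => (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Reduce: each key's list of values is collapsed to a single value.
function reducePhase(merged, reduceFn) {
  return merged.map(([key, values]) => [key, reduceFn(key, values)]);
}

function runMapReduce(partitions, mapFn, reduceFn) {
  return reducePhase(mergePhase(sortPhase(mapPhase(partitions, mapFn))), reduceFn);
}

// Usage: count how often each key occurs across three partitions.
const partitions = [["key1", "key4", "key2"], ["key1", "key2"], ["key1"]];
const result = runMapReduce(
  partitions,
  (record) => [[record, 1]],                        // map: emit <key, 1>
  (key, values) => values.reduce((a, b) => a + b, 0) // reduce: sum the 1s
);
console.log(result); // [ [ 'key1', 3 ], [ 'key2', 2 ], [ 'key4', 1 ] ]
```

In a real cluster the map and sort stages run on separate machines and the merge involves moving data over the network; here everything is an in-memory array.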
Data set for demo • We’ll be working with Chicago crime data… • https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2 • http://www.cityofchicago.org/city/en/narr/foia/CityData.html • 1 GB, 5M rows
Goal?
• Compute top-10 crimes…

IUCR   Count
0486   366903
0820   308074
. . .
0890   166916

IUCR = Illinois Uniform Crime Reporting codes
https://data.cityofchicago.org/Public-Safety/Chicago-Police-Department-Illinois-Uniform-Crime-R/c7ck-438e
Demo
• Hadoop on Azure…

    // JavaScript version:

    // map: called once per input line; emits <IUCR, 1>.
    // Column 4 is the IUCR code (columns: ID, CaseNumber, Date, Block, IUCR, ...).
    var map = function (key, value, context)
    {
      var values = value.split(",");
      context.write(values[4], 1);
    };

    // reduce: called once per IUCR code; sums the 1s emitted for that code.
    var reduce = function (key, values, context)
    {
      var sum = 0;
      while (values.hasNext())
      {
        sum += parseInt(values.next(), 10);
      }
      context.write(key, sum);
    };

Output:
0486   366903
0820   308074
. . .
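To see what the demo's `map` and `reduce` functions do without a cluster, they can be driven by small stand-ins for the objects Hadoop passes in. This is a sketch under assumptions: `makeContext`, `makeIterator`, and `runLocally` are hypothetical helpers that only mimic the `context.write`, `hasNext`, and `next` calls used on the slide, and the sample rows are made up for illustration (they are not taken from the Chicago data set).

```javascript
// The two functions from the demo slide.
var map = function (key, value, context) {
  var values = value.split(",");
  context.write(values[4], 1);
};

var reduce = function (key, values, context) {
  var sum = 0;
  while (values.hasNext()) {
    sum += parseInt(values.next(), 10);
  }
  context.write(key, sum);
};

// Hypothetical stand-in for the context object: records every write.
function makeContext() {
  const pairs = [];
  return { pairs, write: (key, value) => pairs.push([key, value]) };
}

// Hypothetical stand-in for the iterator handed to reduce.
function makeIterator(items) {
  let i = 0;
  return { hasNext: () => i < items.length, next: () => items[i++] };
}

// Runs map over every line, groups the emitted values by key, then runs reduce per key.
function runLocally(lines) {
  const mapContext = makeContext();
  lines.forEach((line, offset) => map(offset, line, mapContext));

  const grouped = new Map();
  for (const [key, value] of mapContext.pairs) {
    if (!grouped.has(key)) grouped.set(key, []);
    grouped.get(key).push(value);
  }

  const reduceContext = makeContext();
  for (const [key, values] of grouped) {
    reduce(key, makeIterator(values), reduceContext);
  }
  return reduceContext.pairs;
}

// Made-up rows in the column order ID,CaseNumber,Date,Block,IUCR.
const sampleLines = [
  "1,AA000001,01/01/2012,001XX EXAMPLE ST,0486",
  "2,AA000002,01/02/2012,002XX EXAMPLE ST,0820",
  "3,AA000003,01/03/2012,003XX EXAMPLE ST,0486",
  "4,AA000004,01/04/2012,004XX EXAMPLE ST,0890",
];

console.log(runLocally(sampleLines)); // [ [ '0486', 2 ], [ '0820', 1 ], [ '0890', 1 ] ]
```

Note that splitting on "," is only safe when no field contains a comma; quoted fields with embedded commas would need a real CSV parser.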
Hadoop++
• Rich ecosystem around Hadoop
• Pig
• Hive
• HBase
• …

    // interactive PIG with explicit Map-Reduce functions:
    pig.from("CC-from-2001.txt").
        mapReduce("IUCR-Count.js", "IUCR, Count:long").
        orderBy("Count DESC").
        take(10).
        to("output-from-2001")

    // interactive PIG without explicit Map-Reduce:
    schema = "ID,CaseNumber,Date,Block,IUCR,..."
    pig.from("CC-from-2001.txt", schema).
        groupBy("IUCR").
        select("group, SUM($1.Count)").
        orderBy("Count DESC").
        take(10).
        to("output-from-2001")
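The second pipeline reads as group by IUCR, aggregate per group, order by count descending, take the top rows. The same steps can be sketched in plain JavaScript over an in-memory array to make the logic concrete. This is not the Pig API: `topCrimes` is a hypothetical helper, it counts rows per code, and the input rows are made up for illustration.

```javascript
// Hypothetical helper: counts rows per IUCR code and returns the n most frequent.
// Assumes comma-separated lines with the IUCR code in column 4.
function topCrimes(lines, n) {
  // groupBy("IUCR") plus the per-group aggregate: one counter per code.
  const counts = new Map();
  for (const line of lines) {
    const iucr = line.split(",")[4];
    counts.set(iucr, (counts.get(iucr) || 0) + 1);
  }

  return [...counts.entries()]
    .map(([IUCR, Count]) => ({ IUCR, Count }))
    .sort((a, b) => b.Count - a.Count) // orderBy("Count DESC")
    .slice(0, n);                      // take(n)
}

// Made-up rows in the column order ID,CaseNumber,Date,Block,IUCR.
const rows = [
  "1,AA000001,01/01/2012,001XX EXAMPLE ST,0820",
  "2,AA000002,01/02/2012,002XX EXAMPLE ST,0486",
  "3,AA000003,01/03/2012,003XX EXAMPLE ST,0486",
  "4,AA000004,01/04/2012,004XX EXAMPLE ST,0890",
  "5,AA000005,01/05/2012,005XX EXAMPLE ST,0486",
  "6,AA000006,01/06/2012,006XX EXAMPLE ST,0820",
];

console.log(topCrimes(rows, 2));
// [ { IUCR: '0486', Count: 3 }, { IUCR: '0820', Count: 2 } ]
```

This works only while the whole data set fits in one process's memory; the point of Pig on Hadoop is to run the same group, sort, and take steps across a partitioned data set.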
Hadoop on Azure • Microsoft is offering free access to Hadoop • Request invitation @ http://www.hadooponazure.com/ • Hadoop connector for Excel • Process data using Hadoop, analyze/visualize using Excel
PowerPivot • Freely-available plugin for Excel 2010 • http://www.powerpivot.com/ • Turns Excel into an in-memory database • More precisely, turns spreadsheet into an OLAP cube • Note: • If you have 32-bit Excel, install 32-bit PowerPivot • If you have 64-bit Excel, install 64-bit PowerPivot • GBs of data will require 64-bit • [ How to tell what version of Excel you have? File menu, help… ] Big Data Processing, Cheap
Demo • PowerPivot… • Install • PowerPivot menu • PowerPivot Window • Get Data... • PivotTable…
Compare and contrast
That’s it!
Thank you for attending • Presenter: Joe Hummel • Email: joe@joehummel.net • Materials: http://www.joehummel.net/downloads.html • Keep an eye out for the final release of: • Hadoop on Azure • Hadoop on Windows • PowerView plugin for Excel 2013