This Edureka Big Data Analytics Tutorial will help you understand the basics of the Big Data domain and learn how to analyze Big Data. Below are the topics covered in this tutorial:

1) Big Data Introduction
2) What is Big Data Analytics?
3) Why Big Data Analytics?
4) Stages in Big Data Analytics
5) Big Data Analytics Domains
6) Big Data Analytics Use Cases

Subscribe to our channel to get updates.

Check our complete Hadoop playlist here: https://goo.gl/4OyoTW
EDUREKA HADOOP CERTIFICATION TRAINING What is Hadoop? www.edureka.co/big-data-and-hadoop
What are we going to learn?
1) Big Data Introduction
2) What is Big Data Analytics?
3) Why Big Data Analytics?
4) Stages in Big Data Analytics
5) Big Data Analytics Domains
6) Big Data Analytics Use Cases
Exploding Global Data
Fun Facts about Global Data
In 5 years there will be over 50 billion smart connected devices in the world.
We create 2.5 quintillion bytes (2.5 exabytes) of data every day.
There will be 6.1 billion smartphone users globally by 2020.
Fun Facts About Big Data
If all of the world's data were stored on discs and the discs were piled into a single stack, the stack would be taller than the Eiffel Tower (about 300 m).
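The Eiffel Tower comparison can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes single-layer DVDs (4.7 GB capacity, 1.2 mm thick) and counts only one day's worth of data at the 2.5 quintillion bytes per day figure quoted earlier; the disc type and the one-day window are assumptions chosen for illustration, not figures from the slide.

```python
import math

# All disc parameters are assumptions (a single-layer DVD), not figures from the slide.
DAILY_BYTES = 2.5e18          # 2.5 quintillion bytes created per day (from the slide)
DISC_CAPACITY_BYTES = 4.7e9   # assumed: single-layer DVD capacity, 4.7 GB
DISC_THICKNESS_M = 1.2e-3     # assumed: DVD thickness, 1.2 mm
EIFFEL_TOWER_M = 300          # approximate height quoted on the slide


def stack_height_m(total_bytes: float) -> float:
    """Height in metres of a stack of discs that together hold total_bytes."""
    discs = math.ceil(total_bytes / DISC_CAPACITY_BYTES)
    return discs * DISC_THICKNESS_M


height = stack_height_m(DAILY_BYTES)
print(f"One day of data on DVDs: about {height / 1000:.0f} km tall")
print(f"That is roughly {height / EIFFEL_TOWER_M:.0f} Eiffel Towers")
```

Under these assumptions a single day's data already gives a stack of roughly 638 km, so the slide's comparison is, if anything, an understatement.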
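What is Big Data?
5 V's Definition of Big Data:
Volume: the sheer amount of data generated and stored
Velocity: the speed at which data is generated and must be processed
Variety: the different forms of data, from structured tables to unstructured text, images and video
Veracity: the trustworthiness and quality of the data
Value: the business value that can be extracted from the data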
Big Data Growth Drivers
IoT: 50 billion devices by 2020, or “~6 things online” per person: sensors, smart objects, device-clustered systems, tablets, laptops and phones.
Digital infrastructure is being adopted 5x faster than electricity and telephony.
World population (billions): 6.307 in 2003, 6.721 in 2008, 6.894 in 2010, 7.347 in 2015 and 7.83 in 2020.
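The “~6 things online per person” figure follows directly from the slide's own numbers: 50 billion devices divided by the projected 2020 world population of 7.83 billion. The minimal sketch below reproduces that division; the function name is illustrative only.

```python
# Reproduce the "~6 things online per person" figure from the slide's own numbers.
DEVICES_2020 = 50e9  # 50 billion connected devices by 2020

# World population in billions, as charted on the slide
WORLD_POPULATION_BN = {2003: 6.307, 2008: 6.721, 2010: 6.894, 2015: 7.347, 2020: 7.83}


def devices_per_person(devices: float, population_bn: float) -> float:
    """Connected devices per person, given a population in billions."""
    return devices / (population_bn * 1e9)


ratio = devices_per_person(DEVICES_2020, WORLD_POPULATION_BN[2020])
print(f"{ratio:.1f} connected devices per person in 2020")
```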
Data Generated Every Minute
Facebook users like 4,166,667 posts.
Reddit users cast 18,327 votes.
YouTube users upload 300 hours of new video.
Twitter users send 347,222 tweets.
Instagram users like 1,736,111 posts.
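Per-minute figures are easier to grasp when scaled to a full day: multiplying by 1,440 minutes per day shows, for example, that 347,222 tweets per minute is roughly 500 million tweets per day. The sketch assumes each rate stays constant over the whole day, which is a simplification.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# Per-minute figures from the slide
PER_MINUTE = {
    "Facebook post likes": 4_166_667,
    "Reddit votes": 18_327,
    "YouTube hours of video uploaded": 300,
    "Tweets sent": 347_222,
    "Instagram post likes": 1_736_111,
}


def per_day(count_per_minute: int) -> int:
    """Scale a per-minute count to a full day, assuming a constant rate."""
    return count_per_minute * MINUTES_PER_DAY


for activity, count in PER_MINUTE.items():
    print(f"{activity}: {per_day(count):,} per day")
```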
Why Big Data Analytics?
Big Data Analytics leads to:
Cost reduction: a cost-effective storage system for huge data sets.
Faster and better decision making: ways to analyze information quickly and act on it.
Next-generation products: automated cars, healthcare, etc.
Improved services or products: evaluation of customer needs and satisfaction.
What is Big Data Analytics?
“Big data analytics examines large and different types of data to uncover hidden patterns, correlations and other insights”
Stages in Big Data Analytics
1) Identifying Problem
2) Designing Data Requirement
3) Pre-processing Data
4) Performing Analytics over Data
5) Visualizing Data
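The five stages can be sketched as a tiny in-memory pipeline. Everything in it is hypothetical: the question, the two-field record layout and the sample lines are made up, and in a real deployment each stage would run on a platform such as Hadoop or Spark rather than on a Python list.

```python
from collections import Counter

# 1) Identifying problem (hypothetical example question)
PROBLEM = "Which product category is ordered most often?"

# 2) Designing data requirement: the fields needed from each raw record (hypothetical)
REQUIRED_FIELDS = ("order_id", "category")

# Hypothetical raw input, including a malformed line and a blank category
RAW = ["1,books", "2,toys", "3,books", "broken-line", "4,", "5,books", "6,toys"]


def preprocess(lines):
    """3) Pre-processing: keep only well-formed records with every field non-empty."""
    records = []
    for line in lines:
        parts = line.split(",")
        if len(parts) != len(REQUIRED_FIELDS) or not all(p.strip() for p in parts):
            continue  # drop malformed or incomplete lines
        records.append(dict(zip(REQUIRED_FIELDS, (p.strip() for p in parts))))
    return records


def analyze(records):
    """4) Performing analytics: count orders per category."""
    return Counter(r["category"] for r in records)


def visualize(counts):
    """5) Visualizing: a plain-text bar chart, most frequent first."""
    for category, n in counts.most_common():
        print(f"{category:<6} {'#' * n} ({n})")


counts = analyze(preprocess(RAW))
visualize(counts)
```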
Big Data Analytics Domains
Web & E-Tailing, Telecommunication, Government, Healthcare, Finance & Banking, Retail
Big Data Analytics Use Cases
Big Data helped Donald Trump win against Hillary Clinton in the 2016 US presidential election:
1) Personal data was collected from various sources such as club cards, newspaper subscriptions, social media, etc.
2) An algorithm was built to generate the top cities in which to reach the highest concentration of persuadable voters.
3) Messages were targeted based on voter profiles using platforms such as Facebook, Snapchat, Pandora Radio, etc.
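The slide does not describe the campaign's actual algorithm. As an illustration only, the sketch below ranks cities by the share of voters whose persuadability score meets an assumed threshold; the records, scores, threshold and function name are all hypothetical.

```python
from collections import defaultdict

# Hypothetical voter records: (city, persuadability score in [0, 1]); all values are made up.
VOTERS = [
    ("City A", 0.8), ("City A", 0.2), ("City A", 0.7),
    ("City B", 0.9), ("City B", 0.6),
    ("City C", 0.1), ("City C", 0.3), ("City C", 0.75), ("City C", 0.2),
]
PERSUADABLE_THRESHOLD = 0.5  # assumed cut-off for "persuadable"


def top_cities(voters, k, threshold=PERSUADABLE_THRESHOLD):
    """Rank cities by the share of their voters whose score meets the threshold."""
    totals = defaultdict(int)
    persuadable = defaultdict(int)
    for city, score in voters:
        totals[city] += 1
        if score >= threshold:
            persuadable[city] += 1
    share = {city: persuadable[city] / totals[city] for city in totals}
    return sorted(share.items(), key=lambda item: item[1], reverse=True)[:k]


print(top_cities(VOTERS, k=2))
```

Ranking by share rather than by raw count favours cities where persuadable voters are concentrated, which matches the "highest concentration" wording on the slide.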
Walmart boosted its sales by leveraging the power of Big Data.
While forecasting demand for emergency supplies ahead of the approaching Hurricane Sandy, Walmart gained some surprising insights:
Along with flashlights and emergency equipment, they found an upsurge in sales of strawberry Pop-Tarts.
Extra supplies of strawberry Pop-Tarts were dispatched to stores in Hurricane Sandy's path in 2012, and they sold extremely well.
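One simple way to surface a pattern like the Pop-Tarts upsurge is to compare each product's sales just before a storm with its normal sales and flag products with an unusually high ratio. The sketch below is not Walmart's actual method: the sales figures are invented and the 3x threshold is an assumption.

```python
# Invented daily unit sales, for illustration only (not Walmart data).
NORMAL_SALES = {"flashlights": 100, "strawberry pop-tarts": 50, "bread": 400}
PRE_STORM_SALES = {"flashlights": 900, "strawberry pop-tarts": 300, "bread": 600}
UPLIFT_THRESHOLD = 3.0  # assumed: flag products selling at least 3x their normal rate


def uplift(normal, pre_storm):
    """Ratio of pre-storm sales to normal sales, per product."""
    return {product: pre_storm[product] / normal[product] for product in normal}


ratios = uplift(NORMAL_SALES, PRE_STORM_SALES)
# Products above the threshold, largest uplift first
flagged = sorted((p for p, r in ratios.items() if r >= UPLIFT_THRESHOLD),
                 key=ratios.get, reverse=True)
print(flagged)
```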
Apixio uses big data analytics to improve healthcare decisions.
80% of medical and clinical information about patients is in unstructured form, such as written physician notes.
Medical data is analyzed using a variety of methodologies and algorithms that are machine-learning based and have NLP capabilities.
The generated patient data model is aggregated across the population to derive larger insights such as disease prevalence, treatment patterns, etc.
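Aggregating per-patient models into a population insight such as disease prevalence amounts to counting how many patients carry each condition and dividing by the population size. The sketch below assumes each patient model has already been reduced to a set of condition names (for example by the NLP step); the patient IDs and conditions are hypothetical.

```python
from collections import Counter

# Hypothetical per-patient models: the set of conditions extracted for each patient.
# All data here is made up for illustration.
patients = {
    "p1": {"diabetes", "hypertension"},
    "p2": {"hypertension"},
    "p3": set(),
    "p4": {"diabetes"},
}


def prevalence(patient_conditions):
    """Fraction of patients in the population who have each condition."""
    n = len(patient_conditions)
    counts = Counter(c for conds in patient_conditions.values() for c in conds)
    return {condition: count / n for condition, count in counts.items()}


print(prevalence(patients))
```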
Big Data Analytical Tools
Hadoop provides a scalable solution to store and process huge data sets in a parallel and distributed fashion.
Apache Hive is a data warehousing tool that allows us to perform big data analytics using Hive Query Language (HiveQL), which is very similar to SQL.
Apache Pig is a platform used to analyze large data sets by representing them as data flows.
Apache Spark is an in-memory data processing engine that allows us to efficiently execute streaming, machine learning or SQL workloads that require fast iterative access to datasets.
Apache HBase is a NoSQL database that allows us to store unstructured and semi-structured data with ease and provides real-time read/write access.
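Hadoop's parallel, distributed processing is based on the MapReduce model: mappers turn input records into key/value pairs, the framework groups the pairs by key, and reducers combine each group. The single-process sketch below imitates those three phases with a word count in plain Python; it runs everything locally and is not Hadoop's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one key."""
    return key, sum(values)


def map_reduce(lines):
    # In Hadoop, each line (input split) could be mapped on a different node.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(map_reduce(["Hadoop stores data", "Spark and Hadoop process data"]))
```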
Big Data Analytics Courses at Edureka
Roles: Spark and Hadoop Developer, Data Analyst, Hadoop Admin, Data Scientist, Data Engineer
Courses: Big Data Hadoop Certification Training, Apache Spark Certification Training, Hadoop Administration Certification Training, Linux Administration Certification Training, Data Analytics with R Certification Training, Data Science Certification Training, Statistics Essentials for Analytics, Machine Learning with Mahout Certification Training
Hadoop Case Study – Orbitz Worldwide
Challenges:
The existing data infrastructure was not capable of storing and processing the data generated by users every day.
Enhancing the existing infrastructure was very expensive and offered limited scalability.
Users on orbitz.com run 1.5 million flight searches and 1 million hotel searches every day, generating 500 GB of log data per day that has to be processed and made available to warehouse users.
Requirement:
An efficient, long-term storage system that can store any kind of data.
An analytical tool for making important business decisions.
Solution: Apache Hadoop
A cost-effective, open-source framework used to store and process huge data sets.
Easily scalable as needed.
Comes with various analytical tools.
Accomplishments with Hadoop:
Months' worth of data is archived easily.
Hadoop made it easy to derive various metrics for analytics, which was a tedious task earlier.
Performance of the previous methodology compared with the Hadoop implementation: the earlier process took 109m 14s to extract and process the logs, whereas the MapReduce process took only 25m 58s.
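The two timings translate into a speed-up of roughly 4.2x. The short calculation below converts both durations to seconds and compares them; the helper name is illustrative only.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a duration given as minutes and seconds to seconds."""
    return minutes * 60 + seconds


earlier = to_seconds(109, 14)   # earlier process: 109m 14s
mapreduce = to_seconds(25, 58)  # MapReduce process: 25m 58s

speedup = earlier / mapreduce
saved_min = (earlier - mapreduce) / 60
print(f"Speed-up: {speedup:.1f}x, time saved: {saved_min:.0f} min per run")
```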
Analysis of Website Logs at Orbitz
Types of Website Logs
1. Impression List: contains the position (ranking) of each hotel in the search results, along with the session ID of the visitor who clicked on it.
Format of Impression List: (session_id, hotel_id, position, rate)
2. WebTrends Log: contains the details of customers who have booked a hotel through the website.
Format of WebTrends Log: (session_id, visitors_ip, hotel_id, booking_date, number_of_guests, booking_time)
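Because the logs are uncleaned, each line has to be parsed and validated before analysis. The sketch below models the two formats as typed records and drops malformed lines. It assumes comma-separated fields and the field types shown (integer position and guest count, decimal rate); the slides do not state the delimiter or types, and the sample lines are hypothetical.

```python
from typing import NamedTuple, Optional


class Impression(NamedTuple):
    session_id: str
    hotel_id: str
    position: int
    rate: float


class Booking(NamedTuple):
    session_id: str
    visitors_ip: str
    hotel_id: str
    booking_date: str
    number_of_guests: int
    booking_time: str


def _fields(line: str, expected: int) -> Optional[list]:
    """Split an assumed comma-separated line; None if the field count is wrong or any field is empty."""
    parts = [p.strip() for p in line.split(",")]
    return parts if len(parts) == expected and all(parts) else None


def parse_impression(line: str) -> Optional[Impression]:
    parts = _fields(line, 4)
    if parts is None:
        return None
    try:
        return Impression(parts[0], parts[1], int(parts[2]), float(parts[3]))
    except ValueError:  # non-numeric position or rate
        return None


def parse_booking(line: str) -> Optional[Booking]:
    parts = _fields(line, 6)
    if parts is None:
        return None
    try:
        return Booking(parts[0], parts[1], parts[2], parts[3], int(parts[4]), parts[5])
    except ValueError:  # non-numeric guest count
        return None


raw = ["s1,h7,1,129.0", "s2,h9,abc,99.5", "s3,h7"]  # hypothetical uncleaned lines
clean = [rec for rec in map(parse_impression, raw) if rec is not None]
print(clean)
```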
Problem Statement: Analyze the uncleaned website logs, i.e. the WebTrends Log and the Impression List, and find the position of each hotel in the search results against its frequency of booking.
Hadoop Deployment: log data from the source is loaded into the Hadoop cluster, where MapReduce processes the uncleaned data. The analyst queries the processed data through Apache Hive and receives the query results.
Workflow:
1) A large amount of unstructured log data is generated every day.
2) Hadoop can store any type of data.
3) Hadoop can process the data in parallel, and therefore faster.
4) The output is structured data.
5) Using Hive Query Language, queries are written to analyze each hotel's position in the search results using the log data.
6) The result is an analytical report.
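Since HiveQL is very similar to SQL, the query step can be sketched with standard SQL. The example below uses Python's built-in sqlite3 module as a stand-in for Hive (it is not Hive itself), with the two log formats as tables and a few hypothetical, already-cleaned rows. It assumes a booking can be matched to an impression by session_id and hotel_id, and counts bookings per hotel and search position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE impressions (session_id TEXT, hotel_id TEXT, position INTEGER, rate REAL);
CREATE TABLE bookings (session_id TEXT, visitors_ip TEXT, hotel_id TEXT,
                       booking_date TEXT, number_of_guests INTEGER, booking_time TEXT);
""")

# Hypothetical, already-cleaned sample rows
conn.executemany("INSERT INTO impressions VALUES (?, ?, ?, ?)", [
    ("s1", "h1", 1, 120.0), ("s1", "h2", 2, 95.0),
    ("s2", "h1", 1, 120.0), ("s2", "h2", 2, 95.0),
    ("s3", "h2", 1, 95.0), ("s3", "h1", 2, 120.0),
])
conn.executemany("INSERT INTO bookings VALUES (?, ?, ?, ?, ?, ?)", [
    ("s1", "10.0.0.1", "h1", "2017-01-05", 2, "10:15"),
    ("s2", "10.0.0.2", "h1", "2017-01-05", 1, "11:40"),
    ("s3", "10.0.0.3", "h2", "2017-01-06", 3, "09:05"),
])

# Join bookings to the impression shown in the same session, then count per hotel and position
QUERY = """
SELECT i.hotel_id, i.position, COUNT(*) AS booking_count
FROM impressions AS i
JOIN bookings AS b
  ON i.session_id = b.session_id AND i.hotel_id = b.hotel_id
GROUP BY i.hotel_id, i.position
ORDER BY booking_count DESC, i.hotel_id
"""
rows = conn.execute(QUERY).fetchall()
for hotel, position, count in rows:
    print(f"{hotel} at position {position}: {count} booking(s)")
```

The same SELECT, JOIN and GROUP BY structure carries over to HiveQL, where the tables would instead be defined over the processed log files in the cluster.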
Summary
What is Big Data?
Why Big Data Analytics?
What is Big Data Analytics?
Stages in Big Data Analytics
Big Data Analytics Use Cases
Big Data Analytics Domains
Thank You… Questions/Queries/Feedback