Hadoop Training For Beginners | Hadoop Tutorial | Big Data Training | Edureka

This Edureka Hadoop Training tutorial (Hadoop blog series: https://goo.gl/LFesy8) will help you understand how Big Data emerged as a problem and how Hadoop solved that problem. The tutorial discusses Hadoop architecture, HDFS and its architecture, YARN, and MapReduce, with a practical Aadhaar use case. The topics covered are:

1) What is Big Data?
2) Big Data in Different Domains
3) Problems Associated with Big Data
4) What is Hadoop?
5) HDFS
6) YARN
7) MapReduce
8) Hadoop Ecosystem
9) Aadhaar Use Case
10) Edureka Big Data & Hadoop Training

EdurekaIN

Presentation Transcript


  1. Agenda
     1. What is Big Data?
     2. Big Data Growth Drivers
     3. Big Data Applications in Different Domains
     4. Problems with Big Data Processing
     5. What is Hadoop?
     6. Hadoop Ecosystem
     7. Aadhaar Case Study
     8. Hadoop Job Trends
     9. Edureka Hadoop Training
     EDUREKA HADOOP CERTIFICATION TRAINING www.edureka.co/big-data-and-hadoop

  2. What is Big Data?

  3. Big Data!!! Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database tools or traditional data processing applications.

  4. 5 V’s of Big Data

  5. 5 V’s of Big Data
     • Volume
     • Variety: different kinds of data are being generated from various sources
     • Velocity: data is being generated at an alarming rate
     • Value: mechanism to bring the correct meaning out of the data
     • Veracity: uncertainty and inconsistencies in the data
     The number of V’s associated with Big Data may grow with time.

  6. Big Data Growth Drivers: Mobile Devices, Internet of Things, Social Media

  7. Big Data!!! IoT: 50 billion devices by 2020

  8. Big Data!!! Mobile: Cisco forecasts 30.6 exabytes per month of mobile data traffic by 2020. Three major trends contribute to the growth of mobile data traffic:
     • Adapting to smarter mobile devices
     • Defining cell network advances: 2G, 3G, and 4G (5G perspectives)
     • Reviewing tiered pricing: unlimited data and shared plans

  9. Big Data!!! Social Media:
     • 1,736,111 Instagram pics
     • 4,166,667 likes & 200,000 photos
     • 204,000,000 emails
     • 347,222 tweets
     • 300 hours of video uploaded

  10. Big Data Application in Different Domains

  11. Different Domains
     • Banking & Finance: early warning for securities fraud and trade visibility; card fraud detection and audit trails; enterprise credit risk reporting; customer data transformation and analytics
     • Communication, Media & Entertainment: collecting, analyzing and utilizing consumer insights; leveraging mobile and social media content; understanding patterns of real-time media content usage

  12. Different Domains
     • Healthcare: rising medical costs; unavailable or unusable data; patient history and disease case histories
     • Education: collecting, analyzing and utilizing consumer insights; leveraging mobile and social media content; understanding patterns of real-time media content usage

  13. Different Domains
     • Energy & Utilities: 60% of electricity grid assets will need replacement; global installed wind capacity increased by 12.4%; smart meters become mainstream, while consumers want more control over and insight into energy consumption
     • Government: integration and interoperability of big data from different government schemes

  14. Other Domains

  15. Problems with Big Data Processing

  16. Problems with Big Data
     1. Storing huge and exponentially growing datasets
     2. Processing data having a complex structure (structured, unstructured, semi-structured)
     3. Bringing a huge amount of data to the computation unit becomes a bottleneck

  17. How did Hadoop solve the Big Data problem?

  18. What is Hadoop? A Hadoop cluster consists of a master and slaves and has two core components:
     • HDFS (Storage): lets you dump any kind of data across the cluster
     • MapReduce (Processing): allows parallel processing of the data stored in HDFS

  19. Hadoop-as-a-Solution
     1. Storing exponentially growing huge datasets: HDFS, the storage unit of Hadoop, is a distributed file system
     2. Storing unstructured data: HDFS can store any kind of data, be it structured, semi-structured or unstructured
     3. Processing data faster: Hadoop provides parallel processing of the data present in HDFS and processes data locally, i.e. each node works with the part of the data that is stored on it

  20. HDFS

  21. HDFS
     • Storage unit of Hadoop
     • Distributed file system
     • Divides files (input data) into smaller chunks and stores them across the cluster
     • Horizontal scaling as per requirement
     • Stores any kind of data
     • No schema validation is done while dumping data
     The master node runs the NameNode; each slave node runs a DataNode.
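
As a rough illustration of "divides files into smaller chunks and stores them across the cluster", the sketch below splits a file size into fixed-size blocks and spreads them over DataNodes. The 128 MB block size matches the 512 MB example in the recap slide; the function names and the round-robin placement are assumptions made for this example only (real HDFS placement also replicates blocks and is rack-aware).

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be cut into."""
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("sizes must be non-negative and block size positive")
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds the leftover data.
        blocks.append(remainder)
    return blocks


def place_blocks(blocks, datanodes):
    """Toy round-robin placement of blocks onto DataNodes (hypothetical policy)."""
    if not datanodes:
        raise ValueError("at least one DataNode is required")
    placement = {node: [] for node in datanodes}
    for index, size in enumerate(blocks):
        placement[datanodes[index % len(datanodes)]].append((index, size))
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(512)
    print(blocks)
    print(place_blocks(blocks, ["dn1", "dn2", "dn3"]))
```

Adding a DataNode to the list is all it takes to spread the same blocks over more machines, which is the horizontal scaling the slide refers to.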

  22. NameNode and DataNode
     NameNode
     • Master daemon
     • Maintains and manages the DataNodes
     • Records metadata, e.g. location of the blocks stored, size of the files, permissions, hierarchy, etc.
     • Receives heartbeats and block reports from all the DataNodes
     DataNode
     • Slave daemon
     • Stores the actual data
     • Serves read and write requests
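
The heartbeat and block-report bookkeeping described above can be pictured with a small in-memory model. This is a toy sketch, not Hadoop's implementation: the class name, method names, and the timeout value used in the example are all hypothetical.

```python
class ToyNameNode:
    """Tracks which DataNodes are alive and which blocks they hold."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s      # hypothetical liveness timeout
        self.last_heartbeat = {}        # node id -> time of last heartbeat
        self.block_locations = {}       # block id -> set of node ids

    def heartbeat(self, node_id, now):
        self.last_heartbeat[node_id] = now

    def block_report(self, node_id, block_ids):
        # A new report replaces whatever this node reported before.
        for holders in self.block_locations.values():
            holders.discard(node_id)
        for block_id in block_ids:
            self.block_locations.setdefault(block_id, set()).add(node_id)

    def live_nodes(self, now):
        return sorted(
            node for node, seen in self.last_heartbeat.items()
            if now - seen <= self.timeout_s
        )

    def locate(self, block_id, now):
        """Return the live DataNodes that hold a given block."""
        live = set(self.live_nodes(now))
        return sorted(self.block_locations.get(block_id, set()) & live)


if __name__ == "__main__":
    nn = ToyNameNode(timeout_s=30)
    nn.heartbeat("dn1", now=0)
    nn.heartbeat("dn2", now=0)
    nn.block_report("dn1", ["b1", "b2"])
    nn.block_report("dn2", ["b1"])
    nn.heartbeat("dn1", now=40)         # dn2 stays silent
    print(nn.live_nodes(now=50))
    print(nn.locate("b1", now=50))
```

Note that the NameNode in this model holds only metadata; the block contents never pass through it, which mirrors the split between "records metadata" and "stores actual data" on the slide.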

  23. Secondary NameNode
     • Checkpointing is the process of combining the edit logs with the FsImage
     • Allows faster failover, as we have a backup of the metadata
     • Checkpointing happens periodically (default: 1 hour)
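
Checkpointing, i.e. combining the edit logs with the FsImage, can be sketched as replaying a list of logged operations on top of a snapshot. The dictionary snapshot and the tuple format of the log entries are assumptions invented for this example; real HDFS keeps both as on-disk files in its own formats.

```python
def checkpoint(fsimage, edit_log):
    """Return a new snapshot with every logged edit applied, in order."""
    merged = dict(fsimage)              # leave the old snapshot untouched
    for op, path, *args in edit_log:
        if op == "create":
            merged[path] = args[0]      # args[0] is the file's metadata
        elif op == "delete":
            merged.pop(path, None)
        elif op == "rename":
            merged[args[0]] = merged.pop(path)  # args[0] is the new path
        else:
            raise ValueError(f"unknown operation: {op}")
    return merged


if __name__ == "__main__":
    fsimage = {"/logs/a.txt": 100}      # path -> size, a stand-in for metadata
    edits = [
        ("create", "/logs/b.txt", 200),
        ("rename", "/logs/a.txt", "/logs/c.txt"),
        ("delete", "/logs/b.txt"),
    ]
    print(checkpoint(fsimage, edits))
```

After a checkpoint the edit log can start again from empty, so a restart only has to replay the edits made since the last merge instead of the whole history.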

  24. YARN

  25. YARN
     ResourceManager
     • Receives the processing requests
     • Passes the parts of the requests to the corresponding NodeManagers
     NodeManagers
     • Installed on every DataNode
     • Responsible for the execution of tasks on every single DataNode
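
The division of labour on this slide can be mimicked with two small classes. This is a toy model of the simplified view above only: the class names and the round-robin dispatch are assumptions, and real YARN allocates containers and works through per-application ApplicationMasters, which the slide does not cover.

```python
from itertools import cycle


class ToyNodeManager:
    """Runs the tasks it is handed and remembers their results."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.completed = []

    def execute(self, task):
        result = task()
        self.completed.append(result)
        return result


class ToyResourceManager:
    """Receives a request (a list of tasks) and passes parts of it to NodeManagers."""

    def __init__(self, node_managers):
        if not node_managers:
            raise ValueError("at least one NodeManager is required")
        self.node_managers = list(node_managers)

    def submit(self, tasks):
        # Hypothetical policy: hand tasks out in round-robin order.
        assignments = []
        for manager, task in zip(cycle(self.node_managers), tasks):
            assignments.append((manager.node_id, manager.execute(task)))
        return assignments


if __name__ == "__main__":
    managers = [ToyNodeManager("nm1"), ToyNodeManager("nm2")]
    rm = ToyResourceManager(managers)
    print(rm.submit([lambda: 1, lambda: 2, lambda: 3]))
```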

  26. MapReduce

  27. MapReduce: MapReduce is a software framework that helps in writing applications that process large data sets using distributed and parallel algorithms inside the Hadoop environment.
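
The programming model can be sketched in a few lines of plain Python. The example below runs on a single machine and uses word counting, a common teaching example that does not come from these slides; the function names are illustrative and are not Hadoop's API.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single result."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data", "Big Hadoop"]))
```

Because each call to `map_phase` looks at only one line and each call to `reduce_phase` at only one key, both phases can be spread over many nodes, which is where the parallelism comes from.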

  28. Hadoop Ecosystem

  29. Hadoop Ecosystem

  30. Aadhaar Case Study

  31. Aadhaar Case Study
     • Aadhaar is a 12-digit unique identity number issued to all Indian residents based on their biometric and demographic data.
     • The data is collected by the Unique Identification Authority of India (UIDAI), a statutory authority established on 12 July 2016 by the Government of India.
     • Aadhaar is similar to a Social Security number (SSN).
     • Aadhaar is the world's largest biometric ID system, with over 1.133 billion enrolled members as of 31 March 2017.
     • As of this date, over 99% of Indians aged 18 and above had been enrolled in Aadhaar.

  32. Aadhaar Case Study (Dataset). Columns: State of India, City, No. of Aadhaar cards accepted, No. of Aadhaar cards rejected

  33. Aadhaar Case Study (Problem Statements)
     1. Find out the total number of cards approved by state.
     2. Find out the total number of cards rejected by state.
     3. Find out the total number of cards approved by city.
     4. Find out the total number of cards rejected by city.
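
All four problem statements are group-and-sum queries over the dataset columns. A minimal sketch follows, assuming that "approved" corresponds to the "accepted" column of the dataset; the rows and their numbers are invented purely for illustration and are not Aadhaar data.

```python
from collections import defaultdict

# Hypothetical rows shaped like the dataset slide:
# (state, city, cards_accepted, cards_rejected). All numbers are made up.
ROWS = [
    ("Karnataka", "Bengaluru", 120, 8),
    ("Karnataka", "Mysuru", 45, 3),
    ("Maharashtra", "Mumbai", 200, 15),
    ("Maharashtra", "Pune", 90, 5),
    ("Karnataka", "Bengaluru", 30, 2),
]

# Column positions within a row.
STATE, CITY, ACCEPTED, REJECTED = range(4)


def total_by(rows, key_col, value_col):
    """Sum one numeric column per distinct value of a key column."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_col]] += row[value_col]
    return dict(totals)


if __name__ == "__main__":
    print("1. approved by state:", total_by(ROWS, STATE, ACCEPTED))
    print("2. rejected by state:", total_by(ROWS, STATE, REJECTED))
    print("3. approved by city: ", total_by(ROWS, CITY, ACCEPTED))
    print("4. rejected by city: ", total_by(ROWS, CITY, REJECTED))
```

In MapReduce terms, reading `row[key_col]` and `row[value_col]` plays the role of the map step and the running sum per key plays the role of the reduce step.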

  37. Job Trends

  38. Job Trend

  39. Big Data & Hadoop Certification Training

  40. Big Data Hadoop Certification Training

  41. Some Big Data & Hadoop Projects @ Edureka
     • Project #1: Analyze social bookmarking sites (Industry: Social Media)
     • Project #2: Customer Complaints Analysis (Industry: Retail)
     • Project #3: Tourism Data Analysis (Industry: Tourism)

  42. Some Big Data & Hadoop Projects @ Edureka
     • Project #4: Airline Data Analysis (Industry: Aviation)
     • Project #5: Analyze Loan Dataset (Industry: Banking and Finance)
     • Project #6: Analyze Movie Ratings (Industry: Media)

  43. Session In A Minute
     • What is Big Data
     • Growth Drivers
     • Big Data in Different Domains
     • Problems with Big Data
     • Hadoop-as-a-Solution (a 512 MB file stored as four 128 MB blocks)
     • Big Data & Hadoop Training by Edureka
