Introduction & Components of Hadoop Architecture

Hadoop is a batch processing system for a cluster of nodes that underpins most large-scale data analytics because it bundles the two sets of functionality most needed to deal with huge unstructured datasets: a distributed file system and MapReduce processing.

Presentation Transcript


  1. Introduction & Components of Hadoop Architecture

  2. Hadoop • Hadoop is a batch processing system for a cluster of nodes that underpins most large-scale data analytics because it bundles the two sets of functionality most needed to deal with huge unstructured datasets: a distributed file system and MapReduce processing. • It is an Apache Software Foundation project written in Java to support data-intensive distributed applications. • Hadoop allows applications to operate on thousands of nodes and petabytes of data. • The design was inspired by Google's MapReduce and Google File System papers. • Hadoop's biggest contributor has been the search giant Yahoo, where it is widely used across the business platform.

  3. Map Reduce • Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. MapReduce uses HDFS to access file parts and to store reduced results. HDFS • The Hadoop Distributed File System (HDFS) is the primary storage system used by Hadoop applications. HDFS is, as its name implies, a distributed file system that provides high-throughput access to application data by creating multiple copies of data blocks and distributing them across compute nodes in a cluster, enabling reliable and fast computations.
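The map/shuffle/reduce flow described above can be sketched outside Hadoop using only the JDK. This is an illustrative word-count simulation, not Hadoop's actual API: the class and method names (`WordCountSketch`, `map`, `reduce`) are hypothetical, and the in-memory `TreeMap` stands in for the shuffle step that Hadoop performs across the cluster.

```java
import java.util.*;

public class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in an input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) pairs.add(Map.entry(word, 1));
        }
        return pairs;
    }

    // Reduce phase: sum the counts collected for a single word.
    static int reduce(List<Integer> counts) {
        return counts.stream().mapToInt(Integer::intValue).sum();
    }

    public static void main(String[] args) {
        List<String> lines = List.of("hadoop stores data", "hadoop processes data");

        // Shuffle: group every (word, count) pair by word.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        // Reduce each group to a final count.
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + "\t" + reduce(e.getValue()));
        }
    }
}
```

In real Hadoop the same roles are played by `Mapper` and `Reducer` classes, with the framework handling the shuffle and reading input from, and writing output to, HDFS.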

  4. Architecture of Hadoop • Hadoop is a Map/Reduce framework that works on top of HDFS or HBase. • The central idea is to decompose a job into many identical tasks that can be executed close to the data. • All of these tasks run in parallel: the Map phase. Their intermediate results are then combined into one result: the Reduce phase. • In Hadoop, the JobTracker is responsible for coordinating the job, managing the Map and Reduce phases, and retrying tasks in case of failure. • The TaskTrackers (Java processes) run on the different DataNodes. Each TaskTracker executes the tasks of the job on the locally stored data.
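The job decomposition above can be simulated in plain Java: worker threads play the role of TaskTrackers, each running a map task against its own data split, and a final join plays the role of the Reduce phase. The names (`TaskTrackerSketch`, `Split`, `runJob`) are illustrative assumptions, not part of Hadoop.

```java
import java.util.*;
import java.util.concurrent.*;

public class TaskTrackerSketch {
    // A "split" stands in for a block of data stored locally on one DataNode.
    record Split(String node, List<String> records) {}

    // Plays the JobTracker's role: hand each split to a worker (TaskTracker),
    // run the map tasks in parallel, then join the results (the Reduce phase).
    static int runJob(List<Split> splits) {
        ExecutorService cluster = Executors.newFixedThreadPool(splits.size());
        try {
            List<Future<Integer>> mapResults = new ArrayList<>();
            for (Split split : splits) {
                // Map task: here simply count the records in the local split.
                mapResults.add(cluster.submit(() -> split.records().size()));
            }
            int total = 0;
            for (Future<Integer> r : mapResults) total += r.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            cluster.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Split> splits = List.of(
            new Split("datanode-1", List.of("a", "b")),
            new Split("datanode-2", List.of("c")),
            new Split("datanode-3", List.of("d", "e", "f")));
        System.out.println("records processed: " + runJob(splits)); // prints 6
    }
}
```

The point of the sketch is data locality: each worker touches only "its" split, just as a TaskTracker runs map tasks against blocks stored on its own DataNode instead of pulling data across the network.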

  5. Free Online Bigdata With Hadoop Fundamentals • StudySection offers the following Bigdata With Hadoop online certifications: • Bigdata With Hadoop Fundamentals Certification Exam (Foundation) • Bigdata With Hadoop Fundamentals Certification Exam (Advanced) • Bigdata With Hadoop Fundamentals Certification Exam (Expert)

  6. About StudySection • Welcome to StudySection - the most loved online platform for eCertification in several subjects including but not limited to Software Development, Quality Assurance, Business Administration, Project Management, English, Aptitude and more. Students from more than 70 countries are StudySection certified. If you are not yet StudySection certified, it's not too late - you can start right now. • Being StudySection certified helps you take your education level a few notches up and gives you an edge over other candidates when you need it most. Our students are employed in organizations around the globe and are enjoying the benefits of being certified with us.
