Hadoop Training

In the coming years there will be a huge number of job opportunities in this technology. Many companies are adopting it, which is a big advantage for candidates looking for work, so it is crucial for candidates in the IT industry to stay up to date on Hadoop and to join an institute to learn it. Orien IT is one of the best and leading institutes for Hadoop training in Hyderabad.

Presentation Transcript


  1. Training Institute Flat No 204, Annapurna Block, Aditya Enclave, Ameerpet, Hyderabad – 500038, Telangana, India, 9703202345 http://www.orienit.com/

  2. “We provide demos for Hadoop training, and after completing a demo many students are eager to join our institute because we give students full knowledge and explain everything. We are one of the best in Hyderabad for Hadoop training.” http://www.orienit.com/

  3. INTRODUCTION TO HADOOP • Building on the solutions published by Google, Doug Cutting and his team developed an open source project called HADOOP. Hadoop runs applications using the MapReduce algorithm, in which the data is processed in parallel across nodes. In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data. • Hadoop is an Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models. A Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers. Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage. http://www.orienit.com/

  4. Hadoop Architecture At its core, Hadoop has two major layers, namely: (a) the processing/computation layer (MapReduce), and (b) the storage layer (the Hadoop Distributed File System, HDFS). http://www.orienit.com/

  5. MapReduce The MapReduce paradigm for parallel processing comprises two sequential steps: map and reduce. In the map phase, the input is a set of key/value pairs, and the desired function is executed over each pair to generate a set of intermediate key/value pairs. In the reduce phase, the intermediate key/value pairs are grouped by key and the values are combined according to the reduce code provided by the user, for example by summing. Depending on the operation the user codes, a reduce phase may not be required at all. http://www.orienit.com
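To make the two phases concrete, here is a minimal word-count sketch (our illustration, not part of the original slides) using Hadoop's standard org.apache.hadoop.mapreduce API: the mapper emits an intermediate (word, 1) pair for every word, and the reducer sums the values grouped under each word.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map phase: emit (word, 1) for every word in each input line.
class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    for (String token : value.toString().split("\\s+")) {
      if (!token.isEmpty()) {
        word.set(token);
        context.write(word, ONE); // intermediate key/value pair
      }
    }
  }
}

// Reduce phase: values arrive grouped by key; combine them by summing.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum)); // (word, total count)
  }
}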

  6. At the cluster level, the MapReduce processes are divided between two applications, JobTracker and TaskTracker. The JobTracker runs on only one node of the cluster, while a TaskTracker runs on every slave node. Each MapReduce job is split into a number of tasks, which are assigned to the various TaskTrackers depending on which data is stored on each node. The JobTracker is responsible for scheduling job runs and managing computational resources across the cluster, and it oversees the progress of each TaskTracker as they complete their individual tasks. HDFS The Hadoop Distributed File System (HDFS) offers a way to store large files across multiple machines, rather than requiring a single machine to have disk capacity equal to or greater than the total size of the files. HDFS is designed to be fault tolerant through replication and distribution of data. When a file is loaded into HDFS, it is broken up into "blocks" of data that are replicated and stored across the cluster nodes designated for storage, a.k.a. DataNodes. http://www.orienit.com
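A small driver program shows how such a job is submitted to the cluster for the JobTracker to schedule. This is a minimal sketch that reuses the WordCountMapper and WordCountReducer classes from the previous example; the input and output paths are supplied on the command line.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // Each input split becomes a map task, scheduled near the data it reads.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1); // block until the job finishes
  }
}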

  7. At the architectural level, HDFS requires a NameNode process to run on one node in the cluster and a DataNode service to run on each "slave" node that will be processing data. When data is loaded into HDFS, the data is replicated and split into blocks that are distributed across the DataNodes. The NameNode is responsible for storage and management of metadata, so that when MapReduce or another execution framework calls for the data, the NameNode informs it where the needed data resides. http://www.orienit.com
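The NameNode's role as the metadata service can be seen from client code. The sketch below (our illustration; the file path is hypothetical) uses the standard org.apache.hadoop.fs.FileSystem API to ask where the blocks of a file reside: the client consults the NameNode for this metadata and only then contacts the DataNodes that hold the blocks.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsBlockInfo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);    // the client talks to the NameNode
    Path file = new Path("/data/input.txt"); // hypothetical HDFS path

    // The NameNode's metadata records which DataNodes hold each block.
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset " + block.getOffset()
          + ", length " + block.getLength()
          + ", hosts " + String.join(",", block.getHosts()));
    }
  }
}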

  8. How Does Hadoop Work? It is quite expensive to build bigger servers with heavy configurations to handle large-scale processing. As an alternative, you can tie together many single-CPU commodity computers as a single functional distributed system; in practice the clustered machines can read the dataset in parallel and provide much higher throughput, and the cluster is cheaper than one high-end server. This is the first motivation for using Hadoop: it runs across clustered, low-cost machines. Hadoop runs code across a cluster of computers. This process includes the following core tasks that Hadoop performs: • Data is initially divided into directories and files. Files are divided into uniformly sized blocks of 64 MB or 128 MB (preferably 128 MB; a configuration sketch follows slide 9). • These files are then distributed across various cluster nodes for further processing. http://www.orienit.com

  9. How Does Hadoop Work? • HDFS, being on top of the local file system, supervises the processing. • Blocks are replicated to handle hardware failure. • Hadoop checks that the code was executed successfully. • It sends the sorted data to a certain computer. • It writes the debugging logs for each job. http://www.orienit.com
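The block size and replication factor mentioned above are ordinary HDFS settings. As a minimal sketch (our illustration, not from the slides; the output path is hypothetical), the standard dfs.blocksize and dfs.replication properties can be set on the client configuration before writing a file, and they then apply to the blocks created for that file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("dfs.blocksize", "134217728"); // 128 MB blocks
    conf.set("dfs.replication", "3");       // store each block on 3 DataNodes

    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/tmp/example.txt"); // hypothetical HDFS path
    try (FSDataOutputStream stream = fs.create(out)) {
      // Data is split into blocks and replicated as it is written.
      stream.writeBytes("hello hdfs\n");
    }
  }
}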

  10. Advantages of Hadoop • The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient, and it automatically distributes the data and work across the machines and, in turn, utilizes the underlying parallelism of the CPU cores. • Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA); rather, the Hadoop library itself has been designed to detect and handle failures at the application layer. • Servers can be added to or removed from the cluster dynamically and Hadoop continues to operate without interruption. • Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms since it is Java based. http://www.orienit.com

  11. THANK YOU http://www.orienit.com
