WHAT IS HADOOP AND ITS COMPONENTS?
• WHAT IS HADOOP?
• Hadoop is maintained by the Apache Software Foundation. It is a framework used mainly for storing and processing big data. Hadoop is open-source software, which means its code is openly available and anyone can change it. Hadoop works in a distributed computing environment; its storage layer is known as the Hadoop Distributed File System (HDFS). The software is written in the Java programming language.
• FEATURES OF HADOOP
• Here are some features of Hadoop which make it more user-friendly. They are listed below:
• Well suited to big data analysis
• Hadoop is a framework capable of both storing and processing big data. Processing data in Hadoop requires low bandwidth because the processing logic is sent to the computing nodes instead of moving the actual data, so it can work even over a slow network connection. This increases the efficiency of applications built on Hadoop and is known as the data locality concept.
• Scalability
• Hadoop clusters can be extended to any number of computing nodes without any problem, and the extension does not require modifying the logic of the applications.
• Fault tolerance
• One of the best features of the Hadoop framework is its backup system. Hadoop automatically makes copies (replicas) of the input data, so the data can be recovered easily in case of a system failure.
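The fault-tolerance idea above can be illustrated with a toy sketch. This is not real HDFS code; it only shows, under simplified assumptions, how copying each block of data to several nodes means that losing one node does not lose any data. The function name `place_replicas` and the round-robin placement are illustrative inventions, not Hadoop's actual placement policy.

```python
REPLICATION_FACTOR = 3  # HDFS's default replication factor is also 3

def place_replicas(blocks, nodes, replication=REPLICATION_FACTOR):
    """Assign each block to `replication` distinct nodes, round-robin.

    A toy stand-in for HDFS block placement: the real policy is
    rack-aware, but the effect shown here is the same.
    """
    placement = {}
    for i, block in enumerate(blocks):
        placement[block] = [nodes[(i + r) % len(nodes)]
                            for r in range(replication)]
    return placement

blocks = ["block-0", "block-1", "block-2"]
nodes = ["node-A", "node-B", "node-C", "node-D"]
placement = place_replicas(blocks, nodes)

# Simulate a failure of node-A: every block still has surviving replicas.
for block, replicas in placement.items():
    survivors = [n for n in replicas if n != "node-A"]
    assert survivors, f"{block} would be lost"
```

Because every block lives on three different nodes, any single node can fail and every block still survives somewhere else, which is exactly the recovery guarantee described above.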
MODULES OF THE HADOOP FRAMEWORK
• Here are the modules of the Hadoop framework, meaning its small but important parts. They are listed below:
• HDFS
• HDFS is the storage component of the Hadoop framework. It breaks the input data into small blocks and distributes them over the computing nodes; Hadoop then works on these blocks on the principle of parallel computing.
• YARN
• YARN stands for Yet Another Resource Negotiator. Its tasks are job scheduling and the management of cluster resources.
• MapReduce
• MapReduce is the framework that performs parallel computing in Hadoop with the help of key-value pairs. The map task converts the input data into intermediate key-value pairs, and the reduce task combines those pairs to produce the output.
• Hadoop Common
• These are shared Java libraries used for starting the Hadoop software. They are also used by the other modules of Hadoop.
• ADVANTAGES OF HADOOP
• Here are some advantages of Hadoop. They are listed below:
• It can tolerate system failures.
• It is scalable.
• It is cost-effective.
• The framework is fast.
• It is flexible.
CONCLUSION
As said above, Hadoop is used for the management of big data. At present, almost every sector is working with big data; big data has a bright future, and so does Hadoop, because working with big data is hardly possible without frameworks like it. Students should learn Hadoop. Many courses are available which provide complete Hadoop training as well as Hadoop certification.