
Apache Hadoop – The Big Name In The Big Data World

Apache Hadoop is an open source framework that enables users to write and run distributed applications that process large amounts of data. Distributed computing is a wide and varied field. Hadoop is closely tied to Big Data, meaning very large collections of data sets.


Presentation Transcript


  1. Apache Hadoop – The Big Name In The Big Data World
     Java/J2EE Capabilities

  2. What is Apache Hadoop?
     • A proficient data management framework for Big Data
     • Open source software for distributed processing of large volumes of data
     • Offers distributed parallel processing across servers, scaling from a single server to many machines
     • Handles processing and analysis of thousands of terabytes of data
     • Well suited to increasing business efficiency and maximizing ROI
     • Latest release at the time of this presentation: 2.6.0 (18 November 2014)

  3. Main Modules of Hadoop

  4. Main Modules of Hadoop (contd.)
     • Hadoop Common
       • Common utilities that support the other Hadoop modules and subprojects
       • Includes file system, RPC, and serialization libraries
     • Hadoop Distributed File System (HDFS)
       • Distributed file system giving access to application data
       • Spans all nodes in a Hadoop cluster, linking them into one large file system
       • Java-based, providing scalable and reliable data storage
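To make the idea of one file system spanning many nodes more concrete, here is a minimal sketch in Python of how a file might be cut into blocks and replicated across nodes. It assumes the Hadoop 2.x defaults of 128 MB blocks and a replication factor of 3. The function `place_blocks`, the node names, and the round-robin placement are hypothetical simplifications for illustration; real HDFS uses a rack-aware placement policy managed by the NameNode.

```python
import math

BLOCK_SIZE_MB = 128  # assumed default dfs.blocksize in Hadoop 2.x
REPLICATION = 3      # assumed default dfs.replication


def place_blocks(file_size_mb, nodes,
                 block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return {block_index: [node, ...]} for one file.

    Hypothetical round-robin placement, NOT the real HDFS policy:
    each block's replicas go to `replication` consecutive nodes.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            nodes[(block + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    # A 300 MB file on a four-node cluster (node names are made up).
    for block, replicas in place_blocks(300, ["n1", "n2", "n3", "n4"]).items():
        print(block, replicas)
```

Because every block lives on several distinct nodes, losing one node in this toy model still leaves at least two copies of each block, which is the intuition behind the reliability claim on the slide.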

  5. Main Modules of Hadoop (contd.)
     • Hadoop YARN
       • Used for job scheduling and cluster resource management
       • Splits the two roles of the JobTracker, resource management and job scheduling, into separate components
     • Hadoop MapReduce
       • System for parallel processing of large data sets
       • A framework that assigns work to the nodes in a cluster
       • Lets developers write applications that process large amounts of data reliably across many hardware nodes
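The MapReduce programming model described on this slide can be illustrated with a small word count sketch in Python. This is an in-memory simulation under stated assumptions, not the Hadoop MapReduce API: the function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and everything runs in one process rather than across cluster nodes.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key, as a framework would
    do between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big tools", "hadoop processes big data"]
    print(word_count(sample))
```

In a real cluster each call to the map step could run on a different node against a different slice of the input, which is why the map and reduce functions here avoid any shared state.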

  6. Other Hadoop Related Projects at Apache
     • Avro
     • Cassandra
     • HBase
     • Hive
     • Pig
     • Spark
     • Ambari
     • Chukwa
     • Mahout
     • Tez
     • ZooKeeper

  7. Why Hadoop?
     • Next-generation real-time analytics
     • Rich ecosystem
     • Scale-out storage
     • Reduced cost of ownership
     • Scalability, flexibility, and reliability
     • Fault tolerance
     • Simple programming models

  8. THANK YOU
     Looking forward to a mutually beneficial association. Assuring you of our best services always.
     SPEC INDIA
     "SPEC House", Parth Complex, Swastik Cross Road, Navrangpura, Ahmedabad-380 009, INDIA.
     Tel.: +91-79-26404031 to 34
     VoIP: +1-908-450-9862
     Instant Messengers: spec.bd | spec_india | bd.spec specindia2009 | specindia.bd
     e-mail: lead@spec-india.com
     URL: http://www.spec-india.com
