
Learn What are the best resources to learn MapReduce & Hadoop - Mindmajix

Learn Overview of Hadoop MapReduce

gracylayla




Presentation Transcript


  1. Learn Overview of Hadoop MapReduce https://mindmajix.com/

  2. Overview of Hadoop MapReduce
• MapReduce is a software framework for easily writing applications that process vast amounts of data (multi-terabyte data-sets) in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. (Learn about the basics of MapReduce in the MapReduce Tutorial.)
• A MapReduce job usually splits the input data-set into independent chunks, which are processed by the map tasks in a completely parallel manner.
• The framework sorts the outputs of the maps, which then act as inputs to the reduce tasks.
• Typically, both the input and the output of the job are stored in a file system; the framework takes care of scheduling tasks, monitoring them, and re-executing the failed tasks.
• The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node.
• The master is responsible for scheduling the job's component tasks on the slaves, monitoring them, and re-executing the failed tasks; the slaves execute the tasks as directed by the master.
https://mindmajix.com/
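The map → sort/shuffle → reduce dataflow described above can be sketched in plain Python. This is a conceptual simulation, not the Hadoop API; the word-count `mapper`/`reducer` pair and the sample records are illustrative:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records, mapper):
    """Run the mapper over every (key, value) input record in parallel
    conceptually; here we just collect all emitted pairs."""
    pairs = []
    for key, value in records:
        pairs.extend(mapper(key, value))
    return pairs

def reduce_phase(pairs, reducer):
    """Sort mapper output by key (the framework's sort/shuffle step),
    then hand each key and its group of values to the reducer."""
    pairs.sort(key=itemgetter(0))
    return [reducer(k, [v for _, v in group])
            for k, group in groupby(pairs, key=itemgetter(0))]

# Word count: the mapper emits (word, 1); the reducer sums the counts.
def mapper(offset, line):
    return [(word, 1) for word in line.split()]

def reducer(word, counts):
    return (word, sum(counts))

records = [(0, "hadoop map reduce"), (18, "map reduce map")]
print(reduce_phase(map_phase(records, mapper), reducer))
# → [('hadoop', 1), ('map', 3), ('reduce', 2)]
```

The keys here are byte offsets, mirroring how Hadoop's default input format keys each line of input.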

  3. What Are the Different Types of Input Formats?
1. TextInputFormat
2. KeyValueTextInputFormat
3. NLineInputFormat
4. SequenceFileInputFormat
5. SequenceFileAsTextInputFormat
6. SequenceFileAsBinaryInputFormat
7. DBInputFormat
8. MultipleInputs
https://mindmajix.com/

  4. What Are the Different Types of Output Formats?
1. TextOutputFormat
2. SequenceFileOutputFormat
3. SequenceFileAsBinaryOutputFormat
https://mindmajix.com/

  5. 1. TextInputFormat:
• It is the default input format, and each record is one line of input.
• The key for each record is the byte offset of the line from the beginning of the file, and the value is the content of the line.
• Its matching output format is TextOutputFormat, whose final output is a key-value pair delimited by a tab.
2. KeyValueTextInputFormat:
• If the output of TextOutputFormat is fed back in as input, we can specify this input format for the job, since each line is already a tab-delimited key-value pair.
• The default tab delimiter can be overridden through a job configuration property.
https://mindmajix.com/
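To make these two formats concrete, here is a minimal Python simulation (not Hadoop code; the function names are mine) of how TextInputFormat keys each record by byte offset and how KeyValueTextInputFormat splits each line on the first tab:

```python
def text_input_format(data):
    """Key each line by its byte offset from the start of the file,
    as TextInputFormat does; the value is the line without its newline."""
    records, offset = [], 0
    for line in data.splitlines(keepends=True):
        records.append((offset, line.rstrip("\n")))
        offset += len(line.encode("utf-8"))
    return records

def key_value_text_input_format(data, sep="\t"):
    """Split each line on the first separator (tab by default),
    as KeyValueTextInputFormat does."""
    return [tuple(line.split(sep, 1)) for line in data.splitlines()]

print(text_input_format("alpha\nbeta\n"))
# → [(0, 'alpha'), (6, 'beta')]
print(key_value_text_input_format("a\t1\nb\t2\n"))
# → [('a', '1'), ('b', '2')]
```

Note the second key is 6 because "alpha\n" occupies six bytes; the offsets are file positions, not line numbers.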

  6. 3. NLineInputFormat:
• This is used when we want each mapper to receive a fixed number of lines as input.
• The property mapreduce.input.lineinputformat.linespermap is set to the desired value N.
• It otherwise behaves like TextInputFormat; the difference is the number of lines per input split.
• MapReduce has support for binary formats as well.
4. SequenceFileInputFormat:
• This file format stores sequences of binary key-value pairs.
• Sequence files are well suited as a format for MapReduce data since they are splittable.
• To use sequence file data as input to the mapper, we need to specify SequenceFileInputFormat as the input format.
• We also need to declare key and value data types that match the sequence file's key and value types.
https://mindmajix.com/
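The NLineInputFormat behaviour above amounts to chunking the input into groups of N lines, one group per mapper. A sketch in Python (a simulation of the splitting rule, not the Hadoop class):

```python
def nline_splits(lines, n):
    """Group the input into splits of n lines each, as NLineInputFormat
    does when mapreduce.input.lineinputformat.linespermap is set to n;
    each split is handed to its own mapper."""
    return [lines[i:i + n] for i in range(0, len(lines), n)]

lines = ["l1", "l2", "l3", "l4", "l5"]
print(nline_splits(lines, 2))
# → [['l1', 'l2'], ['l3', 'l4'], ['l5']]
```

The last split may be short when the line count is not a multiple of N, as the example shows.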

  7. 5. SequenceFileAsTextInputFormat:
• This is like SequenceFileInputFormat, but it converts the sequence file's keys and values to Text objects.
• The conversion is performed by calling toString() on the keys and values.
6. SequenceFileAsBinaryInputFormat:
• This is like SequenceFileInputFormat, but it retrieves the sequence file's keys and values as opaque binary objects, encapsulated as BytesWritable objects.
7. DBInputFormat:
• It is used when reading data from a relational database using JDBC.
8. MultipleInputs:
• This technique is used when we need to process data that may be in the same format or in different formats, but with different representations.
• While using it, we specify a mapper and an input format for each input path.
https://mindmajix.com/
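The MultipleInputs idea, pairing each input path with its own mapper so differently-formatted sources feed one job, can be simulated in Python (illustrative names; not the Hadoop `MultipleInputs.addInputPath` API):

```python
def multiple_inputs(inputs):
    """Run a per-source mapper over each input, mimicking MultipleInputs:
    each source is paired with the mapper that understands its format,
    and all mapper outputs merge into one stream of (key, value) pairs."""
    pairs = []
    for records, mapper in inputs:
        for key, value in records:
            pairs.extend(mapper(key, value))
    return pairs

# Two sources carrying the same kind of data in different representations.
csv_records = [(0, "alice,3")]
tsv_records = [(0, "bob\t5")]
csv_mapper = lambda k, v: [tuple(v.split(","))]
tsv_mapper = lambda k, v: [tuple(v.split("\t"))]

print(multiple_inputs([(csv_records, csv_mapper), (tsv_records, tsv_mapper)]))
# → [('alice', '3'), ('bob', '5')]
```

Each mapper normalises its own representation, so the reduce side sees a uniform key-value stream.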

  8. 1. TextOutputFormat:
• This is the default output format; keys and values are separated by a tab delimiter.
• The delimiter can be changed through a configuration property.
• We can suppress the key or the value in the output by declaring its type as NullWritable.
2. SequenceFileOutputFormat:
• It writes sequence files as its output.
3. SequenceFileAsBinaryOutputFormat:
• It writes keys and values in raw binary format into a sequence file container.
The output format classes generate a set of files as output, one file per reducer, named part-r-00000, part-r-00001, and so on.
If we want to write multiple output files per reducer, we can use the MultipleOutputs class; it generates one output file per key in the reducer, and the file name can be prefixed with the key.
https://mindmajix.com/
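The per-reducer part-r-NNNNN naming can be sketched as follows. This Python simulation routes each key to a reducer with a deterministic hash (standing in for Hadoop's default HashPartitioner; `reducer_output_files` is my own name) and writes tab-delimited lines as TextOutputFormat would:

```python
import zlib

def reducer_output_files(pairs, num_reducers):
    """Partition (key, value) pairs across reducers by key hash and
    collect each reducer's tab-delimited output lines under the file
    name part-r-NNNNN, one file per reducer."""
    files = {f"part-r-{i:05d}": [] for i in range(num_reducers)}
    for key, value in pairs:
        i = zlib.crc32(str(key).encode()) % num_reducers
        files[f"part-r-{i:05d}"].append(f"{key}\t{value}")
    return files

out = reducer_output_files([("map", 3), ("reduce", 2), ("hadoop", 1)], 2)
print(sorted(out))
# → ['part-r-00000', 'part-r-00001']
```

Every reducer produces a file even if no key hashes to it, matching Hadoop's behaviour of emitting (possibly empty) part files.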

  9. Data Types and Custom Writables
• Hadoop comes with a large selection of Writable classes in the org.apache.hadoop.io package.
• Hadoop provides Writable wrappers for all the Java primitive types except char, and each wrapper has a get() and a set() method for retrieving and storing the wrapped value.
1. ByteWritable - byte, BooleanWritable - boolean.
2. ShortWritable - short; IntWritable and VIntWritable - int.
3. FloatWritable - float, DoubleWritable - double.
4. LongWritable and VLongWritable - long.
• For numeric types, we can select either fixed-length (IntWritable and LongWritable) or variable-length (VIntWritable and VLongWritable) encodings.
• Text is the Writable equivalent of String in Java.
https://mindmajix.com/
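The fixed-length versus variable-length trade-off is about bytes on the wire: IntWritable always costs 4 bytes, while a variable-length encoding lets small values use fewer. A Python sketch of the idea (LEB128-style varint for illustration; Hadoop's actual VIntWritable wire format differs in detail):

```python
import struct

def int_writable_bytes(n):
    """Fixed-length: an int always serializes to 4 big-endian bytes,
    as IntWritable does."""
    return struct.pack(">i", n)

def vint_writable_bytes(n):
    """Variable-length sketch for non-negative ints: 7 value bits per
    byte, high bit set on every byte except the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

print(len(int_writable_bytes(1)), len(vint_writable_bytes(1)))      # → 4 1
print(len(int_writable_bytes(300)), len(vint_writable_bytes(300)))  # → 4 2
```

Variable-length encodings pay off when most values are small, which is common for counts in MapReduce output.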

  10. NullWritable:
• It is a special type of Writable: no bytes are written to or read from the stream.
• In MapReduce, a key or value can be declared as NullWritable when we don't want it in the final output.
• It is an immutable singleton; the instance can be retrieved with NullWritable.get().
• It stores an empty value in the output.
Writable collections: There are six Writable collection types in Hadoop:
• ArrayWritable, TwoDArrayWritable, ArrayPrimitiveWritable.
• MapWritable, SortedMapWritable and EnumSetWritable.
• ArrayPrimitiveWritable is a wrapper for arrays of Java primitives.
• ArrayWritable and TwoDArrayWritable are Writable implementations for arrays and two-dimensional arrays (arrays of arrays) of Writable instances.
• All the elements of an ArrayWritable or TwoDArrayWritable must be instances of the same class.
Custom Writables: If the existing Writable classes don't fit our needs, we can develop a custom writable by implementing the WritableComparable interface.
https://mindmajix.com/
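The shape of a custom writable, a type that knows how to serialize itself field by field and compare itself so it can serve as a sort key, can be sketched in Python. `PointWritable` is a hypothetical example mirroring the write/readFields/compareTo contract of Hadoop's WritableComparable, not real Hadoop code:

```python
import io
import struct

class PointWritable:
    """Hypothetical custom 'writable': serializes two ints to a binary
    stream (write / read_fields, like Hadoop's Writable contract) and
    orders instances so the type can be used as a sortable key."""

    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

    def write(self, out):
        # Two big-endian 4-byte ints, analogous to two IntWritable fields.
        out.write(struct.pack(">ii", self.x, self.y))

    def read_fields(self, in_):
        self.x, self.y = struct.unpack(">ii", in_.read(8))

    def __lt__(self, other):
        # compareTo analogue: order by x, then y.
        return (self.x, self.y) < (other.x, other.y)

buf = io.BytesIO()
PointWritable(3, 4).write(buf)
buf.seek(0)
p = PointWritable()
p.read_fields(buf)
print(p.x, p.y)  # → 3 4
```

The round trip through the byte stream is the essential property: the framework sorts and shuffles keys in their serialized form, so write and read_fields must be exact inverses.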

  11. Boost your career opportunities by enrolling in the Mindmajix Technologies "Free & Live Hadoop Administration Demo". Contact details: Mindmajix Technologies INDIA: +91 924 633 3245 USA: +1 201 3780 518, +1 972-427-3027 Email: info@mindmajix.com Official website: https://mindmajix.com/ Learn how to use Hadoop & MapReduce, from beginner basics to advanced techniques: Hadoop Tutorial, Hadoop Interview Questions, MapReduce Tutorial, MapReduce Interview Questions. https://mindmajix.com/
