What is Pig?
Why Pig?
• MapReduce is difficult to program.
• It has only two phases, map and reduce.
• All of the logic has to be fitted into those two phases.
• Even simple logic takes many lines of code.
• Long data flows need a chain of several jobs.
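To make the two-phase constraint concrete, here is a minimal plain-Python simulation (not Hadoop's Java API) of a single MapReduce job that counts records per user. The function names and sample records are hypothetical. A real Hadoop job also needs a driver, input and output formats, and job configuration, and the framework performs the shuffle across machines.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical sample input: (user, time, query) records.
RECORDS = [
    ("u1", 970916161410, "pig latin"),
    ("u1", 970916161413, "hadoop"),
    ("u2", 970916063540, "mapreduce"),
]

def map_phase(record: tuple) -> Iterator[tuple]:
    """Map: emit (key, value) pairs; here one (user, 1) pair per record."""
    user, _time, _query = record
    yield (user, 1)

def shuffle(pairs: Iterable[tuple]) -> dict:
    """Shuffle/sort: group all values by key between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key: str, values: list) -> tuple:
    """Reduce: fold all values for one key into a single result."""
    return (key, sum(values))

def run_job(records: Iterable[tuple]) -> list:
    """Simulate one MapReduce job: map -> shuffle -> reduce."""
    mapped = (pair for record in records for pair in map_phase(record))
    return [reduce_phase(k, vs) for k, vs in shuffle(mapped).items()]

if __name__ == "__main__":
    print(run_job(RECORDS))  # [('u1', 2), ('u2', 1)]
```

Any further step, such as filtering these counts or joining them with another dataset, would need another job chained after this one, with intermediate output written in between.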
Pig Features
• A high-level processing layer on top of MapReduce.
• Pig Latin, a high-level data-flow language: a script is a sequence of steps, each of which is a single high-level data transformation.
• Metadata and schemas are optional; data can be loaded without them.
• Simplifies joining data and chaining jobs together.
• Pig Latin includes operators for many of the traditional data operations (join, sort, filter, and so on).
• Supports user-defined functions (UDFs).
Running Pig
Three ways to run Pig:
• Grunt, the interactive shell
• A script file
• Pig Latin queries embedded in a Java program

Pig Data Types
• Scalar types: int, long, float, double, chararray, bytearray
• Complex types: tuple, bag, map
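Pig's complex types nest inside each other. A tuple is an ordered sequence of fields, a bag is an unordered collection of tuples that may contain duplicates, and a map associates chararray keys with values. The sketch below models them with plain Python types only to illustrate these rules. It is not how Pig represents data internally, and the class and variable names are hypothetical.

```python
from collections import Counter

# Tuple: an ordered sequence of fields, e.g. (alice,25).
# Modeled directly as a Python tuple.

class PigBag:
    """Bag: an unordered collection of tuples in which duplicates are allowed.

    Sketch only: assumes every field is hashable so bags can be compared
    as multisets.
    """

    def __init__(self, tuples=None):
        self.tuples = list(tuples or [])

    def add(self, t: tuple) -> None:
        self.tuples.append(t)

    def __len__(self) -> int:
        return len(self.tuples)

    def __eq__(self, other: object) -> bool:
        # Order is ignored, but how many times a tuple occurs matters.
        if not isinstance(other, PigBag):
            return NotImplemented
        return Counter(self.tuples) == Counter(other.tuples)

# Map: chararray keys associated with values, e.g. [name#alice,age#25].
# Modeled as a Python dict with str keys.

if __name__ == "__main__":
    bag = PigBag([("alice", 25), ("bob", 31)])
    bag.add(("alice", 25))  # the duplicate is kept
    print(len(bag))  # 3
    print(bag == PigBag([("bob", 31), ("alice", 25), ("alice", 25)]))  # True
    profile = {"name": "alice", "age": 25}
    # Complex types nest: this tuple holds a chararray, a bag, and a map.
    row = ("users", bag, profile)
    print(len(row))  # 3
```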
Pig Latin (1/4)
Counting the queries issued by each user in excite-small.log:
    grunt> log = LOAD 'tutorial/data/excite-small.log' AS (user:chararray, time:long, query:chararray);
    grunt> grpd = GROUP log BY user;
    grunt> cntd = FOREACH grpd GENERATE group, COUNT(log);
    grunt> STORE cntd INTO 'output';
Pig Latin (2/4)
DESCRIBE prints the schema of a relation:
    grunt> DESCRIBE log;
    log: {user: chararray,time: long,query: chararray}
    grunt> DESCRIBE grpd;
    grpd: {group: chararray,log: {user: chararray,time: long,query: chararray}}
    grunt> DESCRIBE cntd;
    cntd: {group: chararray,long}
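The DESCRIBE output shows how GROUP reshapes a schema. The result has a field named `group`, typed like the grouping key, and a bag field named after the input relation that holds the original tuples. The hypothetical helpers below reproduce that rule as plain-Python string formatting, using the `log` schema from the LOAD statement. They only mirror the printed format and are not part of Pig.

```python
# Schema of `log` as declared in the LOAD ... AS clause: (name, type) pairs.
LOG_SCHEMA = [("user", "chararray"), ("time", "long"), ("query", "chararray")]

def fmt_schema(fields: list) -> str:
    """Render fields the way DESCRIBE prints them: {name: type,...}."""
    return "{" + ",".join(f"{name}: {ftype}" for name, ftype in fields) + "}"

def group_schema(alias: str, fields: list, key: str) -> str:
    """Schema of `GROUP alias BY key`: a `group` field typed like the key,
    plus a bag (named after the input alias) holding the input tuples."""
    key_type = dict(fields)[key]
    return "{" + f"group: {key_type},{alias}: {fmt_schema(fields)}" + "}"

if __name__ == "__main__":
    print("log:", fmt_schema(LOG_SCHEMA))
    print("grpd:", group_schema("log", LOG_SCHEMA, "user"))
```

Likewise, `cntd` prints as {group: chararray,long}: `group` is carried through from `grpd`, and COUNT contributes an unnamed long field.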
Pig Latin (3/4)
ILLUSTRATE runs the script on a small sample and shows the rows at each step. The first log table shows the fields as loaded (bytearray); the second shows them after the types declared in the AS clause are applied.
    grunt> ILLUSTRATE cntd;
    -------------------------------------------------------------------
    | log  | user: bytearray  | time: bytearray | query: bytearray    |
    -------------------------------------------------------------------
    |      | 0567639EB8F3751C | 970916161410    | "conano'brien"      |
    |      | 0567639EB8F3751C | 970916161413    | "conano'brien"      |
    |      | 972F13CE9A8E2FA3 | 970916063540    | finger AND download |
    -------------------------------------------------------------------
    | log  | user: chararray  | time: long      | query: chararray    |
    -------------------------------------------------------------------
    |      | 0567639EB8F3751C | 970916161410    | "conano'brien"      |
    |      | 0567639EB8F3751C | 970916161413    | "conano'brien"      |
    |      | 972F13CE9A8E2FA3 | 970916063540    | finger AND download |
    -------------------------------------------------------------------
    | grpd | group: chararray | log: bag({user: chararray,time: long,query: chararray}) |
    -------------------------------------------------------------------
    |      | 0567639EB8F3751C | {(0567639EB8F3751C, 970916161410, "conano'brien"), (0567639EB8F3751C, 970916161413, "conano'brien")} |
    |      | 972F13CE9A8E2FA3 | {(972F13CE9A8E2FA3, 970916063540, finger AND download)} |
    -------------------------------------------------------------------
    | cntd | group: chararray | long |
    -------------------------------------------------------------------
    |      | 0567639EB8F3751C | 2    |
    |      | 972F13CE9A8E2FA3 | 1    |
    -------------------------------------------------------------------
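The plain-Python sketch below replays the three sample rows from the ILLUSTRATE output through the same dataflow. It is not Pig code, and the function names are hypothetical. GROUP collects each user's tuples into a bag, and FOREACH ... GENERATE group, COUNT(log) emits one (group, count) tuple per bag.

```python
from collections import defaultdict

# The three sample rows shown by ILLUSTRATE: (user, time, query).
LOG = [
    ("0567639EB8F3751C", 970916161410, "\"conano'brien\""),
    ("0567639EB8F3751C", 970916161413, "\"conano'brien\""),
    ("972F13CE9A8E2FA3", 970916063540, "finger AND download"),
]

def group_by(relation: list, key_index: int) -> list:
    """Model of GROUP ... BY: one (group, bag) tuple per distinct key,
    where the bag holds every input tuple with that key.
    Insertion order is kept for readability; Pig promises no order."""
    bags = defaultdict(list)
    for t in relation:
        bags[t[key_index]].append(t)
    return list(bags.items())

def count_per_group(grouped: list) -> list:
    """Model of FOREACH grpd GENERATE group, COUNT(log).
    Pig's COUNT skips tuples whose first field is null (COUNT_STAR counts
    them); the sample has no nulls, so len() gives the same answer."""
    return [(group, len(bag)) for group, bag in grouped]

if __name__ == "__main__":
    grpd = group_by(LOG, key_index=0)
    cntd = count_per_group(grpd)
    print(cntd)  # [('0567639EB8F3751C', 2), ('972F13CE9A8E2FA3', 1)]
```

The counts match the `cntd` rows above. In Pig, this grouping and counting is compiled into MapReduce work by Pig itself rather than written as map and reduce functions by hand.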
Pig Latin (4/4)
Built-in functions in Pig Latin:
• AVG, CONCAT, COUNT, DIFF, MAX, MIN, SIZE, SUM, TOKENIZE, IsEmpty
Relational operators in Pig Latin:
• SPLIT, UNION, FILTER, DISTINCT, SAMPLE, FOREACH, JOIN, GROUP, COGROUP
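The relational operators act on whole relations (bags of tuples). As a rough mental model only, the plain-Python sketch below mimics four of them on lists of tuples. The function names and sample data are hypothetical, and real Pig runs these operators as distributed jobs with its own null handling and no guaranteed output order.

```python
from typing import Callable

Relation = list  # a relation modeled as a list of tuples

def pig_filter(rel: Relation, cond: Callable[[tuple], bool]) -> Relation:
    """FILTER rel BY cond: keep only the tuples that satisfy cond."""
    return [t for t in rel if cond(t)]

def pig_distinct(rel: Relation) -> Relation:
    """DISTINCT rel: drop duplicate tuples (first occurrence kept here)."""
    return list(dict.fromkeys(rel))

def pig_union(*rels: Relation) -> Relation:
    """UNION a, b: concatenate relations; duplicates are NOT removed."""
    return [t for rel in rels for t in rel]

def pig_split(rel: Relation, **conds: Callable[[tuple], bool]) -> dict:
    """SPLIT rel INTO x IF c1, y IF c2: route each tuple to every output
    whose condition it satisfies (a tuple may land in several, or none)."""
    return {name: pig_filter(rel, c) for name, c in conds.items()}

if __name__ == "__main__":
    rel = [("a", 1), ("b", 5), ("a", 1)]
    print(pig_filter(rel, lambda t: t[1] > 2))  # [('b', 5)]
    print(pig_distinct(rel))  # [('a', 1), ('b', 5)]
    print(pig_union(rel, [("c", 9)]))
    print(pig_split(rel, small=lambda t: t[1] < 3, odd=lambda t: t[1] % 2 == 1))
```

Note that SPLIT is not a partition: with overlapping conditions, the same tuple appears in more than one output relation.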