
Data Technology Landscape Is Rapidly Evolving

Relational hegemony is over. Disruptive data technologies abound: open source, new data models, NoSQL systems. One size no longer fits all. Focus has expanded from write- to read-intensive applications. Old constraints are falling away.

chana

Presentation Transcript


  1. Data Technology Landscape Is Rapidly Evolving
     Relational hegemony is over
     • Disruptive data technologies abound
     • Open source, new data models, NoSQL systems
     • One size no longer fits all
     Focus expanded from write- to read-intensive applications
     Old constraints are falling away
     • Big memory, big storage, big CPU farms, big interconnect
     • Virtual machines everywhere
     • New applications with massive data volumes (social networking, BI)
     • Less restrictive transaction models promote scalability
     “It’s time for a complete rewrite” (Mike Stonebraker)
     • UC Berkeley, MIT
     • Ingres, Postgres, Illustra, Streambase, Vertica, VoltDB, and more
     [Figure labels: OLTP, Analytics, 40-odd years]

  2. Hadoop Mimics Google as Big Data Store
     Layer: Apache Software Foundation (Hadoop) / Google
     • Applications: Pig Latin, Hive, Zookeeper, Vendor Analytics / Megastore, Google App Engine
     • Data Access Technique: Map/Reduce / Map/Reduce
     • Table-like Data Model: HBase / BigTable
     • Distributed File System: Hadoop Distributed File System / Google File System
     YourDataEverywhere

  3. How HDFS and GFS Work
     • “Shared Nothing” Data Nodes
     • Data ‘sharded’ across nodes
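As a rough illustration of data being 'sharded' across shared-nothing nodes, the sketch below splits a byte string into fixed-size blocks and assigns each block to several nodes in round-robin order. The function name `place_blocks`, the tiny block size, and the round-robin policy are all assumptions made for this example. Real HDFS and GFS deployments use much larger blocks and their own placement logic, which this does not attempt to reproduce.

```python
from __future__ import annotations


def place_blocks(
    data: bytes,
    nodes: list[str],
    block_size: int = 4,
    replication: int = 2,
) -> dict[str, list[tuple[int, bytes]]]:
    """Split data into blocks and copy each block onto `replication` nodes.

    Hypothetical helper for illustration only; round-robin placement is an
    assumption, not the policy used by HDFS or GFS.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")

    # Each node keeps its own local list of (block index, block bytes).
    placement: dict[str, list[tuple[int, bytes]]] = {node: [] for node in nodes}

    for index, start in enumerate(range(0, len(data), block_size)):
        block = data[start:start + block_size]
        # Put each replica on a different node, starting at a rotating offset.
        for r in range(replication):
            node = nodes[(index + r) % len(nodes)]
            placement[node].append((index, block))

    return placement


if __name__ == "__main__":
    layout = place_blocks(b"abcdefghij", ["n1", "n2", "n3"])
    for node, blocks in layout.items():
        print(node, blocks)
```

Because every block lives on more than one node, losing a single node in this toy layout still leaves a copy of each block elsewhere.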

  4. Map/Reduce Algorithm
     Standard example is word counting

         void map(String name, String document):
           // name: document name
           // document: document contents
           for each word w in document:
             EmitIntermediate(w, "1");

         void reduce(String word, Iterator wordCounts):
           // word: a word
           // wordCounts: list of aggregated counts
           int sum = 0;
           for each pc in wordCounts:
             sum += ParseInt(pc);
           Emit(word, AsString(sum));

     A programming pattern
     • Inspired by functional programming languages
     • For large scale parallel applications
     Parallel Algorithm
     • Map preps input data into <key, value> pairs, here <word, count>
     • Merge (or Combine) phase gathers relevant <word, count> pairs, arranging them by word
     • Reduce sums counts for each word, constructs final result
     Optimized for unstructured data
     • Minimum metadata stored in distributed file system
     • Data knowledge resides in map and reduce programs
     Parts of the algorithm are patented by Google
     • US Patent #7,650,331
     • Filed June 18, 2004, granted January 19, 2010
     • Licensed to Hadoop in April 2010
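The word-counting pseudocode on this slide can be approximated in a single process as follows. This is a minimal sketch, not Hadoop or Google code: the names `map_words`, `merge_pairs`, `reduce_counts`, and `word_count` are made up for the example, and the three phases run sequentially in memory rather than in parallel across nodes. Counts are kept as strings to mirror the `"1"` and `AsString(sum)` values in the pseudocode.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_words(name: str, document: str) -> Iterator[tuple[str, str]]:
    """Map: emit an intermediate (word, "1") pair for each word."""
    # `name` is the document name; it is unused here, as in the pseudocode.
    for word in document.split():
        yield word, "1"


def merge_pairs(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Merge: gather all intermediate counts that share the same word."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_counts(word: str, word_counts: Iterable[str]) -> tuple[str, str]:
    """Reduce: sum the counts for one word and emit the total as a string."""
    total = sum(int(pc) for pc in word_counts)
    return word, str(total)


def word_count(documents: dict[str, str]) -> dict[str, str]:
    """Run map, merge, and reduce in sequence over all documents."""
    intermediate = [
        pair
        for name, document in documents.items()
        for pair in map_words(name, document)
    ]
    grouped = merge_pairs(intermediate)
    # Sorting arranges the pairs by word before reducing.
    return dict(reduce_counts(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count({"a.txt": "big data big", "b.txt": "data store"}))
```

All knowledge of the data format lives in `map_words` and `reduce_counts`, matching the slide's point that data knowledge resides in the map and reduce programs rather than in stored metadata.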
