
The New BI Ecosystem: How Big Data Merges Top Down and Bottom up Computing

Presentation by Wayne W. Eckerson, Director of Research and Founder, BI Leadership Forum. Agenda: big data platforms (relational databases, analytical databases, Hadoop) and the new analytical ecosystem.


Presentation Transcript


  1. The New BI Ecosystem: How Big Data Merges Top Down and Bottom Up Computing • Wayne W. Eckerson, Director of Research and Founder, BI Leadership Forum

  2. Agenda • Big data platforms • Relational databases • Analytical databases • Hadoop • New analytical ecosystem

  3. What comes next? • Kilobyte (KB) – 10^3 bytes • Megabyte (MB) – 10^6 bytes • Gigabyte (GB) – 10^9 bytes • Terabyte (TB) – 10^12 bytes • Petabyte (PB) – 10^15 bytes • Exabyte (EB) – 10^18 bytes • Zettabyte (ZB) – 10^21 bytes • Yottabyte (YB) – 10^24 bytes
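
To make the scale on this slide concrete, here is a minimal Python sketch (not part of the presentation) that expresses a raw byte count in the largest decimal unit from the list above. The helper name `humanize_bytes` is hypothetical, and the sketch assumes the power-of-ten definitions shown on the slide rather than binary (1024-based) units.

```python
# Decimal (power-of-ten) units in the order listed on the slide.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def humanize_bytes(n: int) -> str:
    """Express a non-negative byte count in the largest unit that fits.

    Hypothetical helper for illustration only.
    """
    if n < 0:
        raise ValueError("byte count must be non-negative")
    index = 0
    # Integer comparisons avoid floating-point drift at very large sizes.
    while index < len(UNITS) - 1 and n >= 1000 ** (index + 1):
        index += 1
    return f"{n / 1000 ** index:.1f} {UNITS[index]}"
```

For example, `humanize_bytes(10**15)` reports one petabyte, and anything at or above 10^24 bytes is reported in yottabytes because the list ends there.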

  4. What is “big data”? • Data: lots of data; different types of data; more data than you can handle • Systems: purpose-built analytical systems; distributed file system; new staging area and archive • Movement: a Java developer’s employment act; a replacement for the RDBMS; a club for hip data people • Yes!

  5. Information explosion • Every 18 months, non-rich structured and unstructured enterprise data doubles. Source: IDC Digital Universe 2009; White Paper, Sponsored by EMC, May 2009
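
Taking the doubling claim on this slide at face value, the growth arithmetic can be sketched in a few lines of Python. This is an illustration under the assumption of steady exponential growth with an 18-month doubling period; the function names are hypothetical and the figures it produces are not from the IDC study.

```python
DOUBLING_PERIOD_MONTHS = 18  # doubling period stated on the slide


def growth_factor(months: float) -> float:
    """Multiplier on data volume after `months`, assuming steady doubling."""
    return 2 ** (months / DOUBLING_PERIOD_MONTHS)


def projected_volume(start_volume: float, months: float) -> float:
    """Projected volume in the same unit as `start_volume`."""
    return start_volume * growth_factor(months)
```

Under this assumption, volume quadruples in three years (`growth_factor(36)`), so a store that holds 10 TB today would hold about 40 TB after 36 months.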

  6. Data deluge • Structured data: call detail records, point of sale records, claims data • Semi-structured data: web logs, sensor data, email, Twitter • Unstructured data: video, audio, images, text. Source: “A Sea of Sensors”, The Economist, Nov 4, 2010

  7. From transactions to observations • Structured → Semi-structured → Unstructured

  8. Three big data platforms (systems) • General purpose relational database • Analytical database • Hadoop

  9. 1. General-purpose RDBMS: powers first-generation DW • Benefits: RDBMS already in-house; SQL-based; trained DBAs • Challenges: cost to deploy and upgrade; doesn’t support complex analytics; scalability and performance [Diagram labels: Operational Systems, ETL, Data Warehouse, Data Mart, BI Server, Reports / Dashboards]

  10. 2. Analytical platforms • Purpose-built database management systems designed explicitly for query processing and analysis that provide dramatically higher price/performance and availability than general-purpose solutions. • Vendors: 1010data, Aster Data (Teradata), Calpont, Datallegro (Microsoft), Exasol, Greenplum (EMC), IBM SmartAnalytics, Infobright, Kognitio, Netezza (IBM), Oracle Exadata, Paraccel, Pervasive, Sand Technology, SAP HANA, Sybase IQ (SAP), Teradata, Vertica (HP) • Deployment options: software only (Paraccel, Vertica); appliance (SAP, Exadata, Netezza); hosted (1010data, Kognitio)

  11. Game-changing technology • Quicker to deploy: preconfigured and tuned; fast ROI • Faster and more scalable: faster query response times; linear performance • Built-in analytics: libraries of functions; extensible SDK • Less costly: less power, cooling, space; fewer people to maintain

  12. Business value of analytic platforms Analytical appliance • Kelley Blue Book – Consolidates millions of auto transactions each week to calculate car valuations • AT&T Mobility – Tracks purchasing patterns for 80M customers daily to optimize targeted marketing Analytical Database

  13. 3. Hadoop • Ecosystem of open source projects • Hosted by Apache Foundation • Google developed and shared concepts • Distributed file system that scales out on commodity servers with direct attached storage and automatic failover.

  14. Hadoop distilled: What’s new? • Benefits: comprehensive; agile; expressive; affordable • Drawbacks: immature; batch oriented; expertise; TCO [Diagram labels: BIG DATA, Unstructured data, Data scientist, Open Source $$, Distributed File System, “Schema at Read”, MapReduce, No SQL]
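
Two of the labels on this slide, “Schema at Read” and MapReduce, are easier to grasp with a toy example. The Python sketch below is a single-process illustration of the ideas only; it does not use Hadoop or any Hadoop API, and the sample log lines, field layout, and function names are all assumptions made up for this example. Raw lines are stored untouched, a schema is applied only when they are read, and a map, shuffle, and reduce sequence counts hits per URL path.

```python
from collections import defaultdict

# Raw records are kept as-is; no schema is enforced when they are written.
# (Sample data invented for this illustration.)
RAW_LOG = [
    "2012-04-01 GET /home 200",
    "2012-04-01 GET /pricing 200",
    "malformed line",
    "2012-04-02 GET /home 404",
]


def read_with_schema(lines):
    """Apply a schema at read time: (date, method, path, status)."""
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip records that do not fit this reader's schema
        date, method, path, status = parts
        yield {"date": date, "method": method, "path": path, "status": int(status)}


def map_phase(records):
    """Map: emit one (key, value) pair per record."""
    for record in records:
        yield record["path"], 1


def shuffle(pairs):
    """Shuffle: group all values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: collapse each key's values into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


def hits_per_path(lines):
    return reduce_phase(shuffle(map_phase(read_with_schema(lines))))
```

Calling `hits_per_path(RAW_LOG)` counts two hits for `/home` and one for `/pricing`; the malformed line is ignored by this reader, but it stays in the raw store and a different reader with a different schema could still use it.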

  15. Hadoop ecosystem Source: Hortonworks

  16. Hadoop use cases • Sabre Holdings: analyze airline shopping data • Vestas: site wind turbines by modeling larger volumes of weather data • CBS Interactive: optimize ad placement and pricing • Nokia: identify new data services

  17. Hadoop hype Overheard “Hadoop will replace relational databases.” “Hadoop will replace data warehouses.” “Hadoop has a superior query engine compared to analytical platforms.” “Use Hadoop for any application that requires more than one node.” Gartner Group – Hype Cycle

  18. Hadoop adoption rates Based on 158 respondents, BI Leadership Forum, April, 2012

  19. Hadoop workloads Based on respondents that have implemented Hadoop. BI Leadership Forum, April, 2012

  20. Which platform do you choose? • Hadoop • Analytic database • General-purpose RDBMS [Axis: Structured → Semi-structured → Unstructured]

  21. Big data platform comparison

  22. The New BI Ecosystem

  23. BI Framework 2020 [Diagram: four intelligence domains (Business Intelligence, Analytics Intelligence, Continuous Intelligence, Content Intelligence) shown across three layers (End-User Tools, Design Framework, Architecture); monitoring for casual users, exploration for power users] Diagram labels: • Reports and dashboards; MAD dashboards; data warehousing; reporting & analysis • Operational dashboards (DW-driven dashboards); dashboard alerts • Event-driven alerts and dashboards; event detection and correlation; CEP, streams • Keyword search and faceted navigation; keyword search, BI tools, XQuery, Hive, Java, etc.; MapReduce, XML schema, key-value pairs, graph notation, etc.; HDFS, NoSQL databases • Analytic sandboxes; ad hoc SQL; ad hoc query, spreadsheets, OLAP, visual analysis, analytic workbenches, Hadoop • Excel, Access, OLAP, data mining, visual exploration; non-relational queries; decision automation

  24. BI Framework • TOP DOWN – “Business Intelligence”: corporate objectives and strategy; reporting & monitoring (casual users); data warehousing architecture; non-volatile data; predefined metrics • Pros: alignment; consistency • Cons: hard to build; politically charged; hard to change; expensive; “schema heavy” • Reports beget analysis; analysis begets reports • BOTTOM UP – “Analytics Intelligence”: processes and projects; analysis and prediction (power users); analytics architecture; volatile data; ad hoc queries • Pros: quick to build; politically uncharged; easy to change; low cost • Cons: alignment; consistency; “schema light”

  25. The new analytical ecosystem

  26. Analytical sandboxes [Diagram: top-down architecture and bottom-up architecture side by side] Diagram labels: • Sources: operational systems (structured data); machine data; web data; audio/video data; external data; documents & text • Top-down architecture: extract, transform, load (batch, near real-time, or real-time); data warehouse; departmental data mart; BI server; reports / dashboards; streaming/CEP engine; alerts; casual user • Bottom-up architecture: Hadoop cluster; analytic platform or non-relational database; virtual sandboxes; in-memory sandbox; free-standing sandbox; upload & query; query; power user

  27. Workflows • 1. Extract, transform, load • 2. Query data • 3. Archive detail • 4. Land multi-structured data • 5. Explore data • 6. Parse, aggregate • 7. Export data • 8. Federated query • 9. Report and mine data [Diagram labels: Source Systems; Analytical database (DW), “Capture only what’s needed”; “Capture in case it’s needed”; Analytical tools; Data mapping]

  28. Recommendations • Explore applications for multi-structured data • Apply the right tool for the job: RDBMS, analytical platform, Hadoop, NoSQL • Make power users full-fledged members of your BI environment • Reconcile top-down and bottom-up BI environments → Create an analytical ecosystem!

  29. Questions? • Wayne Eckerson • weckerson@bileadership.com • Analytical thought leader • Founder, BI Leadership Forum • Director of Research, TechTarget • Former director of research at TDWI • Author
