IBM Big Data Platform Overview

IBM Big Data Platform Overview. Martin Pavlík, +420 731 435 691, martin_pavlik@cz.ibm.com. Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data. Cost-effectively manage and analyze all available data in its native form.

Presentation Transcript


  1. IBM Big Data Platform Overview Martin Pavlík +420 731 435 691 martin_pavlik@cz.ibm.com

  2. Big Data is a Hot Topic Because Technology Makes it Possible to Analyze ALL Available Data • Cost-effectively manage and analyze all available data in its native form: unstructured, structured, streaming • Sources: social media, website, billing, network switches, ERP, CRM, RFID

  3. BIG DATA is not just HADOOP • Federated Discovery and Navigation: understand and navigate federated big data sources • Hadoop File System, MapReduce: manage & store huge volumes of any data • Data Warehousing: structure and control data • Stream Computing: manage streaming data • Text Analytics Engine: analyze unstructured data • Integration, Data Quality, Security, Lifecycle Management, MDM: integrate and govern all data sources

  4. Business-Centric Big Data Enables You to Start With a Critical Business Pain and Expand the Foundation for Future Requirements • “Big data” isn’t just a technology—it’s a business strategy for capitalizing on information resources • Getting started is crucial • Success at each entry point is accelerated by products within the Big Data platform • Build the foundation for future requirements by expanding further into the big data platform

  5. 1 – Unlock Big Data

  6. 2 – Analyze Raw Data

  7. Merging the Traditional and Big Data Approaches • Traditional Approach (structured & repeatable analysis): business users determine what question to ask; IT structures the data to answer that question. Examples: monthly sales reports, profitability analysis, customer surveys • Big Data Approach (iterative & exploratory analysis): IT delivers a platform to enable creative discovery; the business explores what questions could be asked. Examples: brand sentiment, product strategy, maximum asset utilization

  8. InfoSphere BigInsights is more than just HADOOP

  9. Hadoop • Open-source software framework from Apache • Inspired by Google MapReduce and GFS (Google File System) • Core components: HDFS and Map/Reduce

  10. InfoSphere BigInsights: platform for volume, variety, velocity (can also run on top of …) • Enhanced Hadoop foundation • Analytics: text analytics & tooling; application accelerators • Usability: web console; spreadsheet-style tool; ready-made “apps” • Enterprise class: storage, security, cluster management • Integration: connectivity to Netezza, DB2, JDBC databases, etc. • Editions, by enterprise class and breadth of capabilities: Apache Hadoop; Basic Edition (free download, integrated install, online InfoCenter, BigData Univ.); Enterprise Edition (licensed: application accelerators, pre-built applications, text analytics, spreadsheet-style tool, RDBMS and warehouse connectivity, administrative tools and security, Eclipse development tools, performance enhancements)

  11. Spreadsheet-style Analysis • Web-based analysis and visualization • Spreadsheet-like interface • Define and manage long-running data collection jobs • Analyze the text content of the pages that have been retrieved

  12. Build a Big Data Program – MapReduce example • Eclipse tools for Jaql, Hive, Pig, Java MapReduce, BigSheets plug-ins, text analytics, etc.
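The slide names MapReduce without showing what it does. The sketch below is a plain-Python, single-process simulation of the map, shuffle and reduce phases for a word count. It does not use the Hadoop or BigInsights APIs, and the function names are hypothetical; on a real cluster the framework distributes the map and reduce calls across nodes and performs the shuffle itself.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import chain
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every whitespace-separated token
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group intermediate values by key (done by the framework in Hadoop)
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values that share a key
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data is not just hadoop", "hadoop stores big data"]
    print(word_count(docs))
```

Because each map call sees only one line and each reduce call sees only one key, both phases can run in parallel; only the shuffle needs to move data between them.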

  13. Jaql – IBM’s programming language in the Hadoop world • Jaql is a complete solutions environment supporting all other BigInsights components • Integration point for various analytics: text analytics, statistical analysis, machine learning, ad-hoc analysis • Integration point for various data sources: local and distributed file systems, NoSQL databases, content repositories, relational sources (warehouses, operational databases) • Architecture: BigInsights analytics (Text Analytics; Statistical Analysis (R module); Machine Learning (SystemML); Ad-hoc Analysis (BigSheets)) and integration (DB2, Netezza, Streams, …) sit on top of Jaql, which is layered as Jaql Modules, Jaql Core Operators and Jaql I/O over the file system, RDBMS, DFS and NoSQL sources

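  14. BigInsights and the data warehouse • BigInsights filters, transforms and aggregates data before it is loaded into the data warehouse • Traditional analytic tools work against the data warehouse; big data analytic applications work against BigInsights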
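A minimal sketch of the filter, transform and aggregate steps, assuming hypothetical telco call records (the `CallRecord` type, its fields and the status values are illustrative, not part of BigInsights). The point is that only the small aggregated result would be loaded into the warehouse, while the raw records stay in Hadoop.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class CallRecord:
    # Hypothetical raw record landed in Hadoop
    customer_id: str
    duration_s: int
    status: str  # e.g. "OK" or "DROPPED"


def filter_records(records: Iterable[CallRecord]) -> Iterator[CallRecord]:
    # Filter: keep only completed calls with a positive duration
    return (r for r in records if r.status == "OK" and r.duration_s > 0)


def transform(records: Iterable[CallRecord]) -> Iterator[tuple[str, float]]:
    # Transform: project to (customer, minutes)
    return ((r.customer_id, r.duration_s / 60) for r in records)


def aggregate(rows: Iterable[tuple[str, float]]) -> dict[str, float]:
    # Aggregate: total minutes per customer, rounded to 2 decimals
    totals: dict[str, float] = defaultdict(float)
    for customer_id, minutes in rows:
        totals[customer_id] += minutes
    return {k: round(v, 2) for k, v in totals.items()}


def run_pipeline(records: Iterable[CallRecord]) -> dict[str, float]:
    return aggregate(transform(filter_records(records)))


if __name__ == "__main__":
    raw = [
        CallRecord("c1", 120, "OK"),
        CallRecord("c1", 30, "OK"),
        CallRecord("c2", 45, "DROPPED"),
        CallRecord("c2", 90, "OK"),
    ]
    print(run_pipeline(raw))  # rows that would be loaded into the warehouse
```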

  15. 3 – Simplify your warehouse

  16. Analyst: I need to evaluate the possible relationship between client salary and overdrafts. IT: OK. We have to evaluate a lot of statistics and set the correct DB indexes and DB partitioning. It will take us 5 days.

  17. After 5 days ... IT: Done. You can run your analytical query. Analyst: Great. Thanks a lot. I’m going to check the results.

  18. After 10 minutes ... Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective. IT: Ohhh, welcome, dear friend. Understood. So, it’s … another 5 days of our work. Analyst: Noooo!!! It’s not possible to work here!

  19. And now with Netezza ...

  20. Analyst: I need to evaluate the possible relationship between client salary and overdrafts. I will use Netezza.

  21. After 12 minutes ... Analyst: Great. I can see some nice correlations here. Now I need to look at it from a different perspective. With Netezza I can run the query immediately, and the response will come back just as quickly. Meanwhile, IT can do something else, much more useful.

  22. Built-In Expertise Makes This as Simple as an Appliance • Dedicated device • Optimized for purpose • Complete solution • Fast installation • Very easy operation • Standard interfaces • Low cost

  23. In October 2012 IBM Netezza was renamed to IBM PureData System for Analytics

  24. Netezza Genesis in T-Mobile CZ: Proof-of-Concept Project • New Enterprise Data Warehouse platform selection • Comparison of existing and other platforms • Selection criteria: performance, operational savings • …and the winner was: Netezza

  25. Netezza Genesis in T-Mobile CZ: Expectations • Significant response improvement: a faster platform means better report response times • Direct data availability: higher trust in data, one version of the truth; aggregation reduction; any attribute available • Operational benefits: storage savings (no data replicas); administration cost reduction (DBA) • Infrastructure simplification: lower environment complexity

  26. Netezza Genesis in T-Mobile CZ: Project Implementation • EDW platform migration: Netezza platform implementation; ETL graphs/processes redesign • BI front-end tool migration: SAP BusinessObjects implementation; redesign of all reports • Main integration partner: T-System CZ

  27. Netezza Genesis in T-Mobile CZ: Current Status • All relevant ETL processing redesigned • Parallel run on the original and Netezza platforms finished • Netezza is now the only primary platform

  28. Real Netezza experience from T-Mobile Czech Rep. RESPONSE TIME MASSIVELY IMPROVED

  29. 4 – Reduce costs with Hadoop

  30. BigInsights and the data warehouse • BigInsights as a query-ready archive for “cold” warehouse data • Traditional analytic tools (e.g. Cognos BI via Hive JDBC) and big data analytic applications can query BigInsights alongside the data warehouse

  31. Future: The SQL interface • Architecture: application, SQL language, JDBC / ODBC driver, JDBC / ODBC server, engine, and data sources in InfoSphere BigInsights (Hive tables, HBase tables, CSV files) • Rich SQL query capabilities: SQL '92 and 2011 features, correlated subqueries, windowed aggregates • SQL access to all data stored in InfoSphere BigInsights • Robust JDBC/ODBC support • Take advantage of key features of each data source • Leverage MapReduce parallelism OR achieve a low-latency SQL interface
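Windowed aggregates and correlated subqueries are standard SQL features. The sketch below demonstrates both using Python's built-in `sqlite3` module purely as a stand-in engine; it is not the BigInsights SQL interface or its JDBC/ODBC driver. The `usage` table and its rows are hypothetical, and window functions assume SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical monthly usage data: (customer_id, month, minutes)
ROWS = [("c1", 1, 100), ("c1", 2, 150), ("c1", 3, 50), ("c2", 1, 30), ("c2", 2, 30)]


def run_examples():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE usage (customer_id TEXT, month INTEGER, minutes INTEGER)")
    con.executemany("INSERT INTO usage VALUES (?, ?, ?)", ROWS)

    # Windowed aggregate: running total of minutes per customer, ordered by month
    running = con.execute(
        """
        SELECT customer_id, month,
               SUM(minutes) OVER (PARTITION BY customer_id ORDER BY month) AS running_total
        FROM usage
        ORDER BY customer_id, month
        """
    ).fetchall()

    # Correlated subquery: months in which a customer exceeded their own average usage
    above_avg = con.execute(
        """
        SELECT u.customer_id, u.month, u.minutes
        FROM usage AS u
        WHERE u.minutes > (SELECT AVG(u2.minutes) FROM usage AS u2
                           WHERE u2.customer_id = u.customer_id)
        ORDER BY u.customer_id, u.month
        """
    ).fetchall()

    con.close()
    return running, above_avg


if __name__ == "__main__":
    running, above_avg = run_examples()
    print(running)
    print(above_avg)
```

The inner query in the second statement is re-evaluated per outer row (it references `u.customer_id`), which is what makes it correlated; the window function instead computes the running total in a single pass over each partition.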

  32. 5 – Analyze Streaming Data

  33. Why and when to use InfoSphere Streams? Applications that need on-the-fly processing, filtering and analysis of streaming data • At least 2 criteria from the list below should be fulfilled

  34. Streams and BigInsights: Integrated Analytics on Data in Motion & Data at Rest • InfoSphere Streams: data ingest, preparation, online analysis, model validation • InfoSphere BigInsights, database & warehouse: data integration, data mining, machine learning, statistical modeling • Flow: 1. Data ingest (data passes from Streams to data at rest); 2. Bootstrap/enrich; 3. Adaptive analytics model (control flow back to Streams) • Visualization of real-time and historical insights
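The data-at-rest / data-in-motion loop can be sketched as follows, assuming a deliberately simple model: an anomaly threshold of mean plus k population standard deviations learned from historical values (standing in for the BigInsights/warehouse side), applied to each incoming event (standing in for Streams), with periodic retraining as the control flow. The function names, the threshold model and the retraining interval are hypothetical illustrations, not InfoSphere Streams APIs.

```python
from __future__ import annotations

import statistics
from typing import Iterable


def train_model(history: list[float], k: float = 3.0) -> float:
    # Data at rest: derive an anomaly threshold from historical measurements
    return statistics.mean(history) + k * statistics.pstdev(history)


def adaptive_pipeline(
    history: Iterable[float],
    events: Iterable[float],
    retrain_every: int = 3,
    k: float = 3.0,
) -> list[float]:
    history = list(history)  # copy so the caller's data is not mutated
    threshold = train_model(history, k)
    flagged: list[float] = []
    for i, value in enumerate(events, start=1):
        # Data in motion: score each event as it arrives
        if value > threshold:
            flagged.append(value)
        else:
            history.append(value)  # only normal events enrich the model
        if i % retrain_every == 0:
            threshold = train_model(history, k)  # control flow: refresh the model
    return flagged


if __name__ == "__main__":
    print(adaptive_pipeline([9, 10, 11], [10, 12, 40, 11]))
```

Excluding flagged events from the history is a design choice that keeps outliers from inflating the next threshold; a production model would be trained offline on far more data.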

  35. The Platform Advantage: IBM Big Data Platform • Analytic applications: BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics • Platform services: Visualization & Discovery, Application Development, Systems Management, Accelerators • Hadoop System, Data Warehouse, Stream Computing • Information Integration & Governance

  36. IBM big data • THINK