
Data Integration for Big Data


Presentation Transcript


  1. Data Integration for Big Data. Pierre Skowronski, Prague, 23.04.2013

  2. IT is struggling with the cost of Big Data • Growing data volume is quickly consuming capacity • Need to onboard, store, & process new types of data • Big data skills are expensive and scarce

  3. Prove the Value with Big Data: Deliver Value Along the Way • Cost: lower big data project costs (helps self-fund big data projects) • Risk: minimize the risk of new technologies (design once, deploy anywhere) • Delivery: innovate faster with big data (onboard, discover, operationalize)

  4. Introducing the Informatica PowerCenter Big Data Edition

  5. PowerCenter Big Data Edition: Lower Costs. Optimize processing with low-cost commodity hardware; increase productivity up to 5X. [Diagram: sources such as transactions (OLTP, OLAP), documents and emails, social media, web logs, and machine, device, and scientific data flowing through a traditional grid into the EDW, ODS, and MDM.]

  6. Hadoop complements existing infrastructure on low-cost commodity hardware

  7. 5x better productivity for similar performance. In the worst case, only 20% slower than hand-coding; mostly equal or faster. Informatica: 1 week vs. hand-coding: 5-6 weeks

  8. PowerCenter Big Data Edition: Minimize Risk • Quickly staff projects with trained data integration experts • Design once and deploy anywhere • Deploy on-premise or in the cloud • Run on a traditional grid, or push down to an RDBMS or DW appliance

  9. Graphical Processing Logic: Test on Native, Deploy on Hadoop. [Mapping diagram: sort records by calling number; separate partial records from completed records; aggregate all completed and partially-completed records; completed records and partial records each flow through their own branch.] The sketch below illustrates the same flow in plain code.
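
To make the mapping concrete, here is a minimal Python sketch of the same flow. The record schema (calling_number, duration_sec, is_complete) is an illustrative assumption, not taken from the slide; PowerCenter expresses this logic graphically, so the code only stands in for it.

from itertools import groupby
from operator import itemgetter

def process_call_records(records):
    # Sort records by calling number so partial fragments group together
    records = sorted(records, key=itemgetter("calling_number"))

    # Separate completed records from incomplete/partial ones
    complete = [r for r in records if r["is_complete"]]
    partial = [r for r in records if not r["is_complete"]]

    # Aggregate partial fragments of the same call into a single record
    aggregated = []
    for number, group in groupby(partial, key=itemgetter("calling_number")):
        fragments = list(group)
        aggregated.append({
            "calling_number": number,
            "duration_sec": sum(r["duration_sec"] for r in fragments),
            "is_complete": True,
        })

    # Merge the completed and newly aggregated records
    return complete + aggregated

if __name__ == "__main__":
    sample = [
        {"calling_number": "555-0100", "duration_sec": 60, "is_complete": True},
        {"calling_number": "555-0101", "duration_sec": 20, "is_complete": False},
        {"calling_number": "555-0101", "duration_sec": 40, "is_complete": False},
    ]
    print(process_call_records(sample))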

  10. Run It Simply on Hadoop: choose the execution environment, press Run, and view the generated Hive query (see the sketch below)
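
For illustration only, here is a Python sketch using the third-party PyHive client that shows the kind of query the "view Hive query" step exposes. PowerCenter Big Data Edition generates and submits the HiveQL itself; the host, table, and column names below are assumptions.

from pyhive import hive  # third-party Hive client, assumed installed

# The kind of HiveQL the tool might generate for the partial-record
# aggregation above; table and column names are invented for the example.
GENERATED_HIVEQL = """
SELECT calling_number,
       SUM(duration_sec) AS total_duration
FROM call_detail_records
WHERE is_complete = false
GROUP BY calling_number
"""

conn = hive.Connection(host="hadoop-gateway.example.com", port=10000)
cur = conn.cursor()
cur.execute(GENERATED_HIVEQL)
for row in cur.fetchall():
    print(row)
cur.close()
conn.close()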

  11. Achieving Operational Efficiency with Informatica: Minimize Risk with Informatica Partners and the Certified Developer Community • Technology: expertise & best practices • Global systems integrators: best practices & reusability • People: 9,000+ trained Informatica developers, 45,000+ developers in Informatica TechNet, and 3x more developers than any other vendor* (* Source: U.S. resume search on dice.com, December 2008)

  12. What Are Customers Doing with Informatica and Big Data?

  13. Lower Costs of Big Data Projects: Saved $20M plus $2-3M ongoing through archiving & optimization (large global financial institution). The Challenge: a data warehouse exploding with over 200TB of data, and user activity generating up to 5 million queries a day, impacting query performance. The Solution: [architecture diagram: ERP, CRM, and custom sources feeding the EDW for business reports, with older data moved to an archive and interaction data added in Phase 2.] The Result: • Saved 100TBs of space over the past 2½ years • Reduced a rearchitecture project from 6 months to 2 weeks • Improved performance by 25% • Return on investment in less than 6 months

  14. Lower Costs of Big Data Projects (large global financial institution). The Challenge: increasing demand for faster data-driven decision making and analytics as data volumes and processing loads rapidly increase. The Solution: [architecture diagram: RDBMS sources, plus web logs in Phase 2, feeding a traditional grid, the data warehouse, near-real-time datamarts, and RDBMS datamarts.] The Result: • Cost-effectively scaled performance • Lower hardware costs • Increased agility by standardizing on one data integration platform

  15. Flexible Architecture to Support Rapidly Changing Business Needs (large government agency). The Challenge: data volumes growing 3-5x over the next 2-3 years. The Solution: [architecture diagram: RDBMS, mainframe, and, in Phase 2, unstructured data sources feeding a traditional grid, data warehouses, the EDW, and data virtualization for business reports.] The Result: • Manage the data integration and load of 10+ billion records from multiple disparate data sources • A flexible data integration architecture that supports changing business requirements in a heterogeneous data management environment

  16. Why PowerCenter Big Data Edition • Repeatability: predictable, repeatable deployments and methodology • Reuse of existing assets: apply existing integration logic to load data to/from Hadoop, and reuse existing data quality rules to validate Hadoop data (a sketch of this idea follows) • Reuse of existing skills: enable ETL developers to leverage the power of Hadoop • Governance: enforce and validate data security, data quality, and regulatory policies • Manageability
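
As a rough illustration of the "reuse existing data quality rules" point, the Python sketch below applies one rule set to records regardless of whether they came from an RDBMS extract or a Hadoop dataset. The rules and field names are invented for the example and are not Informatica's rule syntax.

import re

# One reusable rule set: each field maps to a predicate it must satisfy.
RULES = {
    "calling_number": lambda v: bool(re.fullmatch(r"\+?[\d-]{7,15}", v or "")),
    "duration_sec": lambda v: isinstance(v, (int, float)) and v >= 0,
}

def validate(record):
    # Return the names of the fields that fail their quality rule
    return [field for field, rule in RULES.items() if not rule(record.get(field))]

if __name__ == "__main__":
    bad = {"calling_number": "not-a-number", "duration_sec": -5}
    print(validate(bad))  # -> ['calling_number', 'duration_sec']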
