Pentaho Business Analytics & Data Integration
Amjad.akkawi@zaponet.com
About Us – Zaponet data science solutions
• Zaponet is a service integrator and development shop providing solutions and professional services for building state-of-the-art data products that leverage big-data and data-science technologies.
• Zaponet architects, designs, and builds big-data solutions: data warehouses, user-profile systems, recommendation engines, complex event processing, and more.
• Some of our technology partners: Pentaho, Cloudera, Infobright, Vertica, Kognitio, GigaSpaces
• More details: www.zaponet.com
* Future meetup: Pentaho Weka for data science
About Me – Amjad Akkawi
• Zaponet CTO
• Experience in Pentaho
Agenda
• Pentaho in business analytics & data integration
• Pentaho BI Demo
• Pentaho PDI Demo
About Pentaho
• Recognized leader in business analytics & data integration
• Subscription-based business model
• Achieved critical mass:
  • Over 1,200 commercial customers
  • Over 10,000 production deployments
  • Over 185 countries
• Stewardship of the most important open source analytics projects
OVER 160 PARTNERS GLOBALLY
INDUSTRY RECOGNITION
Why Customers Love Pentaho
Speed of Deployment
• Marketing dashboard in less than 1 day
• 2 weeks time to market
• 8 weeks time to market
• Fully rolled out within budget in 4 months
Innovation & Scalability
• Analyzing buying patterns of 5 million members
• Music files from 20,000 sources
• Operational reports at all 1,000 retail stores
• Analytics on 500,000 patient records
Superior Customer Service
• “… a great partner through every phase of our project”
• “Pentaho support is as good as its software”
• “… better functionality and more support”
• “… top-notch professional support”
Total Value
• “… ROI was almost immediate.”
• 75% lower acquisition costs
• €350K+ cost saving
• Less than 1 month ROI
Pentaho in the Big Data Fabric
• Pentaho Business Analytics
• 3rd Party Tools
• Big Analytics: R, 3rd Party BI Tools, Applications
• Data Integration: Job Orchestration, Workflow Scheduling, High Performance Visual IDE
• Data Integration: Hadoop (Java MapReduce, Pig, Pentaho MapReduce), NoSQL Databases, Analytic Databases
• Big Data Mgmt
High Level Feature/Functions
• Dashboards: Self-service interactive KPI & metrics and visualization (Information Consumers)
• Reporting: Ad hoc and operational reports (Business Users)
• Analysis: Self-service interactive and ad hoc analysis (Knowledge Workers / Business Users)
• Data: High-performance data integration, big data, cleansing and presentation (Power Users, Developers & DBAs)
• Data Mining: Advanced predictive analysis (Advanced Power Users & Viewers)
Components are independent
Enhanced In-Memory Analytics
• Enhanced in-memory caching for speed-of-thought visualization & analysis
  • More re-usability of in-memory data
  • Fewer trips to the database/disk
• Builds on existing unique extreme-scale in-memory analytics
• Support for external data grids
  • Infinispan / JBoss Enterprise Data Grid and Memcached
  • Scale to caching hundreds of GBs (potentially TBs) of data in-memory
• Competition
  • Java heap or C++ memory space (a few GB at most; most BI products), or
  • Proprietary (hard to manage) in-memory technology (e.g. QlikView, MicroStrategy)
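To make "fewer trips to the database/disk" concrete, here is a minimal Python sketch of a result cache sitting in front of a query function. It is an illustration only, not Pentaho's implementation: the `QueryCache` class, the `fake_database` function, and the query name are all hypothetical, and a real deployment would use an external data grid rather than a local dictionary.

```python
from typing import Callable, Dict, List, Tuple

Rows = List[Tuple[str, int]]


class QueryCache:
    """Tiny in-memory cache keyed by (query, params). Hypothetical helper."""

    def __init__(self, run_query: Callable[[str, Tuple], Rows]) -> None:
        self._run_query = run_query
        self._store: Dict[Tuple[str, Tuple], Rows] = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, query: str, params: Tuple = ()) -> Rows:
        key = (query, params)
        if key in self._store:
            # Served from memory: no trip to the database.
            self.hits += 1
            return self._store[key]
        # First request for this key: go to the database and remember the result.
        self.misses += 1
        rows = self._run_query(query, params)
        self._store[key] = rows
        return rows


# Stand-in for a real database; it records every call it receives.
db_calls: List[Tuple[str, Tuple]] = []


def fake_database(query: str, params: Tuple) -> Rows:
    db_calls.append((query, params))
    return [("north", 120), ("south", 95)]


cache = QueryCache(fake_database)
first = cache.fetch("sales_by_region", ("2012",))
second = cache.fetch("sales_by_region", ("2012",))  # answered from the cache
```

After the two `fetch` calls, `db_calls` holds a single entry: the second request never reached the database. The sketch ignores invalidation and memory limits, which any real cache has to handle.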
Scenario 1
• Operational Database
• Dashboard, Report
Scenario 2
• Data Mart(s) / Warehouse
• Metadata
• Dashboard, Report, Analyzer
Metadata – Schema Workbench
• Complex calculations and multi-cube requirements may need more modeling
Scenario 3
• Structured Data, Unstructured Data
• Pentaho Data Integration (PDI): source data acquisition, initial consolidation as required
• BIG DATA Technology and/or Staging Area & Data Vault
• Pentaho Data Integration (PDI): cleansing, transformation, change data capture, data warehouse management
• Data Mart(s) / Warehouse
• Metadata
• Dashboard, Report, Analyzer
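Change data capture is listed above without detail. As a rough illustration, the Python sketch below shows one common approach, comparing two snapshots of a table keyed by primary key. This is an assumption chosen for clarity, not a description of how PDI implements it; the `detect_changes` function and the sample rows are hypothetical.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def detect_changes(
    previous: Dict[int, Row], current: Dict[int, Row]
) -> Tuple[List[int], List[int], List[int]]:
    """Compare two snapshots keyed by primary key.

    Returns the keys that were inserted, updated, and deleted.
    """
    inserted = sorted(k for k in current if k not in previous)
    deleted = sorted(k for k in previous if k not in current)
    updated = sorted(
        k for k in current if k in previous and current[k] != previous[k]
    )
    return inserted, updated, deleted


# Hypothetical snapshots of a small customer table.
yesterday = {
    1: {"name": "Ann", "city": "Haifa"},
    2: {"name": "Bob", "city": "Tel Aviv"},
    3: {"name": "Cy", "city": "Eilat"},
}
today = {
    1: {"name": "Ann", "city": "Haifa"},     # unchanged
    2: {"name": "Bob", "city": "Jerusalem"},  # updated
    4: {"name": "Dee", "city": "Acre"},       # inserted; key 3 was deleted
}

inserted, updated, deleted = detect_changes(yesterday, today)
```

Only the changed keys then need to be loaded into the warehouse. Snapshot comparison is simple but reads both snapshots in full; log-based or timestamp-based capture avoids that cost on large tables.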
Variations on a Theme
• Structured Data, Unstructured Data, Ad-hoc Data
• Pentaho Data Integration (PDI): source data acquisition, initial consolidation as required
• BIG DATA Technology and/or Staging Area & Data Vault
• Pentaho Data Integration (PDI): cleansing, transformation, change data capture, data warehouse management
• Data Mart(s) / Warehouse
• Metadata
• Dashboard, Report, Analyzer
• Alerting: SMS, eMail & attachments
PDI Components
• Enterprise Edition Data Integration Server
  • Execution and remote monitoring
  • Integrated scheduling
  • Enterprise security options
  • Enhanced content management, including revision history and locking
  • Remote, distributed, cluster-based processing
Pentaho Data Integration
• Step-based processing engine with instant visualization of results
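The idea of a step-based engine with a preview after each step can be sketched in a few lines of Python. This is a conceptual illustration only: PDI transformations are designed visually, and the step functions, field names, and `run_pipeline` helper below are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]
Step = Callable[[Iterable[Row]], Iterator[Row]]


def trim_names(rows: Iterable[Row]) -> Iterator[Row]:
    """Cleansing step: strip surrounding whitespace from the name field."""
    for row in rows:
        yield {**row, "name": str(row["name"]).strip()}


def drop_missing_amount(rows: Iterable[Row]) -> Iterator[Row]:
    """Filter step: discard rows that have no amount."""
    for row in rows:
        if row.get("amount") is not None:
            yield row


def add_amount_cents(rows: Iterable[Row]) -> Iterator[Row]:
    """Calculation step: derive a new field from an existing one."""
    for row in rows:
        yield {**row, "amount_cents": int(row["amount"]) * 100}


def run_pipeline(rows: Iterable[Row], steps: List[Step]) -> List[List[Row]]:
    """Run the steps in order and keep each step's output for preview."""
    previews: List[List[Row]] = []
    current: List[Row] = list(rows)
    for step in steps:
        # Materialize the output so it can be inspected after every step.
        current = list(step(current))
        previews.append(current)
    return previews


source = [
    {"name": " Ann ", "amount": 10},
    {"name": "Bob", "amount": None},
    {"name": " Cy", "amount": 7},
]
previews = run_pipeline(source, [trim_names, drop_missing_amount, add_amount_cents])
```

Each entry in `previews` is the row set as it looks after one step, which is the kind of intermediate result a visual designer can display. Materializing every step is done here for inspection; a production engine would stream rows between steps instead.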
Pentaho Data Integration
• Step-based performance
Pentaho Data Integration
• Integrated metadata creation
Pentaho and Big Data
Forrester Wave, Enterprise Hadoop Solutions, Q1 2012
• Only vendor in the Strong Performer category: “an impressive Hadoop integration tool”
• Only business analytics vendor
• Richest functionality
• Most extensive integration with open source Apache Hadoop and major Hadoop distributions
Expanded Insight into Big and Diverse Data
• Improved support for Hadoop
  • Simpler deployment across Hadoop clusters
  • Support for the Hadoop cache
  • Debian RPM installer
  • Performance and ease-of-use enhancements for Pentaho MapReduce visual development
  • Support for Hadoop Security data access
• New NoSQL database support
  • Cassandra
  • MongoDB
• Growing the Pentaho big data community
  • Open sourced all big data components (Hadoop & NoSQL)
  • Apache License, the same license used by leading Hadoop and NoSQL distros
  • New big data developer resources: how-to documents, videos, walk-throughs
Hadoop Data Management & Integration
• Pentaho MapReduce
• Accessible by any ETL developer or data scientist
NoSQL Data Management & Integration
• Visual Job Orchestration
• Any Data Source
• Accessible by any ETL developer or data scientist
Visual Job Orchestration
• Any Data Source
• Scheduling
• Accessible to any ETL developer or data scientist
Pentaho Integration Options
• Pentaho BI Server
• Other Application
• Pentaho Custom Stuff
• My Application
• Pentaho Components
Q & A
• NEXT …
• Pentaho PDI Demo
• Pentaho BI Demo
“Traditional” Database Support
• Data Analysis
• Data Integration
Broadest Support for Big Data Platforms
• Hadoop
• NoSQL
• Analytic Databases