Introduction to BI PowerPoint Presentation
Presentation Transcript

  1. Introduction to BI

  2. Automated Decision-Making Framework

  3. BI (per Wikipedia) • http://he.wikipedia.org/wiki/%D7%91%D7%99%D7%A0%D7%94_%D7%A2%D7%A1%D7%A7%D7%99%D7%AA Table of contents • 1 History • 2 Workflow • 3 Data warehouse and BI • 4 Online analytical processing (OLAP) • 5 Data mining (all the learning methods we studied) • 6 Operational business intelligence • 7 Main uses • 8 BI products

  4. History of DSS: Classical Definitions of DSS • "Interactive computer-based systems, which help decision makers utilize data and models to solve unstructured problems" - Gorry and Scott-Morton, 1971 • "Decision support systems couple the intellectual resources of individuals with the capabilities of the computer to improve the quality of decisions. It is a computer-based support system for management decision makers who deal with semistructured problems" - Keen and Scott-Morton, 1978

  5. Types of DSS • Two major types: • Model-oriented DSS • Data-oriented DSS • Evolution of DSS into Business Intelligence • Use of DSS moved from specialists to managers, and then to whomever, whenever, wherever • Enabling tools like OLAP, data warehousing, data mining, and intelligent systems, delivered via Web technology, have collectively led to the terms “business intelligence” (BI) and “business analytics”

  6. From Wikipedia... Since the mid-2000s, new business intelligence tools have appeared under an approach called Business Intelligence 2.0 (BI 2.0), which lets employees run queries on the organization's data in real time. The term BI 2.0 was coined by analogy with Web 2.0, because this kind of processing follows a browser-based approach in a Web environment. BI 2.0 tools enable more dynamic reporting than the static reports that characterized the previous generation of tools. An important foundation for this kind of processing is the use of SOA, which comes together with more flexible middleware products and the use of standards for data transfer. SOA = Service Oriented Architecture

  7. DSS Description • DSS application A DSS program built for a specific purpose (e.g., a scheduling system for a specific company) • Business intelligence (BI) A conceptual framework for decision support. It combines architecture, databases (or data warehouses), analytical tools, and applications

  8. Business Intelligence (BI) • BI is an evolution of decision support concepts over time. • Meaning of EIS/DSS… • Then: Executive Information System • Now: Everybody’s Information System (BI) • BI systems are enhanced with additional visualizations, alerts, and performance measurement capabilities. • The term BI emerged from industry apps.

  9. The Evolution of BI Capabilities

  10. The Architecture of BI • A BI system has four major components: • a data warehouse, with its source data • business analytics, a collection of tools for manipulating, mining, and analyzing the data in the data warehouse • business performance management (BPM), for monitoring and analyzing performance • a user interface (e.g., a dashboard) • In recent years, business intelligence has taken a central place in information systems. The large growth in the information accumulated in computerized systems requires presenting and consolidating the relevant data so that the information is meaningful. One indication of the field's importance is the acquisition of prominent companies specializing in it by large software companies

  11. A High-Level Architecture of BI

  12. Learning Objectives • Explain data integration and the extraction, transformation, and load (ETL) processes • Describe real-time (a.k.a. right-time and/or active) data warehousing • Understand data warehouse administration and security issues

  13. Stage 1: Data Warehouse • A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format • “The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time”

  14. DW Framework

  15. Data Integration and the Extraction, Transformation, and Load (ETL) Process Extraction, transformation, and load (ETL)

  16. Data Mart A departmental data warehouse that stores only relevant data • Dependent data mart A subset that is created directly from a data warehouse • Independent data mart A small data warehouse designed for a strategic business unit or a department

  17. OLAP vs. OLTP: Online Analytical vs. Online Transaction (Processing)

  18. OLAP Slicing Operations on a Simple Three-Dimensional Data Cube

  19. Star vs Snowflake Schema

  20. Another Snowflake Example

  21. Data Mining • Classification (is it worthwhile or not to...) • lend money, invest in a field, open a new branch • Cluster analysis (Clustering) • How many types of customers are there? What do they have in common? • Regression analysis • How much will we earn; optimization

  22. Types of Data • Mining information from data • The "simpler" kind • Mining information from texts • Information retrieval • Trend analysis, sentiment analysis

  23. Categories of Models

  24. Static and Dynamic Models • Static Analysis • Single snapshot of the situation • Single interval • Steady state • Dynamic Analysis • Dynamic models • Evaluate scenarios that change over time • Time dependent • Represents trends and patterns over time • More realistic: Extends static models

  25. Decision Analysis: A Few Alternatives • Single Goal Situations • Decision trees • Graphical representation of relationships • Multiple criteria approach • Demonstrates complex relationships • Cumbersome if many alternatives exist

  26. Decision Tables • Investment example • One goal: maximize the yield after one year • Yield depends on the status of the economy (the state of nature) • Solid growth • Stagnation • Inflation

  27. Investment Example: Possible Situations 1. If solid growth in the economy, bonds yield 12%; stocks 15%; time deposits 6.5% 2. If stagnation, bonds yield 6%; stocks 3%; time deposits 6.5% 3. If inflation, bonds yield 3%; stocks lose 2%; time deposits yield 6.5%
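
The decision table above can be evaluated with a few standard decision rules. The following is a minimal Python sketch, not the presenter's method: the yields come from the slide, while the probabilities in `assumed_probs` are purely illustrative assumptions (the slides give none), as are all function names.

```python
# Yields (%) from the slide: rows are alternatives, columns are states of nature.
yields = {
    "bonds":         {"solid growth": 12.0, "stagnation": 6.0, "inflation": 3.0},
    "stocks":        {"solid growth": 15.0, "stagnation": 3.0, "inflation": -2.0},
    "time deposits": {"solid growth": 6.5,  "stagnation": 6.5, "inflation": 6.5},
}

def maximin(table):
    """Pessimistic rule: choose the alternative with the best worst-case yield."""
    return max(table, key=lambda alt: min(table[alt].values()))

def maximax(table):
    """Optimistic rule: choose the alternative with the best best-case yield."""
    return max(table, key=lambda alt: max(table[alt].values()))

def expected_values(table, probs):
    """Expected yield of each alternative under the given state probabilities."""
    return {alt: sum(probs[state] * y for state, y in row.items())
            for alt, row in table.items()}

# ASSUMPTION: these probabilities are hypothetical and only serve the example.
assumed_probs = {"solid growth": 0.5, "stagnation": 0.3, "inflation": 0.2}

ev = expected_values(yields, assumed_probs)
best_by_expected_value = max(ev, key=ev.get)
```

The pessimistic rule favors time deposits (their worst case is 6.5%), the optimistic rule favors stocks (best case 15%), and the expected-value choice depends entirely on the probabilities assumed.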

  28. Optimization via Mathematical Programming • Mathematical Programming A family of tools designed to help solve managerial problems in which the decision maker must allocate scarce resources among competing activities to optimize a measurable goal • Optimal solution: The best possible solution to a modeled problem • Linear programming (LP): A mathematical model for the optimal solution of resource allocation problems. All the relationships are linear

  29. LP Problem Characteristics 1. Limited quantity of economic resources 2. Resources are used in the production of products or services 3. Two or more ways (solutions, programs) to use the resources 4. Each activity (product or service) yields a return in terms of the goal 5. Allocation is usually restricted by constraints

  30. Linear Programming Steps • 1. Identify the … • Decision variables • Objective function • Objective function coefficients • Constraints • Capacities / Demands • 2. Represent the model • LINDO: Write mathematical formulation • EXCEL: Input data into specific cells in Excel • 3. Run the model and observe the results

  31. LP Example: The Product-Mix Linear Programming Model • MBI Corporation • Decision: How many computers to build next month? • Two types of mainframe computers: CC7 and CC8 • Constraints: labor limits, materials limit, marketing lower limits • Labor (days): 300 CC7 + 500 CC8 <= 200,000 /mo • Materials ($): 10,000 CC7 + 15,000 CC8 <= 8,000,000 /mo • Units: CC7 >= 100 • Units: CC8 >= 200 • Profit ($): 8,000 CC7 + 12,000 CC8 • Objective: Maximize Total Profit / Month
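
The slides solve this model with LINDO or Excel. As a rough illustration only, the same two-variable model can be solved in plain Python by checking the corner points of the feasible region, which is valid here because the region is bounded and an LP optimum lies at a corner. The reading of the table (300 and 500 labor days, 10,000 and 15,000 dollars of materials per CC7 and CC8, minimums of 100 and 200 units) is an interpretation of the slide, and all function names are hypothetical.

```python
from itertools import combinations

# Each constraint is (a, b, c), meaning a*cc7 + b*cc8 <= c.
constraints = [
    (300, 500, 200_000),          # labor days per month
    (10_000, 15_000, 8_000_000),  # materials budget ($) per month
    (-1, 0, -100),                # cc7 >= 100, written as -cc7 <= -100
    (0, -1, -200),                # cc8 >= 200, written as -cc8 <= -200
]
profit = (8_000, 12_000)          # profit ($) per CC7 and per CC8

def intersect(c1, c2):
    """Point where two constraint boundaries cross (Cramer's rule), or None."""
    a1, b1, r1 = c1
    a2, b2, r2 = c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None               # parallel boundaries never cross
    return ((r1 * b2 - r2 * b1) / det, (a1 * r2 - a2 * r1) / det)

def feasible(point, tol=1e-6):
    """True if the point satisfies every constraint, within a small tolerance."""
    x, y = point
    return all(a * x + b * y <= c + tol for a, b, c in constraints)

def total_profit(point):
    return profit[0] * point[0] + profit[1] * point[1]

def solve():
    """Return the feasible corner point with the highest total profit."""
    corners = []
    for c1, c2 in combinations(constraints, 2):
        point = intersect(c1, c2)
        if point is not None and feasible(point):
            corners.append(point)
    return max(corners, key=total_profit)

best = solve()
best_profit = total_profit(best)
```

Under this reading, labor is the binding resource: the best corner builds about 333.33 CC7 units and the minimum 200 CC8 units. Corner enumeration only scales to tiny models; real problems use a simplex or interior-point solver.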

  32. Sensitivity, What-if, and Goal Seeking Analysis • Sensitivity • Assesses impact of change in inputs on outputs • Eliminates or reduces variables • Can be automatic or trial and error • What-if • Assesses solutions based on changes in variables or assumptions (scenario analysis) • Goal seeking • Backwards approach, starts with goal • Determines values of inputs needed to achieve goal • Example is break-even point determination
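
Goal seeking and what-if analysis can be illustrated with a break-even calculation. This is a minimal sketch under stated assumptions: the linear profit model and every number in it (price, variable cost, fixed cost) are hypothetical, and bisection is just one simple way to search backwards from a goal.

```python
def profit(units, price=25.0, variable_cost=15.0, fixed_cost=50_000.0):
    """Hypothetical linear profit model; all default figures are assumptions."""
    return (price - variable_cost) * units - fixed_cost

def goal_seek(f, target, lo, hi, tol=1e-6):
    """Bisection search for x in [lo, hi] with f(x) == target.

    Assumes f is increasing on the interval and the target is bracketed.
    """
    if not (f(lo) <= target <= f(hi)):
        raise ValueError("target is not bracketed by f(lo) and f(hi)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Goal seeking: how many units are needed for zero profit (break-even)?
break_even = goal_seek(profit, 0.0, 0.0, 100_000.0)

# What-if: the same question if the price dropped to 20.
break_even_low_price = goal_seek(lambda q: profit(q, price=20.0),
                                 0.0, 0.0, 100_000.0)
```

With these assumed figures the break-even point is 5,000 units, and the what-if scenario doubles it to 10,000, which also shows how sensitive the output is to the price input.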

  33. Heuristic Programming • Cuts the search space • Gets satisfactory solutions more quickly and less expensively • Finds good enough feasible solutions to very complex problems • Heuristics can be • Quantitative • Qualitative (in ES) • Traveling Salesman Problem >>>

  34. Heuristic Programming - SEARCH

  35. Traveling Salesman Problem • What is it? • A traveling salesman must visit customers in several cities across the country, visiting each city only once. Goal: Find the shortest possible route • Total number of unique routes (TNUR): TNUR = (1/2) (Number of Cities - 1)! • Number of cities: TNUR • 5: 12 • 6: 60 • 9: 20,160 • 20: about 6.08 × 10^16
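
To make the route count and the case for heuristics concrete, here is a small Python sketch. It computes TNUR from the formula on the slide and compares exhaustive search with a nearest-neighbor heuristic. The nearest-neighbor rule is an illustrative choice (the slides do not name a specific heuristic), and the city coordinates are made up.

```python
import math
from itertools import permutations

def tnur(n_cities):
    """Total number of unique routes: (1/2) * (n - 1)!, for n >= 3."""
    return math.factorial(n_cities - 1) // 2

def tour_length(tour, dist):
    """Length of a closed tour that returns to its starting city."""
    return sum(dist[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))

def brute_force(dist):
    """Exact answer by trying every ordering; only usable for a few cities."""
    n = len(dist)
    best = min(permutations(range(1, n)),
               key=lambda p: tour_length((0,) + p, dist))
    return [0] + list(best)

def nearest_neighbor(dist, start=0):
    """Heuristic: always travel to the closest city not yet visited."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        nxt = min(unvisited, key=lambda city: dist[tour[-1]][city])
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# ASSUMPTION: hypothetical city coordinates, used only for the demonstration.
cities = [(0, 0), (0, 3), (4, 3), (4, 0), (2, 1)]
dist = [[math.dist(a, b) for b in cities] for a in cities]

exact_tour = brute_force(dist)
heuristic_tour = nearest_neighbor(dist)
```

The heuristic examines far fewer routes than the exhaustive search, but its tour can be longer than the exact one, which is the trade-off the heuristic slides describe.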

  36. When to Use Heuristics • Inexact or limited input data • Complex reality • Reliable, exact algorithm not available • Computation time excessive • For making quick decisions • Limitations of Heuristics • Cannot guarantee an optimal solution

  37. Modern Heuristic Methods • Tabu search • Intelligent search algorithm • Genetic algorithms • Survival of the fittest • Simulated annealing • Analogy to Thermodynamics

  38. Simulation • Technique for conducting experiments with a computer on a comprehensive model of the behavior of a system • Frequently used in DSS tools

  39. Major Characteristics of Simulation • Imitates reality and captures its richness • Technique for conducting experiments • Descriptive, not normative, tool • Often used to “solve” very complex problems • Simulation is normally used only when a problem is too complex to be treated using numerical optimization techniques

  40. Advantages of Simulation • The theory is fairly straightforward • Great deal of time compression • Experiment with different alternatives • The model reflects manager’s perspective • Can handle wide variety of problem types • Can include the real complexities of problems • Produces important performance measures • Often it is the only DSS modeling tool for non-structured problems

  41. Limitations of Simulation • Cannot guarantee an optimal solution • Slow and costly construction process • Cannot transfer solutions and inferences to solve other problems (problem specific) • So easy to explain/sell to managers that it may lead to overlooking analytical solutions • Software may require special skills

  42. Simulation Types • Stochastic vs. Deterministic Simulation • In stochastic simulations: We use distributions (Discrete or Continuous probability distributions) • Time-dependent vs. Time-independent Simulation • Time independent stochastic simulation via Monte Carlo technique (X = A + B) • Discrete event vs. Continuous simulation • Steady State vs. Transient Simulation • Simulation Implementation • Visual simulation • Object-oriented simulation
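
The time-independent Monte Carlo case (X = A + B) can be sketched in a few lines of Python. The two input distributions below are assumptions chosen only to show one discrete and one continuous input; the slide does not specify them.

```python
import random
import statistics

def simulate_sum(n_trials=20_000, seed=42):
    """Monte Carlo sampling of X = A + B.

    ASSUMPTIONS (not from the slides): A is discrete, equally likely to be
    10, 20, or 30; B is continuous, normal with mean 5 and std. deviation 1.
    """
    rng = random.Random(seed)          # fixed seed keeps runs repeatable
    samples = []
    for _ in range(n_trials):
        a = rng.choice([10, 20, 30])   # draw from the discrete distribution
        b = rng.gauss(5.0, 1.0)        # draw from the continuous distribution
        samples.append(a + b)
    return samples

samples = simulate_sum()
mean_x = statistics.fmean(samples)     # estimate of E[X]; theory gives 25
spread_x = statistics.stdev(samples)   # estimate of the variability of X
```

The output is descriptive rather than normative: it estimates the distribution of X under the assumed inputs and does not recommend a decision.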

  43. Data Mining Methods: Classification • Most frequently used DM method • Part of the machine-learning family • Employ supervised learning • Learn from past data, classify new data • The output variable is categorical (nominal or ordinal) in nature • Classification versus regression? • Classification versus clustering?

  44. Assessment Methods for Classification • Predictive accuracy • Hit rate • Speed • Model building; predicting • Robustness • Scalability • Interpretability • Transparency, explainability

  45. Accuracy of Classification Models • In classification problems, the primary source for accuracy estimation is the confusion matrix
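
As a small illustration of how a confusion matrix feeds accuracy estimates, the sketch below tallies the four cells for a two-class problem and derives overall accuracy and the true positive rate. The labels and predictions are made-up data, and the function names are hypothetical.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a two-class problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

def accuracy(cm):
    """Share of all cases that were classified correctly."""
    return (cm["tp"] + cm["tn"]) / sum(cm.values())

def true_positive_rate(cm):
    """Share of actual positives that the model caught."""
    return cm["tp"] / (cm["tp"] + cm["fn"])

# ASSUMPTION: made-up labels and predictions, for illustration only.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

cm = confusion_matrix(actual, predicted)
```

For this toy data the matrix holds 3 true positives, 1 false negative, 4 true negatives, and 2 false positives, giving an accuracy of 0.7 and a true positive rate of 0.75.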

  46. Estimation Methodologies for Classification • Simple split (or holdout or test sample estimation) • Split the data into 2 mutually exclusive sets: training (~70%) and testing (~30%)

  47. Estimation Methodologies for Classification • k-Fold Cross Validation (rotation estimation) • Split the data into k mutually exclusive subsets • Use each subset as testing while using the rest of the subsets as training • Repeat the experimentation k times • Aggregate the test results for a true estimation of prediction accuracy • Other estimation methodologies • Leave-one-out, bootstrapping, jackknifing • Area under the ROC curve
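
Both the simple split and k-fold cross validation can be sketched with index bookkeeping alone. In this rough Python illustration, `fit` and `score` stand for whatever model-building and evaluation routines are in use; the majority-class "model" and the labels are stand-ins invented for the demonstration.

```python
import random

def simple_split(n, train_frac=0.7, seed=0):
    """Holdout: shuffle indices, then cut into training and testing sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * train_frac))
    return idx[:cut], idx[cut:]

def k_fold(n, k, seed=0):
    """Yield (train, test) index lists; each fold is the test set once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # k mutually exclusive subsets
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def cross_validate(n, k, fit, score):
    """Average the test score over the k folds."""
    scores = [score(fit(train), test) for train, test in k_fold(n, k)]
    return sum(scores) / k

# ASSUMPTION: toy labels and a majority-class "model", for illustration only.
labels = [0] * 6 + [1] * 4

def fit_majority(train_idx):
    votes = [labels[i] for i in train_idx]
    return max(set(votes), key=votes.count)

def score_accuracy(model, test_idx):
    return sum(labels[i] == model for i in test_idx) / len(test_idx)

train_idx, test_idx = simple_split(len(labels))
estimate = cross_validate(len(labels), 5, fit_majority, score_accuracy)
```

Setting k equal to the number of records turns the same loop into leave-one-out estimation.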

  48. Classification Techniques • Decision tree analysis • Statistical analysis • Neural networks • Support vector machines • Case-based reasoning • Bayesian classifiers • Genetic algorithms • Rough sets

  49. Decision Trees • Employs the divide and conquer method • Recursively divides a training set until each division consists of examples from one class • 1. Create a root node and assign all of the training data to it • 2. Select the best splitting attribute • 3. Add a branch to the root node for each value of the split. Split the data into mutually exclusive subsets along the lines of the specific split • 4. Repeat steps 2 and 3 for each and every leaf node until the stopping criteria are reached
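
The recursive procedure above can be sketched in Python. This version assumes categorical attributes and uses information gain (the ID3-style criterion) to pick the best splitting attribute; the slide itself does not fix a criterion, and the loan data below is invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    """Impurity of a set of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on one attribute."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [l for row, l in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, attrs):
    """Divide and conquer until a node is pure or no attributes remain."""
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]    # leaf: majority class
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    node = {"attr": best, "branches": {}}
    for value in set(row[best] for row in rows):       # one branch per value
        keep = [i for i, row in enumerate(rows) if row[best] == value]
        node["branches"][value] = build_tree(
            [rows[i] for i in keep],
            [labels[i] for i in keep],
            [a for a in attrs if a != best],
        )
    return node

def predict(tree, row):
    """Follow branches until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][row[tree["attr"]]]
    return tree

# ASSUMPTION: tiny made-up loan data set, for illustration only.
rows = [
    {"income": "high", "owns_home": "yes"},
    {"income": "high", "owns_home": "no"},
    {"income": "low",  "owns_home": "yes"},
    {"income": "low",  "owns_home": "no"},
]
labels = ["approve", "approve", "reject", "reject"]
tree = build_tree(rows, labels, ["income", "owns_home"])
```

Here `income` separates the classes perfectly, so it becomes the root and the tree stops after one split. The sketch has no pruning, and `predict` raises a `KeyError` for attribute values it never saw in training.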

  50. Decision Trees • DT algorithms mainly differ on • Splitting criteria • Which variable to split first? • What values to use to split? • How many splits to form for each node? • Stopping criteria • When to stop building the tree • Pruning (generalization method) • Pre-pruning versus post-pruning • Most popular DT algorithms include • ID3, C4.5, C5; CART; CHAID; M5