Lecture 45
Course Review and Future Research Directions
Friday, May 5, 2000
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapters 1-10, 13, Mitchell; Chapters 14-21, Russell and Norvig
Main Themes: Artificial Intelligence and KDD
• Analytical Learning: Combining Symbolic and Numerical AI
  • Inductive learning
  • Role of knowledge and deduction in integrated inductive and analytical learning
• Artificial Neural Networks (ANNs) for KDD
  • Common neural representations: current limitations
  • Incorporating knowledge into ANN learning
• Uncertain Reasoning in Decision Support
  • Probabilistic knowledge representation
  • Bayesian knowledge and data engineering (KDE): elicitation, causality
• Data Mining: KDD Applications
  • Role of causality and explanations in KDD
  • Framework for data mining: wrappers for performance enhancement
• Genetic Algorithms (GAs) for KDD
  • Evolutionary algorithms (GAs, GP) as optimization wrappers
  • Introduction to classifier systems
Class 0: A Brief Overview of Machine Learning
• Overview: Topics, Applications, Motivation
• Learning = Improving with Experience at Some Task
  • Improve over task T,
  • with respect to performance measure P,
  • based on experience E
  • Example: T = playing checkers, P = fraction of games won, E = games played against itself
• Brief Tour of Machine Learning
  • A case study
  • A taxonomy of learning
  • Intelligent systems engineering: specification of learning problems
• Issues in Machine Learning
  • Design choices
  • The performance element: intelligent systems
• Some Applications of Learning
  • Database mining, reasoning (inference / decision support), acting
  • Industrial usage of intelligent systems
Class 1: Integrating Analytical and Inductive Learning
• Learning Specification (Inductive, Analytical)
  • Instances X, target function (concept) c: X → {0, 1}, hypothesis space H
  • Training examples D: positive and negative examples of target function c
  • Analytical learning: also given a domain theory T for explaining examples
• Domain Theories
  • Expressed in a formal language: propositional logic, predicate logic
  • Set of assertions (e.g., well-formed formulae) for reasoning about the domain
  • Expresses constraints over relations (predicates) within the model
  • Example: Ancestor(x, y) ⇐ Parent(x, z) ∧ Ancestor(z, y)
• Determine
  • Hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D (a consistency-checking sketch follows this slide)
  • Such h are consistent with the training data and domain theory T
• Integration Approaches
  • Explanation (proof and derivation)-based learning: EBL
  • Pseudo-experience: incorporating knowledge of environment, actuators
  • Top-down decomposition: programmatic (procedural) knowledge, advice
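As a concrete illustration of the "Determine" step above, here is a minimal Python sketch (not from the lecture) that enumerates a toy hypothesis space H of monotone conjunctions over hypothetical boolean attributes and keeps the hypotheses h with h(x) = c(x) for all x ∈ D. The attribute names and training examples are illustrative assumptions.

```python
# Toy version of "Determine": find hypotheses h in H with h(x) = c(x)
# for every training example in D. H is a hypothetical space of
# monotone conjunctions over boolean attributes (illustrative only).
from itertools import combinations

ATTRS = ["red", "round", "large"]          # hypothetical attributes

def make_h(attrs):
    """Hypothesis: the conjunction of the given attributes."""
    return lambda x: all(x[a] for a in attrs)

# Training examples D: (instance, target label c(x))
D = [({"red": 1, "round": 1, "large": 0}, True),
     ({"red": 0, "round": 1, "large": 1}, False)]

def consistent(h, D):
    return all(h(x) == label for x, label in D)

# Enumerate H (all conjunctions, including the empty one) and keep the
# hypotheses consistent with D.
H = [set(c) for r in range(len(ATTRS) + 1) for c in combinations(ATTRS, r)]
survivors = [attrs for attrs in H if consistent(make_h(attrs), D)]
print(survivors)    # [{'red'}, {'red', 'round'}] for this toy D
```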
Classes 2-3: Explanation-Based Neural Networks
• Paper
  • Topic: Explanation-Based and Inductive Learning in ANNs
  • Title: Integrating Inductive Neural Network Learning and EBL
  • Authors: Thrun and Mitchell
  • Presenter: William Hsu
• Key Strengths
  • Idea: (state, action)-to-state mappings as steps in a generalizable proof (explanation) for an observed episode
  • Generalizable approach (significant for RL and other learning-to-predict inducers)
• Key Weaknesses
  • Other numerical learning models (HMMs, DBNs) may be better suited to EBG
  • Tradeoff: the domain theory of EBNN lacks the semantic clarity of symbolic EBL
• Future Research Issues
  • How to get the best of both worlds (clear DT, ability to generate explanations)?
  • Applications: explanation in commercial, military, and legal decision support
  • See work by: Thrun, Mitchell, Shavlik, Towell, Pearl, Heckerman
Classes 4-5: Phantom Induction
• Paper
  • Topic: Distal Supervised Learning and Phantom Induction
  • Title: Iterated Phantom Induction: A Little Knowledge Can Go a Long Way
  • Authors: Brodie and DeJong
  • Presenter: Steve Gustafson
• Key Strengths
  • Idea: apply knowledge to generate (pseudo-experiential) training data
  • Speedup: learning curve significantly shortened with respect to RL by applying a "small amount" of knowledge
• Key Weaknesses
  • Haven't yet seen how to produce plausible, comprehensible explanations
  • How much knowledge is "a small amount"? (How to measure it?)
• Future Research Issues
  • Control and planning domains similar (but not identical) to robot games
  • Applications: adaptive (e.g., ANN, BBN, MDP, GA) agent control, planning
  • See work by: Brodie, DeJong, Rumelhart, McClelland, Sutton, Barto
Classes 6-7: Top-Down Hybrid Learning
• Paper
  • Topic: Learning with Prior Knowledge
  • Title: A Divide-and-Conquer Approach to Learning from Prior Knowledge
  • Authors: Chown and Dietterich
  • Presenter: Aiming Wu
• Key Strengths
  • Idea: apply programmatic (procedural) knowledge to select training data
  • Uses simulation to boost inductive learning performance (cf. model checking)
  • Divide-and-conquer approach (multiple experts)
• Key Weaknesses
  • Doesn't illustrate the form and structure of programmatic knowledge clearly
  • Doesn't systematize and formalize the model checking / simulation approach
• Future Research Issues
  • Model checking and simulation-driven hybrid learning
  • Applications: "consensus under uncertainty", simulation-based optimization
  • See work by: Dietterich, Frawley, Mitchell, Darwiche, Pearl
Classes 8-9: Learning Using Prior Knowledge
• Paper
  • Topic: Refinement of Approximate Domain-Theoretic Knowledge
  • Title: Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks
  • Authors: Towell, Shavlik, and Noordewier
  • Presenter: Li-Jun Wang
• Key Strengths
  • Idea: build relational explanations; compile them into an ANN representation
  • Applies structural, functional, and constraint-based knowledge
  • Uses the ANN to further refine the domain theory
• Key Weaknesses
  • Can't get the refined domain theory back out!
  • Explanations are also no longer clear after the "compilation" (transformation) process
• Future Research Issues
  • How to retain semantic clarity of explanations, DT, knowledge representation
  • Applications: intelligent filters (e.g., fraud detection), decision support
  • See work by: Shavlik, Towell, Maclin, Sun, Schwalb, Heckerman
Class 10: Introduction to Artificial Neural Networks
• Architectures
  • Nonlinear transfer functions
  • Multi-layer networks of nonlinear units (sigmoid, hyperbolic tangent)
  • Hidden layer representations
• Backpropagation of Error
  • The backpropagation algorithm (a minimal sketch follows this slide)
  • Relation to the error gradient function for nonlinear units
  • Derivation of the training rule for feedforward multi-layer networks
  • Training issues: local optima, overfitting
  • References: Chapter 4, Mitchell; Chapter 4, Bishop; Rumelhart et al.
• Research Issues: How to…
  • Learn from observation, rewards and penalties, and advice
  • Distribute rewards and penalties through the learning model, over time
  • Generate pseudo-experiential training instances in pattern recognition
  • Partition learning problems on the fly, via (mixture) parameter estimation
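A minimal NumPy sketch of the backpropagation training rule reviewed above, for a single-hidden-layer sigmoid network trained on XOR with batch gradient descent. The architecture, learning rate, and epoch count are illustrative assumptions, not values from the course.

```python
# One-hidden-layer sigmoid network trained by backpropagation on XOR.
# Batch gradient descent on squared error; all constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights
eta = 0.5                                 # learning rate

for epoch in range(20000):
    # Forward pass
    h = sigmoid(X @ W1)                   # hidden activations
    out = sigmoid(h @ W2)                 # network outputs
    # Backward pass: delta = (target - output) * f'(net), with
    # f'(net) = out * (1 - out) for sigmoid units
    d_out = (y - out) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)    # backpropagate through W2
    # Gradient descent step on the squared error
    W2 += eta * h.T @ d_out
    W1 += eta * X.T @ d_h

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0];
                                 # a different seed may need more epochs
```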
Classes 11-12: Reinforcement Learning and Advice
• Paper
  • Topic: Knowledge and Reinforcement Learning in Intelligent Agents
  • Title: Incorporating Advice into Agents that Learn from Reinforcements
  • Authors: Maclin and Shavlik
  • Presenter: Kiranmai Nandivada
• Key Strengths
  • Idea: compile advice into an ANN representation for RL
  • Advice expressed in terms of constraint-based knowledge
  • Like KBANN, achieves knowledge refinement through ANN training
• Key Weaknesses
  • Like KBANN, loses the semantic clarity of advice, policy, and explanations
  • How to evaluate "refinement" effectively? Quantitatively? Logically?
• Future Research Issues
  • How to retain semantic clarity of explanations, DT, knowledge representation
  • Applications: intelligent agents, web mining (spiders, search engines), games
  • See work by: Shavlik, Maclin, Stone, Veloso, Sun, Sutton, Pearl, Kuipers
Classes 13-14: Reinforcement Learning Over Time
• Paper
  • Topic: Temporal-Difference Reinforcement Learning
  • Title: TD Models: Modeling the World at a Mixture of Time Scales
  • Author: Sutton
  • Presenter: Vrushali Koranne
• Key Strengths
  • Idea: combine state-action evaluation function (Q) estimates over multiple time steps of lookahead (a minimal TD(λ) sketch follows this slide)
  • Effective temporal credit assignment (TCA)
  • Biologically plausible (simulates TCA aspects of the dopaminergic system)
• Key Weaknesses
  • The TCA methodology is effective but semantically hard to comprehend
  • Slow convergence: can knowledge help? How will we judge?
• Future Research Issues
  • How to retain clarity, and improve convergence speed, of multi-time RL models
  • Applications: control systems, robotics, game playing
  • See work by: Sutton, Barto, Mitchell, Kaelbling, Smyth, Shafer, Goldberg
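To ground the temporal credit assignment discussion, here is a minimal tabular TD(λ) sketch with accumulating eligibility traces on a deterministic chain; TD(λ) mixes prediction targets over multiple time scales in the simplest way, while the paper's TD models generalize this mixing. The chain environment and all constants are illustrative assumptions.

```python
# Tabular TD(lambda) with accumulating eligibility traces on a
# deterministic chain: states 0..4, terminal state 5, reward 1 at the end.
N_STATES, GAMMA, LAM, ALPHA = 5, 0.9, 0.8, 0.1
V = [0.0] * (N_STATES + 1)            # V[N_STATES] is the terminal state

for episode in range(500):
    e = [0.0] * (N_STATES + 1)        # eligibility traces
    s = 0
    while s < N_STATES:
        s_next = s + 1                # chain: always move right
        r = 1.0 if s_next == N_STATES else 0.0
        delta = r + GAMMA * V[s_next] - V[s]       # TD error
        e[s] += 1.0                                # accumulating trace
        for i in range(N_STATES + 1):              # credit earlier states
            V[i] += ALPHA * delta * e[i]
            e[i] *= GAMMA * LAM
        s = s_next

print([round(v, 2) for v in V[:N_STATES]])
# approaches [0.66, 0.73, 0.81, 0.9, 1.0], i.e., powers of gamma
```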
Classes 15-16: Generative Neural Models
• Paper
  • Topic: Pattern Recognition Using Unsupervised ANNs
  • Title: The Wake-Sleep Algorithm for Unsupervised Neural Networks
  • Authors: Hinton, Dayan, Frey, and Neal
  • Presenter: Prasanna Jayaraman
• Key Strengths
  • Idea: use a two-phase algorithm to generate training instances ("dream" stage) and maximize the conditional probability of the data given the model ("wake" stage)
  • Compare: the expectation-maximization (EM) algorithm
  • Good for image recognition
• Key Weaknesses
  • Not all data admits this approach (small samples, ill-defined features)
  • Not immediately clear how to use it in problem-solving performance elements
• Future Research Issues
  • Studying information-theoretic properties of the Helmholtz machine
  • Applications: image / speech / signal recognition, document categorization
  • See work by: Hinton, Dayan, Frey, Neal, Kirkpatrick, Hajek, Ghahramani
Classes 17-18: Modularity in Neural Systems
• Paper
  • Topic: Combining Models Using Modular ANNs
  • Title: Modular and Hierarchical Learning Systems
  • Authors: Jordan and Jacobs
  • Presenter: Afrand Agah
• Key Strengths
  • Idea: use interleaved EM update steps to update the expert and gating components (a minimal forward-pass sketch follows this slide)
  • Effect: forces specialization among ANN components (GLIMs); boosts performance relative to single experts; very fast convergence in some cases
  • Explores modularity in neural systems (artificial and biological)
• Key Weaknesses
  • Often cannot achieve higher accuracy than ML, MAP, or Bayes optimal estimation
  • Doesn't provide experts that specialize in spatial or temporal pattern recognition
• Future Research Issues
  • Constructing and selecting mixtures of other ANN components (not just GLIMs)
  • Applications: pattern recognition, time series prediction
  • See work by: Jordan, Jacobs, Nowlan, Hinton, Barto, Jaakkola, Hsu
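A minimal sketch of a mixture-of-experts forward pass and an EM-style responsibility (posterior mixing) computation, in the spirit of the Jordan and Jacobs architecture above. The dimensions, random parameters, and Gaussian expert-noise model are illustrative assumptions.

```python
# Mixture-of-experts forward pass: a softmax gating network weights the
# outputs of linear experts (GLIMs). Parameters are illustrative randoms.
import numpy as np

rng = np.random.default_rng(1)
n_experts, d = 3, 2
x = rng.normal(size=d)                      # one input vector

W_gate = rng.normal(size=(n_experts, d))    # gating network weights
W_exp = rng.normal(size=(n_experts, d))     # one linear expert per row

logits = W_gate @ x
g = np.exp(logits - logits.max())
g /= g.sum()                                # softmax gating coefficients
expert_out = W_exp @ x                      # each expert's prediction
y_hat = g @ expert_out                      # gated (mixture) prediction

# EM-style E-step: "responsibility" of expert i for a target y, assuming
# Gaussian expert noise: h_i proportional to g_i * N(y | expert_i(x), sigma^2)
y, sigma = 1.0, 0.5
lik = np.exp(-(y - expert_out) ** 2 / (2 * sigma ** 2))
h = g * lik / (g * lik).sum()               # posterior mixing proportions
print(y_hat, h)
```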
Class 19: Introduction to Probabilistic Reasoning
• Architectures
  • Bayesian (belief) networks
    • Tree-structured, polytrees
    • General
  • Decision networks
  • Temporal variants (beyond the scope of this course)
• Parameter Estimation
  • Maximum likelihood (MLE), maximum a posteriori (MAP); contrasted in the worked example after this slide
  • Bayes optimal classification, Bayesian learning
  • References: Chapter 6, Mitchell; Chapters 14-15, 19, Russell and Norvig
• Research Issues: How to…
  • Learn from observation, rewards and penalties, and advice
  • Distribute rewards and penalties through the learning model, over time
  • Generate pseudo-experiential training instances in pattern recognition
  • Partition learning problems on the fly, via (mixture) parameter estimation
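A short worked contrast between MLE and MAP estimation for a Bernoulli parameter, matching the parameter-estimation bullets above; the coin-flip counts and the Beta(2, 2) prior are illustrative assumptions.

```python
# MLE vs. MAP for a Bernoulli parameter theta (hypothetical coin flips).
heads, tails = 7, 3

# Maximum likelihood: argmax_theta P(D | theta)
theta_mle = heads / (heads + tails)                        # 0.7

# MAP with a Beta(a, b) prior: argmax_theta P(D | theta) P(theta);
# the prior acts like (a - 1) extra heads and (b - 1) extra tails
a, b = 2, 2
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)  # 8/12 = 0.667
print(theta_mle, theta_map)
```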
Classes 20-21: Approaches to Uncertain Reasoning
• Paper
  • Topic: The Case for Probability
  • Title: In Defense of Probability
  • Author: Cheeseman
  • Presenter: Pallavi Paranjape
• Key Strengths
  • Idea: probability is a mathematically sound way to represent uncertainty
  • Views of probability considered: objectivist, frequentist, logicist, subjectivist
  • Argument made for a meta-subjectivist, belief-measure concept of probability
• Key Weaknesses
  • Highly dogmatic view, without concrete justification for all assertions
  • Does not quantitatively or empirically compare Bayesian and non-Bayesian methods
• Future Research Issues
  • Integrating symbolic and numerical (statistical) models of uncertainty
  • Applications: uncertain reasoning, pattern recognition, learning
  • See work by: Cheeseman, Cox, Good, Pearl, Zadeh, Dempster, Shafer
Classes 22-23: Learning Bayesian Network Structure
• Paper
  • Topic: Learning Bayesian Networks from Data
  • Title: Learning Bayesian Network Structure from Massive Datasets
  • Authors: Friedman, Pe'er, Nachman
  • Presenter: Jincheng Gao
• Key Strengths
  • Idea: can use graph constraints and scoring functions to select candidate parents when constructing a directed graph model of probability (BBN); a greedy-search sketch follows this slide
  • Tabu search, greedy score-based methods (K2), etc. also considered
• Key Weaknesses
  • Optimal Bayesian network structure learning is still intractable on conventional (single-instruction, sequential) architectures
  • More empirical comparison among the alternative methods is warranted
• Future Research Issues
  • Scaling up to massive real-world data sets (e.g., medical, agricultural, DSS)
  • Applications: diagnosis, troubleshooting, user modeling, intelligent HCI
  • See work by: Friedman, Goldszmidt, Heckerman, Cooper, Beinlich, Koller
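A minimal greedy (hill-climbing) sketch in the score-based structure-learning family surveyed above (cf. K2 and the sparse-candidate idea): it adds the single best-scoring edge at a time under a BIC-style local score. The toy binary data, the scoring details, and the acyclicity check are illustrative assumptions, not the paper's algorithm.

```python
# Greedy score-based structure search over three binary variables:
# repeatedly add the edge that most improves a BIC-style score, subject
# to acyclicity. Toy data: B depends on A, C is independent.
import math
from collections import Counter

VARS = ["A", "B", "C"]
data = [{"A": a, "B": a ^ n, "C": c}
        for a in (0, 1) for c in (0, 1) for n in (0, 0, 0, 1)]

def local_score(var, parents):
    """Max log-likelihood of var given parents, minus a BIC penalty."""
    counts = Counter((tuple(r[p] for p in parents), r[var]) for r in data)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in data)
    ll = sum(n * math.log(n / pa_counts[pa]) for (pa, _), n in counts.items())
    return ll - 0.5 * (2 ** len(parents)) * math.log(len(data))

def total_score(edges):
    return sum(local_score(v, [u for u, w in edges if w == v]) for v in VARS)

def acyclic(edges):
    nodes, es = set(VARS), set(edges)
    while nodes:                       # Kahn-style: peel off source nodes
        free = [n for n in nodes if not any(v == n for _, v in es)]
        if not free:
            return False
        nodes -= set(free)
        es = {(u, v) for u, v in es if u in nodes and v in nodes}
    return True

edges = set()
while True:
    cand = [(u, v) for u in VARS for v in VARS
            if u != v and (u, v) not in edges and acyclic(edges | {(u, v)})]
    scored = [(total_score(edges | {e}), e) for e in cand]
    if not scored or max(scored)[0] <= total_score(edges):
        break
    edges.add(max(scored)[1])

print(edges)   # a single edge between A and B (direction is tie-broken)
```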
Classes 24-25: Bayesian Networks for User Modeling
• Paper
  • Topic: Decision Support Systems and Bayesian User Modeling
  • Title: The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users
  • Authors: Horvitz, Breese, Heckerman, Hovel, Rommelse
  • Presenter: Yuhui (Cathy) Liu
• Key Strengths
  • Idea: a BBN model is developed from user logs and used to infer the mode of usage
  • Can infer the goals and skill level of a user
• Key Weaknesses
  • Needs high accuracy in inferring goals to deliver meaningful content
  • May be better to use a next-generation search engine (more interactivity, less passive monitoring)
• Future Research Issues
  • Designing better interactive user modeling
  • Applications: clickstream monitoring, e-commerce, web search, help
  • See work by: Horvitz, Breese, Heckerman, Lee, Huang
Classes 26-27: Causal Reasoning
• Paper
  • Topic: KDD and Causal Reasoning
  • Title: Symbolic Causal Networks for Reasoning about Actions and Plans
  • Authors: Darwiche and Pearl
  • Presenter: Yue Jiao
• Key Strengths
  • Idea: use a BBN to represent symbolic constraint knowledge
  • Can use it to generate mechanistic explanations
  • Models actions
  • Models sequences of actions (plans)
• Key Weaknesses
  • Integrative methods (numerical and symbolic BBNs) still need exploration
  • Unclear how to incorporate methods for learning to plan
• Future Research Issues
  • Reasoning about systems
  • Applications: uncertain reasoning, pattern recognition, learning
  • See work by: Horvitz, Breese, Heckerman, Lee, Huang
Classes 28-29: Knowledge Discovery from Scientific Data
• Paper
  • Topic: KDD for Scientific Data Analysis
  • Title: KDD for Science Data Analysis: Issues and Examples
  • Authors: Fayyad, Haussler, and Stolorz
  • Presenter: Arulkumar Elumalai
• Key Strengths
  • Idea: investigate how, and whether, KDD techniques (OLAP, learning) scale up to huge data sets
  • Answer: "it depends" on computational complexity and many other factors
• Key Weaknesses
  • No clear theory yet of how to assess "how much data is really needed"
  • No technical treatment or characterization of data cleaning
• Future Research Issues
  • Data cleaning (aka data cleansing), pre- and post-processing (OLAP)
  • Applications: intelligent databases, visualization, high-performance CSE
  • See work by: Fayyad, Smyth, Uthurusamy, Haussler, Foster
Classes 30-31: Relevance Determination
• Paper
  • Topic: Relevance Determination in KDD
  • Title: Irrelevant Features and the Subset Selection Problem
  • Authors: John, Kohavi, and Pfleger
  • Presenter: DingBing Yang
• Key Strengths
  • Idea: cast the problem of choosing relevant attributes (given a "top-level" learning problem specification) as search (a greedy wrapper sketch follows this slide)
  • Effective state-space search (A / A*-based) approach demonstrated
• Key Weaknesses
  • May not have good enough heuristics!
  • Can either develop them (via information theory) or use MCMC methods
• Future Research Issues
  • Selecting relevant data channels from continuous sources (e.g., sensors)
  • Applications: bioinformatics (genomics, proteomics, etc.), prognostics
  • See work by: Kohavi, John, Rendell, Donoho, Hsu, Provost
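A minimal sketch of subset selection as greedy state-space search: forward selection with a leave-one-out 1-nearest-neighbor wrapper as the evaluation function. Greedy hill climbing stands in here for the heuristic search the paper analyzes; the toy data and the evaluation function are illustrative assumptions.

```python
# Forward selection (greedy search through feature-subset space) with a
# leave-one-out 1-NN accuracy wrapper. Only feature 0 carries the label.
import random

random.seed(0)
N, N_FEATURES = 60, 5
X = [[random.random() for _ in range(N_FEATURES)] for _ in range(N)]
y = [int(x[0] > 0.5) for x in X]          # feature 0 is the relevant one

def loo_accuracy(subset):
    """Leave-one-out 1-NN accuracy using only the chosen features."""
    if not subset:
        return 0.0
    correct = 0
    for i in range(N):
        dists = [(sum((X[i][f] - X[j][f]) ** 2 for f in subset), j)
                 for j in range(N) if j != i]
        _, nearest = min(dists)
        correct += (y[nearest] == y[i])
    return correct / N

subset, best, improved = set(), 0.0, True
while improved:                            # stop when no addition helps
    improved = False
    for f in range(N_FEATURES):
        if f not in subset and loo_accuracy(subset | {f}) > best:
            best, chosen, improved = loo_accuracy(subset | {f}), f, True
    if improved:
        subset.add(chosen)

print(subset, best)   # likely {0} with high accuracy: the noise features
                      # do not improve the wrapper's estimate
```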
Classes 32-33: Learning for Text Document Categorization
• Paper
  • Topic: Text Documents and Information Retrieval (IR)
  • Title: Hierarchically Classifying Documents Using Very Few Words
  • Authors: Koller and Sahami
  • Presenter: Yan Song
• Key Strengths
  • Idea: use rank-frequency scoring methods to find "keywords that make a difference" (a scoring sketch follows this slide)
  • Breaks the corpus into a meaningful hierarchy
• Key Weaknesses
  • Sometimes need to derive semantically meaningful cluster labels
  • How to integrate this method with dynamic cluster segmentation and labeling?
• Future Research Issues
  • Bayesian architectures using "non-Bayesian" learning algorithms (e.g., GAs)
  • Applications: digital libraries (hierarchical, distributed dynamic indexing), intelligent search engines, intelligent displays (and help indices)
  • See work by: Koller, Sahami, Roth, Charniak, Brill, Yarowsky
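A minimal sketch of scoring "keywords that make a difference": rank each word by information gain with respect to the class label and keep the top few. Koller and Sahami use a more refined, KL-divergence-based selection; this gain-based filter and the toy corpus are illustrative assumptions.

```python
# Rank words by information gain against the class label and keep the
# top few discriminative keywords. Toy two-class corpus (hypothetical).
import math

docs = [("win cash prize now", 1), ("meeting agenda notes", 0),
        ("cash bonus win", 1), ("project meeting notes", 0)]

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = len(docs)
p_class = sum(c for _, c in docs) / n
vocab = {w for text, _ in docs for w in text.split()}

def gain(word):
    """Information gain of splitting the corpus on word presence."""
    groups = [[c for text, c in docs if (word in text.split()) == present]
              for present in (True, False)]
    h_cond = sum(len(g) / n * entropy(sum(g) / len(g)) for g in groups if g)
    return entropy(p_class) - h_cond

ranked = sorted(vocab, key=gain, reverse=True)
print(ranked[:3])   # e.g., words like 'cash' and 'meeting' that separate
                    # the classes rank first (ties ordered arbitrarily)
```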
Classes 34-35: Web Mining
• Paper
  • Topic: KDD and The Web
  • Title: Learning to Extract Symbolic Knowledge from the World Wide Web
  • Authors: Craven, DiPasquo, Freitag, McCallum, Mitchell, Nigam, and Slattery
  • Presenter: Ping Zou
• Key Strengths
  • Idea: build a probabilistic model of web documents using "keywords that matter"
  • Use the probabilistic model to represent knowledge for indexing into a web database
• Key Weaknesses
  • How to account for concept drift?
  • How to explain and express constraints (e.g., "proper nouns that are person names don't matter")? Not considered here…
• Future Research Issues
  • Using natural language processing (NLP) and image / audio / signal processing
  • Applications: searchable hypermedia, digital libraries, spiders, other agents
  • See work by: McCallum, Mitchell, Roth, Sahami, Pratt, Lee
Class 36: Introduction to Evolutionary Computation
• Architectures
  • Genetic algorithms (GAs), genetic programming (GP), genetic wrappers
  • Simple vs. parameterless GAs (a simple-GA sketch follows this slide)
• Issues
  • Loss of diversity
    • Consequence: collapse of the Pareto front
    • Solutions: niching (sharing, preselection, crowding)
  • Parameterless GAs
  • Other issues (not covered): genetic drift, population sizing, etc.
  • References: Chapter 9, Mitchell; Chapters 1-6, Goldberg; Chapters 1-5, Koza
• Research Issues: How to…
  • Design GAs based on the credit assignment system (in the performance element)
  • Build hybrid analytical / inductive learning GP systems
  • Use GAs to perform relevance determination in KDD
  • Control diversity in GA solutions for hard optimization problems
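A minimal simple-GA sketch (cf. Goldberg, Chapters 1-3): tournament selection, one-point crossover, and bit-flip mutation on the OneMax fitness function. The population size, rates, and fitness function are illustrative assumptions.

```python
# Simple GA on OneMax (maximize the number of 1 bits): tournament
# selection, one-point crossover, bit-flip mutation.
import random

random.seed(0)
L, POP, GENS, P_MUT = 20, 30, 40, 0.02
fitness = sum                                 # OneMax: count the 1 bits

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for gen in range(GENS):
    def tournament():
        a, b = random.sample(pop, 2)          # binary tournament
        return a if fitness(a) >= fitness(b) else b
    nxt = []
    while len(nxt) < POP:
        p1, p2 = tournament(), tournament()
        cut = random.randrange(1, L)          # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [bit ^ (random.random() < P_MUT) for bit in child]
        nxt.append(child)
    pop = nxt

print(max(map(fitness, pop)), "of", L)  # approaches L as the GA converges
```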
Classes 37-38: Genetic Algorithms and Classifier Systems
• Paper
  • Topic: Classifier Systems and Inductive Learning
  • Title: Generalization in the XCS Classifier System
  • Author: Wilson
  • Presenter: Elizabeth Loza-Garay
• Key Strengths
  • Idea: incorporate the performance element (classifier system) into the GA design
  • Solid theoretical foundation: advanced building block (aka schema) theory
  • Can be used to engineer more efficient GA models and tune parameters
• Key Weaknesses
  • Need to progress from toy problems (e.g., MUX learning) to real-world ones
  • Need to investigate scaling up of GA principles (e.g., building block mixing)
• Future Research Issues
  • Building block scalability in classifier systems
  • Applications: reinforcement learning, mobile robotics, other animats, a-life
  • See work by: Wilson, Goldberg, Holland, Booker
Classes 39-40: Knowledge-Based Genetic Programming
• Paper
  • Topic: Genetic Programming and Multistrategy Learning
  • Title: Genetic Programming and Deductive-Inductive Learning: A Multistrategy Approach
  • Authors: Aler, Borrajo, and Isasi
  • Presenter: Yuhong Cheng
• Key Strengths
  • Idea: use a knowledge-based system to calibrate the starting state of an MCMC-style optimization system (here, GP)
  • Can incorporate knowledge (as in CIS830 Part 1 of 5)
• Key Weaknesses
  • Generalizability of the HAMLET population-seeding method is not well established
  • "General-purpose" problem-solving systems can become Rube Goldberg-ian
• Future Research Issues
  • Using multistrategy GP systems to provide knowledge-based decision support
  • Applications: logistics (military, industrial, commercial), other problem solving
  • See work by: Aler, Borrajo, Isasi, Carbonell, Minton, Koza, Veloso
Classes 41-42: Genetic Wrappers for Inductive Learning
• Paper
  • Topic: Genetic Wrappers for KDD Performance Enhancement
  • Title: Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm
  • Authors: Raymer, Punch, Goodman, Sanschagrin, Kuhn
  • Presenter: Karthik K. Krishnakumar
• Key Strengths
  • Idea: use a GA to empirically (statistically) validate an inducer
  • Can be used to select and synthesize attributes (aka features)
  • Can also be used to tune other GA parameters (hence "wrapper")
• Key Weaknesses
  • Systematic experimental studies of genetic wrappers have not yet been done
  • Wrappers don't yet take the performance element explicitly into account
• Future Research Issues
  • Improving supervised learning inducers (e.g., in MLC++)
  • Applications: better combiners; feature subset selection and construction
  • See work by: Raymer, Punch, Cherkauer, Shavlik, Freitas, Hsu, Cantu-Paz
Classes 43-44: Genetic Algorithms for Optimization
• Paper
  • Topic: Genetic Optimization and Decision Support
  • Title: A Niched Pareto Genetic Algorithm for Multiobjective Optimization
  • Authors: Horn, Nafpliotis, and Goldberg
  • Presenter: Li Lian
• Key Strengths
  • Idea: control the representation of neighborhoods of the Pareto optimal front by niching (a dominance-and-sharing sketch follows this slide)
  • Gives abstract and concrete case studies of niching (sharing) effects
• Key Weaknesses
  • Needs systematic exploration and characterization of the "sweet spot"
  • Shows static comparisons, not the small-multiple visualizations that led to them
• Future Research Issues
  • Biologically (ecologically) plausible models
  • Applications: engineering (ag / bio, civil, computational, environmental, industrial, mechanical, nuclear) optimization; computational life sciences
  • See work by: Goldberg, Horn, Schwefel, Punch, Minsker, Kargupta
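A minimal sketch of the two ingredients of the niched Pareto GA discussed above: a Pareto dominance test and fitness sharing (niching), which down-weights crowded regions of the front. The two-objective points and sharing radius are illustrative assumptions.

```python
# Pareto dominance plus fitness sharing (niching) on a toy 2-objective
# maximization problem.

def dominates(a, b):
    """a dominates b (maximization): no worse everywhere, better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

points = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (2.0, 2.0), (0.5, 4.5)]
front = [p for p in points if not any(dominates(q, p) for q in points)]

def niche_count(p, sigma=1.5):
    """Sharing: linearly discounted count of neighbors within radius sigma."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return sum(max(0.0, 1 - dist(p, q) / sigma) for q in points)

# Shared fitness down-weights crowded regions of the front, preserving
# diversity along the Pareto-optimal set.
for p in front:
    print(p, round(1.0 / niche_count(p), 3))
```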
Class 45: Meta-Summary
• Data Mining / KDD Problems
  • Business decision support
  • Classification
  • Recommender systems
  • Control and policy optimization
• Data Mining / KDD Solutions: Machine Learning and Inference Techniques
  • Models
    • Version space, decision tree, perceptron, winnow
    • ANN, BBN, SOM
    • Q functions
    • GA building blocks (schemata), GP building blocks
  • Algorithms (a closing Q-learning sketch follows this slide)
    • Candidate elimination, ID3, delta rule, MLE, Simple (Naïve) Bayes
    • K2, EM, backprop, SOM convergence, LVQ, ADP, simulated annealing
    • Q-learning, TD(λ)
    • Simple GA, GP
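As a closing example, here is a minimal tabular Q-learning sketch (one of the algorithms catalogued above) on a one-dimensional chain world; the epsilon-greedy exploration, step size, and environment are illustrative assumptions.

```python
# Tabular Q-learning on a 1-D chain world: move left/right, reward 1 at
# the rightmost state. Epsilon-greedy exploration; constants illustrative.
import random

random.seed(0)
N, GAMMA, ALPHA, EPS = 6, 0.9, 0.2, 0.1
ACTIONS = (-1, +1)
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

for episode in range(400):
    s = 0
    while s != N - 1:                 # rightmost state is terminal
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        # Q-learning update: bootstrap from the greedy successor value
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N - 1)}
print(policy)    # the greedy policy should move right (+1) in every state
```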