Lecture 45
Course Review and Future Research Directions
Friday, May 5, 2000
William H. Hsu
Department of Computing and Information Sciences, KSU
http://www.cis.ksu.edu/~bhsu
Readings: Chapters 1-10, 13, Mitchell; Chapters 14-21, Russell and Norvig
Main Themes: Artificial Intelligence and KDD
• Analytical Learning: Combining Symbolic and Numerical AI
  • Inductive learning
  • Role of knowledge and deduction in integrated inductive and analytical learning
• Artificial Neural Networks (ANNs) for KDD
  • Common neural representations: current limitations
  • Incorporating knowledge into ANN learning
• Uncertain Reasoning in Decision Support
  • Probabilistic knowledge representation
  • Bayesian knowledge and data engineering (KDE): elicitation, causality
• Data Mining: KDD Applications
  • Role of causality and explanations in KDD
  • Framework for data mining: wrappers for performance enhancement
• Genetic Algorithms (GAs) for KDD
  • Evolutionary algorithms (GAs, GP) as optimization wrappers
  • Introduction to classifier systems
Class 0: A Brief Overview of Machine Learning
• Overview: Topics, Applications, Motivation
• Learning = Improving with Experience at Some Task
  • Improve over task T,
  • with respect to performance measure P,
  • based on experience E
  • Example: T = playing checkers, P = fraction of games won, E = games played against itself
• Brief Tour of Machine Learning
  • A case study
  • A taxonomy of learning
  • Intelligent systems engineering: specification of learning problems
• Issues in Machine Learning
  • Design choices
  • The performance element: intelligent systems
• Some Applications of Learning
  • Database mining, reasoning (inference / decision support), acting
  • Industrial usage of intelligent systems
Class 1: Integrating Analytical and Inductive Learning
• Learning Specification (Inductive, Analytical)
  • Instances X, target function (concept) c: X → {0, 1}, hypothesis space H
  • Training examples D: positive and negative examples of target function c
  • Analytical learning: also given a domain theory T for explaining examples
• Domain Theories
  • Expressed in a formal language: propositional logic, predicate logic
  • Set of assertions (e.g., well-formed formulae) for reasoning about the domain
  • Expresses constraints over relations (predicates) within the model
  • Example: Ancestor(x, y) ⇐ Parent(x, z) ∧ Ancestor(z, y)
• Determine
  • Hypothesis h ∈ H such that h(x) = c(x) for all x ∈ D (a consistency-checking sketch follows this slide)
  • Such h are consistent with the training data and domain theory T
• Integration Approaches
  • Explanation (proof and derivation)-based learning: EBL
  • Pseudo-experience: incorporating knowledge of environment, actuators
  • Top-down decomposition: programmatic (procedural) knowledge, advice
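As a concrete illustration of the "Determine" step above, here is a minimal Python sketch (not from the lecture) that enumerates a toy hypothesis space H of monotone conjunctions over hypothetical boolean attributes and keeps the hypotheses h with h(x) = c(x) for all x ∈ D. The attribute names and training examples are illustrative assumptions.

```python
# Toy version of "Determine": find hypotheses h in H with h(x) = c(x)
# for every training example in D. H is a hypothetical space of
# monotone conjunctions over boolean attributes (illustrative only).
from itertools import combinations

ATTRS = ["red", "round", "large"]          # hypothetical attributes

def make_h(attrs):
    """Hypothesis: the conjunction of the given attributes."""
    return lambda x: all(x[a] for a in attrs)

# Training examples D: (instance, target label c(x))
D = [({"red": 1, "round": 1, "large": 0}, True),
     ({"red": 0, "round": 1, "large": 1}, False)]

def consistent(h, D):
    return all(h(x) == label for x, label in D)

# Enumerate H (all conjunctions, including the empty one) and keep the
# hypotheses consistent with D.
H = [set(c) for r in range(len(ATTRS) + 1) for c in combinations(ATTRS, r)]
survivors = [attrs for attrs in H if consistent(make_h(attrs), D)]
print(survivors)    # [{'red'}, {'red', 'round'}] for this toy D
```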
Classes 2-3: Explanation-Based Neural Networks
• Paper
  • Topic: Explanation-Based and Inductive Learning in ANNs
  • Title: Integrating Inductive Neural Network Learning and EBL
  • Authors: Thrun and Mitchell
  • Presenter: William Hsu
• Key Strengths
  • Idea: (state, action)-to-state mappings as steps in a generalizable proof (explanation) for an observed episode
  • Generalizable approach (significant for RL and other learning-to-predict inducers)
• Key Weaknesses
  • Other numerical learning models (HMMs, DBNs) may be better suited to EBG
  • Tradeoff: the domain theory of EBNN lacks the semantic clarity of symbolic EBL
• Future Research Issues
  • How to get the best of both worlds (clear DT, ability to generate explanations)?
  • Applications: explanation in commercial, military, and legal decision support
  • See work by: Thrun, Mitchell, Shavlik, Towell, Pearl, Heckerman
Classes 4-5: Phantom Induction
• Paper
  • Topic: Distal Supervised Learning and Phantom Induction
  • Title: Iterated Phantom Induction: A Little Knowledge Can Go a Long Way
  • Authors: Brodie and DeJong
  • Presenter: Steve Gustafson
• Key Strengths
  • Idea: apply knowledge to generate (pseudo-experiential) training data
  • Speedup: learning curve significantly shortened with respect to RL by applying a "small amount" of knowledge
• Key Weaknesses
  • Haven't yet seen how to produce plausible, comprehensible explanations
  • How much knowledge is "a small amount"? (How to measure it?)
• Future Research Issues
  • Control and planning domains similar (but not identical) to robot games
  • Applications: adaptive (e.g., ANN, BBN, MDP, GA) agent control, planning
  • See work by: Brodie, DeJong, Rumelhart, McClelland, Sutton, Barto
Classes 6-7: Top-Down Hybrid Learning
• Paper
  • Topic: Learning with Prior Knowledge
  • Title: A Divide-and-Conquer Approach to Learning from Prior Knowledge
  • Authors: Chown and Dietterich
  • Presenter: Aiming Wu
• Key Strengths
  • Idea: apply programmatic (procedural) knowledge to select training data
  • Uses simulation to boost inductive learning performance (cf. model checking)
  • Divide-and-conquer approach (multiple experts)
• Key Weaknesses
  • Doesn't illustrate the form and structure of programmatic knowledge clearly
  • Doesn't systematize and formalize the model checking / simulation approach
• Future Research Issues
  • Model checking and simulation-driven hybrid learning
  • Applications: "consensus under uncertainty", simulation-based optimization
  • See work by: Dietterich, Frawley, Mitchell, Darwiche, Pearl
Classes 8-9: Learning Using Prior Knowledge
• Paper
  • Topic: Refinement of Approximate Domain-Theoretic Knowledge
  • Title: Refinement of Approximate Domain Theories by Knowledge-Based Neural Networks
  • Authors: Towell, Shavlik, and Noordewier
  • Presenter: Li-Jun Wang
• Key Strengths
  • Idea: build relational explanations; compile them into an ANN representation
  • Applies structural, functional, and constraint-based knowledge
  • Uses the ANN to further refine the domain theory
• Key Weaknesses
  • Can't get the refined domain theory back out!
  • Explanations are also no longer clear after the "compilation" (transformation) process
• Future Research Issues
  • How to retain semantic clarity of explanations, DT, knowledge representation
  • Applications: intelligent filters (e.g., fraud detection), decision support
  • See work by: Shavlik, Towell, Maclin, Sun, Schwalb, Heckerman
Class 10: Introduction to Artificial Neural Networks
• Architectures
  • Nonlinear transfer functions
  • Multi-layer networks of nonlinear units (sigmoid, hyperbolic tangent)
  • Hidden layer representations
• Backpropagation of Error
  • The backpropagation algorithm (a minimal sketch follows this slide)
  • Relation to the error gradient function for nonlinear units
  • Derivation of the training rule for feedforward multi-layer networks
  • Training issues: local optima, overfitting
  • References: Chapter 4, Mitchell; Chapter 4, Bishop; Rumelhart et al.
• Research Issues: How to…
  • Learn from observation, rewards and penalties, and advice
  • Distribute rewards and penalties through the learning model, over time
  • Generate pseudo-experiential training instances in pattern recognition
  • Partition learning problems on the fly, via (mixture) parameter estimation
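A minimal NumPy sketch of the backpropagation training rule reviewed above, for a single-hidden-layer sigmoid network trained on XOR with batch gradient descent. The architecture, learning rate, and epoch count are illustrative assumptions, not values from the course.

```python
# One-hidden-layer sigmoid network trained by backpropagation on XOR.
# Batch gradient descent on squared error; all constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

W1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights
eta = 0.5                                 # learning rate

for epoch in range(20000):
    # Forward pass
    h = sigmoid(X @ W1)                   # hidden activations
    out = sigmoid(h @ W2)                 # network outputs
    # Backward pass: delta = (target - output) * f'(net), with
    # f'(net) = out * (1 - out) for sigmoid units
    d_out = (y - out) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)    # backpropagate through W2
    # Gradient descent step on the squared error
    W2 += eta * h.T @ d_out
    W1 += eta * X.T @ d_h

print(np.round(out.ravel(), 2))  # should approach [0, 1, 1, 0];
                                 # a different seed may need more epochs
```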
Classes 11-12: Reinforcement Learning and Advice
• Paper
  • Topic: Knowledge and Reinforcement Learning in Intelligent Agents
  • Title: Incorporating Advice into Agents that Learn from Reinforcements
  • Authors: Maclin and Shavlik
  • Presenter: Kiranmai Nandivada
• Key Strengths
  • Idea: compile advice into an ANN representation for RL
  • Advice expressed in terms of constraint-based knowledge
  • Like KBANN, achieves knowledge refinement through ANN training
• Key Weaknesses
  • Like KBANN, loses the semantic clarity of advice, policy, and explanations
  • How to evaluate "refinement" effectively? Quantitatively? Logically?
• Future Research Issues
  • How to retain semantic clarity of explanations, DT, knowledge representation
  • Applications: intelligent agents, web mining (spiders, search engines), games
  • See work by: Shavlik, Maclin, Stone, Veloso, Sun, Sutton, Pearl, Kuipers
Classes 13-14: Reinforcement Learning Over Time
• Paper
  • Topic: Temporal-Difference Reinforcement Learning
  • Title: TD Models: Modeling the World at a Mixture of Time Scales
  • Author: Sutton
  • Presenter: Vrushali Koranne
• Key Strengths
  • Idea: combine state-action evaluation function (Q) estimates over multiple time steps of lookahead (a minimal TD(λ) sketch follows this slide)
  • Effective temporal credit assignment (TCA)
  • Biologically plausible (simulates TCA aspects of the dopaminergic system)
• Key Weaknesses
  • The TCA methodology is effective but semantically hard to comprehend
  • Slow convergence: can knowledge help? How will we judge?
• Future Research Issues
  • How to retain clarity, and improve convergence speed, of multi-time RL models
  • Applications: control systems, robotics, game playing
  • See work by: Sutton, Barto, Mitchell, Kaelbling, Smyth, Shafer, Goldberg
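To ground the temporal credit assignment discussion, here is a minimal tabular TD(λ) sketch with accumulating eligibility traces on a deterministic chain; TD(λ) mixes prediction targets over multiple time scales in the simplest way, while the paper's TD models generalize this mixing. The chain environment and all constants are illustrative assumptions.

```python
# Tabular TD(lambda) with accumulating eligibility traces on a
# deterministic chain: states 0..4, terminal state 5, reward 1 at the end.
N_STATES, GAMMA, LAM, ALPHA = 5, 0.9, 0.8, 0.1
V = [0.0] * (N_STATES + 1)            # V[N_STATES] is the terminal state

for episode in range(500):
    e = [0.0] * (N_STATES + 1)        # eligibility traces
    s = 0
    while s < N_STATES:
        s_next = s + 1                # chain: always move right
        r = 1.0 if s_next == N_STATES else 0.0
        delta = r + GAMMA * V[s_next] - V[s]       # TD error
        e[s] += 1.0                                # accumulating trace
        for i in range(N_STATES + 1):              # credit earlier states
            V[i] += ALPHA * delta * e[i]
            e[i] *= GAMMA * LAM
        s = s_next

print([round(v, 2) for v in V[:N_STATES]])
# approaches [0.66, 0.73, 0.81, 0.9, 1.0], i.e., powers of gamma
```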
Classes 15-16: Generative Neural Models
• Paper
  • Topic: Pattern Recognition Using Unsupervised ANNs
  • Title: The Wake-Sleep Algorithm for Unsupervised Neural Networks
  • Authors: Hinton, Dayan, Frey, and Neal
  • Presenter: Prasanna Jayaraman
• Key Strengths
  • Idea: use a two-phase algorithm to generate training instances ("dream" stage) and maximize the conditional probability of the data given the model ("wake" stage)
  • Compare: the expectation-maximization (EM) algorithm
  • Good for image recognition
• Key Weaknesses
  • Not all data admits this approach (small samples, ill-defined features)
  • Not immediately clear how to use it in problem-solving performance elements
• Future Research Issues
  • Studying information-theoretic properties of the Helmholtz machine
  • Applications: image / speech / signal recognition, document categorization
  • See work by: Hinton, Dayan, Frey, Neal, Kirkpatrick, Hajek, Ghahramani
Classes 17-18: Modularity in Neural Systems
• Paper
  • Topic: Combining Models Using Modular ANNs
  • Title: Modular and Hierarchical Learning Systems
  • Authors: Jordan and Jacobs
  • Presenter: Afrand Agah
• Key Strengths
  • Idea: use interleaved EM update steps to update the expert and gating components (a minimal forward-pass sketch follows this slide)
  • Effect: forces specialization among ANN components (GLIMs); boosts performance relative to single experts; very fast convergence in some cases
  • Explores modularity in neural systems (artificial and biological)
• Key Weaknesses
  • Often cannot achieve higher accuracy than ML, MAP, or Bayes optimal estimation
  • Doesn't provide experts that specialize in spatial or temporal pattern recognition
• Future Research Issues
  • Constructing and selecting mixtures of other ANN components (not just GLIMs)
  • Applications: pattern recognition, time series prediction
  • See work by: Jordan, Jacobs, Nowlan, Hinton, Barto, Jaakkola, Hsu
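A minimal sketch of a mixture-of-experts forward pass and an EM-style responsibility (posterior mixing) computation, in the spirit of the Jordan and Jacobs architecture above. The dimensions, random parameters, and Gaussian expert-noise model are illustrative assumptions.

```python
# Mixture-of-experts forward pass: a softmax gating network weights the
# outputs of linear experts (GLIMs). Parameters are illustrative randoms.
import numpy as np

rng = np.random.default_rng(1)
n_experts, d = 3, 2
x = rng.normal(size=d)                      # one input vector

W_gate = rng.normal(size=(n_experts, d))    # gating network weights
W_exp = rng.normal(size=(n_experts, d))     # one linear expert per row

logits = W_gate @ x
g = np.exp(logits - logits.max())
g /= g.sum()                                # softmax gating coefficients
expert_out = W_exp @ x                      # each expert's prediction
y_hat = g @ expert_out                      # gated (mixture) prediction

# EM-style E-step: "responsibility" of expert i for a target y, assuming
# Gaussian expert noise: h_i proportional to g_i * N(y | expert_i(x), sigma^2)
y, sigma = 1.0, 0.5
lik = np.exp(-(y - expert_out) ** 2 / (2 * sigma ** 2))
h = g * lik / (g * lik).sum()               # posterior mixing proportions
print(y_hat, h)
```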
Class 19: Introduction to Probabilistic Reasoning
• Architectures
  • Bayesian (belief) networks
    • Tree-structured, polytrees
    • General
  • Decision networks
  • Temporal variants (beyond the scope of this course)
• Parameter Estimation
  • Maximum likelihood (MLE), maximum a posteriori (MAP); contrasted in the worked example after this slide
  • Bayes optimal classification, Bayesian learning
  • References: Chapter 6, Mitchell; Chapters 14-15, 19, Russell and Norvig
• Research Issues: How to…
  • Learn from observation, rewards and penalties, and advice
  • Distribute rewards and penalties through the learning model, over time
  • Generate pseudo-experiential training instances in pattern recognition
  • Partition learning problems on the fly, via (mixture) parameter estimation
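A short worked contrast between MLE and MAP estimation for a Bernoulli parameter, matching the parameter-estimation bullets above; the coin-flip counts and the Beta(2, 2) prior are illustrative assumptions.

```python
# MLE vs. MAP for a Bernoulli parameter theta (hypothetical coin flips).
heads, tails = 7, 3

# Maximum likelihood: argmax_theta P(D | theta)
theta_mle = heads / (heads + tails)                        # 0.7

# MAP with a Beta(a, b) prior: argmax_theta P(D | theta) P(theta);
# the prior acts like (a - 1) extra heads and (b - 1) extra tails
a, b = 2, 2
theta_map = (heads + a - 1) / (heads + tails + a + b - 2)  # 8/12 = 0.667
print(theta_mle, theta_map)
```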
Classes 20-21: Approaches to Uncertain Reasoning
• Paper
  • Topic: The Case for Probability
  • Title: In Defense of Probability
  • Author: Cheeseman
  • Presenter: Pallavi Paranjape
• Key Strengths
  • Idea: probability is a mathematically sound way to represent uncertainty
  • Views of probability considered: objectivist, frequentist, logicist, subjectivist
  • Argument made for a meta-subjectivist, belief-measure concept of probability
• Key Weaknesses
  • Highly dogmatic view, without concrete justification for all assertions
  • Does not quantitatively or empirically compare Bayesian and non-Bayesian methods
• Future Research Issues
  • Integrating symbolic and numerical (statistical) models of uncertainty
  • Applications: uncertain reasoning, pattern recognition, learning
  • See work by: Cheeseman, Cox, Good, Pearl, Zadeh, Dempster, Shafer
Classes 22-23: Learning Bayesian Network Structure
• Paper
  • Topic: Learning Bayesian Networks from Data
  • Title: Learning Bayesian Network Structure from Massive Datasets
  • Authors: Friedman, Pe'er, Nachman
  • Presenter: Jincheng Gao
• Key Strengths
  • Idea: can use graph constraints and scoring functions to select candidate parents when constructing a directed graph model of probability (BBN); a greedy-search sketch follows this slide
  • Tabu search, greedy score-based methods (K2), etc. also considered
• Key Weaknesses
  • Optimal Bayesian network structure learning is still intractable on conventional (single-instruction, sequential) architectures
  • More empirical comparison among the alternative methods is warranted
• Future Research Issues
  • Scaling up to massive real-world data sets (e.g., medical, agricultural, DSS)
  • Applications: diagnosis, troubleshooting, user modeling, intelligent HCI
  • See work by: Friedman, Goldszmidt, Heckerman, Cooper, Beinlich, Koller
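A minimal greedy (hill-climbing) sketch in the score-based structure-learning family surveyed above (cf. K2 and the sparse-candidate idea): it adds the single best-scoring edge at a time under a BIC-style local score. The toy binary data, the scoring details, and the acyclicity check are illustrative assumptions, not the paper's algorithm.

```python
# Greedy score-based structure search over three binary variables:
# repeatedly add the edge that most improves a BIC-style score, subject
# to acyclicity. Toy data: B depends on A, C is independent.
import math
from collections import Counter

VARS = ["A", "B", "C"]
data = [{"A": a, "B": a ^ n, "C": c}
        for a in (0, 1) for c in (0, 1) for n in (0, 0, 0, 1)]

def local_score(var, parents):
    """Max log-likelihood of var given parents, minus a BIC penalty."""
    counts = Counter((tuple(r[p] for p in parents), r[var]) for r in data)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in data)
    ll = sum(n * math.log(n / pa_counts[pa]) for (pa, _), n in counts.items())
    return ll - 0.5 * (2 ** len(parents)) * math.log(len(data))

def total_score(edges):
    return sum(local_score(v, [u for u, w in edges if w == v]) for v in VARS)

def acyclic(edges):
    nodes, es = set(VARS), set(edges)
    while nodes:                       # Kahn-style: peel off source nodes
        free = [n for n in nodes if not any(v == n for _, v in es)]
        if not free:
            return False
        nodes -= set(free)
        es = {(u, v) for u, v in es if u in nodes and v in nodes}
    return True

edges = set()
while True:
    cand = [(u, v) for u in VARS for v in VARS
            if u != v and (u, v) not in edges and acyclic(edges | {(u, v)})]
    scored = [(total_score(edges | {e}), e) for e in cand]
    if not scored or max(scored)[0] <= total_score(edges):
        break
    edges.add(max(scored)[1])

print(edges)   # a single edge between A and B (direction is tie-broken)
```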
Classes 24-25: Bayesian Networks for User Modeling
• Paper
  • Topic: Decision Support Systems and Bayesian User Modeling
  • Title: The Lumiere Project: Bayesian User Modeling for Inferring the Goals and Needs of Software Users
  • Authors: Horvitz, Breese, Heckerman, Hovel, Rommelse
  • Presenter: Yuhui (Cathy) Liu
• Key Strengths
  • Idea: a BBN model is developed from user logs and used to infer the mode of usage
  • Can infer the goals and skill level of a user
• Key Weaknesses
  • Needs high accuracy in inferring goals to deliver meaningful content
  • May be better to use a next-generation search engine (more interactivity, less passive monitoring)
• Future Research Issues
  • Designing better interactive user modeling
  • Applications: clickstream monitoring, e-commerce, web search, help
  • See work by: Horvitz, Breese, Heckerman, Lee, Huang
Classes 26-27: Causal Reasoning
• Paper
  • Topic: KDD and Causal Reasoning
  • Title: Symbolic Causal Networks for Reasoning about Actions and Plans
  • Authors: Darwiche and Pearl
  • Presenter: Yue Jiao
• Key Strengths
  • Idea: use a BBN to represent symbolic constraint knowledge
  • Can use it to generate mechanistic explanations
  • Models actions
  • Models sequences of actions (plans)
• Key Weaknesses
  • Integrative methods (numerical and symbolic BBNs) still need exploration
  • Unclear how to incorporate methods for learning to plan
• Future Research Issues
  • Reasoning about systems
  • Applications: uncertain reasoning, pattern recognition, learning
  • See work by: Horvitz, Breese, Heckerman, Lee, Huang
Classes 28-29: Knowledge Discovery from Scientific Data
• Paper
  • Topic: KDD for Scientific Data Analysis
  • Title: KDD for Science Data Analysis: Issues and Examples
  • Authors: Fayyad, Haussler, and Stolorz
  • Presenter: Arulkumar Elumalai
• Key Strengths
  • Idea: investigate how, and whether, KDD techniques (OLAP, learning) scale up to huge data sets
  • Answer: "it depends" on computational complexity and many other factors
• Key Weaknesses
  • No clear theory yet of how to assess "how much data is really needed"
  • No technical treatment or characterization of data cleaning
• Future Research Issues
  • Data cleaning (aka data cleansing), pre- and post-processing (OLAP)
  • Applications: intelligent databases, visualization, high-performance CSE
  • See work by: Fayyad, Smyth, Uthurusamy, Haussler, Foster
Classes 30-31: Relevance Determination
• Paper
  • Topic: Relevance Determination in KDD
  • Title: Irrelevant Features and the Subset Selection Problem
  • Authors: John, Kohavi, and Pfleger
  • Presenter: DingBing Yang
• Key Strengths
  • Idea: cast the problem of choosing relevant attributes (given a "top-level" learning problem specification) as search (a greedy wrapper sketch follows this slide)
  • Effective state-space search (A / A*-based) approach demonstrated
• Key Weaknesses
  • May not have good enough heuristics!
  • Can either develop them (via information theory) or use MCMC methods
• Future Research Issues
  • Selecting relevant data channels from continuous sources (e.g., sensors)
  • Applications: bioinformatics (genomics, proteomics, etc.), prognostics
  • See work by: Kohavi, John, Rendell, Donoho, Hsu, Provost
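A minimal sketch of subset selection as greedy state-space search: forward selection with a leave-one-out 1-nearest-neighbor wrapper as the evaluation function. Greedy hill climbing stands in here for the heuristic search the paper analyzes; the toy data and the evaluation function are illustrative assumptions.

```python
# Forward selection (greedy search through feature-subset space) with a
# leave-one-out 1-NN accuracy wrapper. Only feature 0 carries the label.
import random

random.seed(0)
N, N_FEATURES = 60, 5
X = [[random.random() for _ in range(N_FEATURES)] for _ in range(N)]
y = [int(x[0] > 0.5) for x in X]          # feature 0 is the relevant one

def loo_accuracy(subset):
    """Leave-one-out 1-NN accuracy using only the chosen features."""
    if not subset:
        return 0.0
    correct = 0
    for i in range(N):
        dists = [(sum((X[i][f] - X[j][f]) ** 2 for f in subset), j)
                 for j in range(N) if j != i]
        _, nearest = min(dists)
        correct += (y[nearest] == y[i])
    return correct / N

subset, best, improved = set(), 0.0, True
while improved:                            # stop when no addition helps
    improved = False
    for f in range(N_FEATURES):
        if f not in subset and loo_accuracy(subset | {f}) > best:
            best, chosen, improved = loo_accuracy(subset | {f}), f, True
    if improved:
        subset.add(chosen)

print(subset, best)   # likely {0} with high accuracy: the noise features
                      # do not improve the wrapper's estimate
```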
Classes 32-33: Learning for Text Document Categorization
• Paper
  • Topic: Text Documents and Information Retrieval (IR)
  • Title: Hierarchically Classifying Documents Using Very Few Words
  • Authors: Koller and Sahami
  • Presenter: Yan Song
• Key Strengths
  • Idea: use rank-frequency scoring methods to find "keywords that make a difference" (a scoring sketch follows this slide)
  • Breaks the corpus into a meaningful hierarchy
• Key Weaknesses
  • Sometimes need to derive semantically meaningful cluster labels
  • How to integrate this method with dynamic cluster segmentation and labeling?
• Future Research Issues
  • Bayesian architectures using "non-Bayesian" learning algorithms (e.g., GAs)
  • Applications: digital libraries (hierarchical, distributed dynamic indexing), intelligent search engines, intelligent displays (and help indices)
  • See work by: Koller, Sahami, Roth, Charniak, Brill, Yarowsky
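A minimal sketch of scoring "keywords that make a difference": rank each word by information gain with respect to the class label and keep the top few. Koller and Sahami use a more refined, KL-divergence-based selection; this gain-based filter and the toy corpus are illustrative assumptions.

```python
# Rank words by information gain against the class label and keep the
# top few discriminative keywords. Toy two-class corpus (hypothetical).
import math

docs = [("win cash prize now", 1), ("meeting agenda notes", 0),
        ("cash bonus win", 1), ("project meeting notes", 0)]

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n = len(docs)
p_class = sum(c for _, c in docs) / n
vocab = {w for text, _ in docs for w in text.split()}

def gain(word):
    """Information gain of splitting the corpus on word presence."""
    groups = [[c for text, c in docs if (word in text.split()) == present]
              for present in (True, False)]
    h_cond = sum(len(g) / n * entropy(sum(g) / len(g)) for g in groups if g)
    return entropy(p_class) - h_cond

ranked = sorted(vocab, key=gain, reverse=True)
print(ranked[:3])   # e.g., words like 'cash' and 'meeting' that separate
                    # the classes rank first (ties ordered arbitrarily)
```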
Classes 34-35: Web Mining
• Paper
  • Topic: KDD and The Web
  • Title: Learning to Extract Symbolic Knowledge from the World Wide Web
  • Authors: Craven, DiPasquo, Freitag, McCallum, Mitchell, Nigam, and Slattery
  • Presenter: Ping Zou
• Key Strengths
  • Idea: build a probabilistic model of web documents using "keywords that matter"
  • Use the probabilistic model to represent knowledge for indexing into a web database
• Key Weaknesses
  • How to account for concept drift?
  • How to explain and express constraints (e.g., "proper nouns that are person names don't matter")? Not considered here…
• Future Research Issues
  • Using natural language processing (NLP) and image / audio / signal processing
  • Applications: searchable hypermedia, digital libraries, spiders, other agents
  • See work by: McCallum, Mitchell, Roth, Sahami, Pratt, Lee
Class 36: Introduction to Evolutionary Computation
• Architectures
  • Genetic algorithms (GAs), genetic programming (GP), genetic wrappers
  • Simple vs. parameterless GAs (a simple-GA sketch follows this slide)
• Issues
  • Loss of diversity
    • Consequence: collapse of the Pareto front
    • Solutions: niching (sharing, preselection, crowding)
  • Parameterless GAs
  • Other issues (not covered): genetic drift, population sizing, etc.
  • References: Chapter 9, Mitchell; Chapters 1-6, Goldberg; Chapters 1-5, Koza
• Research Issues: How to…
  • Design GAs based on the credit assignment system (in the performance element)
  • Build hybrid analytical / inductive learning GP systems
  • Use GAs to perform relevance determination in KDD
  • Control diversity in GA solutions for hard optimization problems
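A minimal simple-GA sketch (cf. Goldberg, Chapters 1-3): tournament selection, one-point crossover, and bit-flip mutation on the OneMax fitness function. The population size, rates, and fitness function are illustrative assumptions.

```python
# Simple GA on OneMax (maximize the number of 1 bits): tournament
# selection, one-point crossover, bit-flip mutation.
import random

random.seed(0)
L, POP, GENS, P_MUT = 20, 30, 40, 0.02
fitness = sum                                 # OneMax: count the 1 bits

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
for gen in range(GENS):
    def tournament():
        a, b = random.sample(pop, 2)          # binary tournament
        return a if fitness(a) >= fitness(b) else b
    nxt = []
    while len(nxt) < POP:
        p1, p2 = tournament(), tournament()
        cut = random.randrange(1, L)          # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [bit ^ (random.random() < P_MUT) for bit in child]
        nxt.append(child)
    pop = nxt

print(max(map(fitness, pop)), "of", L)  # approaches L as the GA converges
```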
Classes 37-38: Genetic Algorithms and Classifier Systems
• Paper
  • Topic: Classifier Systems and Inductive Learning
  • Title: Generalization in the XCS Classifier System
  • Author: Wilson
  • Presenter: Elizabeth Loza-Garay
• Key Strengths
  • Idea: incorporate the performance element (classifier system) into the GA design
  • Solid theoretical foundation: advanced building block (aka schema) theory
  • Can be used to engineer more efficient GA models and tune parameters
• Key Weaknesses
  • Need to progress from toy problems (e.g., MUX learning) to real-world ones
  • Need to investigate scaling up of GA principles (e.g., building block mixing)
• Future Research Issues
  • Building block scalability in classifier systems
  • Applications: reinforcement learning, mobile robotics, other animats, a-life
  • See work by: Wilson, Goldberg, Holland, Booker
Classes 39-40: Knowledge-Based Genetic Programming
• Paper
  • Topic: Genetic Programming and Multistrategy Learning
  • Title: Genetic Programming and Deductive-Inductive Learning: A Multistrategy Approach
  • Authors: Aler, Borrajo, and Isasi
  • Presenter: Yuhong Cheng
• Key Strengths
  • Idea: use a knowledge-based system to calibrate the starting state of an MCMC-style optimization system (here, GP)
  • Can incorporate knowledge (as in CIS830 Part 1 of 5)
• Key Weaknesses
  • Generalizability of the HAMLET population-seeding method is not well established
  • "General-purpose" problem-solving systems can become Rube Goldberg-ian
• Future Research Issues
  • Using multistrategy GP systems to provide knowledge-based decision support
  • Applications: logistics (military, industrial, commercial), other problem solving
  • See work by: Aler, Borrajo, Isasi, Carbonell, Minton, Koza, Veloso
Classes 41-42: Genetic Wrappers for Inductive Learning
• Paper
  • Topic: Genetic Wrappers for KDD Performance Enhancement
  • Title: Simultaneous Feature Extraction and Selection Using a Masking Genetic Algorithm
  • Authors: Raymer, Punch, Goodman, Sanschagrin, Kuhn
  • Presenter: Karthik K. Krishnakumar
• Key Strengths
  • Idea: use a GA to empirically (statistically) validate an inducer
  • Can be used to select and synthesize attributes (aka features)
  • Can also be used to tune other GA parameters (hence "wrapper")
• Key Weaknesses
  • Systematic experimental studies of genetic wrappers have not yet been done
  • Wrappers don't yet take the performance element explicitly into account
• Future Research Issues
  • Improving supervised learning inducers (e.g., in MLC++)
  • Applications: better combiners; feature subset selection and construction
  • See work by: Raymer, Punch, Cherkauer, Shavlik, Freitas, Hsu, Cantu-Paz
Classes 43-44: Genetic Algorithms for Optimization
• Paper
  • Topic: Genetic Optimization and Decision Support
  • Title: A Niched Pareto Genetic Algorithm for Multiobjective Optimization
  • Authors: Horn, Nafpliotis, and Goldberg
  • Presenter: Li Lian
• Key Strengths
  • Idea: control the representation of neighborhoods of the Pareto optimal front by niching (a dominance-and-sharing sketch follows this slide)
  • Gives abstract and concrete case studies of niching (sharing) effects
• Key Weaknesses
  • Needs systematic exploration and characterization of the "sweet spot"
  • Shows static comparisons, not the small-multiple visualizations that led to them
• Future Research Issues
  • Biologically (ecologically) plausible models
  • Applications: engineering (ag / bio, civil, computational, environmental, industrial, mechanical, nuclear) optimization; computational life sciences
  • See work by: Goldberg, Horn, Schwefel, Punch, Minsker, Kargupta
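A minimal sketch of the two ingredients of the niched Pareto GA discussed above: a Pareto dominance test and fitness sharing (niching), which down-weights crowded regions of the front. The two-objective points and sharing radius are illustrative assumptions.

```python
# Pareto dominance plus fitness sharing (niching) on a toy 2-objective
# maximization problem.

def dominates(a, b):
    """a dominates b (maximization): no worse everywhere, better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

points = [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0), (2.0, 2.0), (0.5, 4.5)]
front = [p for p in points if not any(dominates(q, p) for q in points)]

def niche_count(p, sigma=1.5):
    """Sharing: linearly discounted count of neighbors within radius sigma."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return sum(max(0.0, 1 - dist(p, q) / sigma) for q in points)

# Shared fitness down-weights crowded regions of the front, preserving
# diversity along the Pareto-optimal set.
for p in front:
    print(p, round(1.0 / niche_count(p), 3))
```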
Class 45: Meta-Summary
• Data Mining / KDD Problems
  • Business decision support
  • Classification
  • Recommender systems
  • Control and policy optimization
• Data Mining / KDD Solutions: Machine Learning and Inference Techniques
  • Models
    • Version space, decision tree, perceptron, winnow
    • ANN, BBN, SOM
    • Q functions
    • GA building blocks (schemata), GP building blocks
  • Algorithms (a closing Q-learning sketch follows this slide)
    • Candidate elimination, ID3, delta rule, MLE, Simple (Naïve) Bayes
    • K2, EM, backprop, SOM convergence, LVQ, ADP, simulated annealing
    • Q-learning, TD(λ)
    • Simple GA, GP
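As a closing example, here is a minimal tabular Q-learning sketch (one of the algorithms catalogued above) on a one-dimensional chain world; the epsilon-greedy exploration, step size, and environment are illustrative assumptions.

```python
# Tabular Q-learning on a 1-D chain world: move left/right, reward 1 at
# the rightmost state. Epsilon-greedy exploration; constants illustrative.
import random

random.seed(0)
N, GAMMA, ALPHA, EPS = 6, 0.9, 0.2, 0.1
ACTIONS = (-1, +1)
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

for episode in range(400):
    s = 0
    while s != N - 1:                 # rightmost state is terminal
        a = (random.choice(ACTIONS) if random.random() < EPS
             else max(ACTIONS, key=lambda act: Q[(s, act)]))
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        # Q-learning update: bootstrap from the greedy successor value
        Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
                              - Q[(s, a)])
        s = s2

policy = {s: max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(N - 1)}
print(policy)    # the greedy policy should move right (+1) in every state
```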