
Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr

Advanced Computing Seminar Data Mining and Its Industrial Applications — Chapter 4 — Inductive Learning. Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr Knowledge and Software Engineering Lab Advanced Computing Research Centre School of Computer and Information Science





Presentation Transcript


  1. Advanced Computing Seminar: Data Mining and Its Industrial Applications, Chapter 4: Inductive Learning. Zhongzhi Shi, Markus Stumptner, Yalei Hao, Gerald Quirchmayr. Knowledge and Software Engineering Lab, Advanced Computing Research Centre, School of Computer and Information Science, University of South Australia. Chap4 Inductive Learning Zhongzhi Shi

  2. Outline • Introduction • Machine learning • Version space and bias • Decision tree learning • Ripper algorithm • Summary

  3. Basic Concepts • Data: stored on any medium in a certain format • Information: meaning assigned to concrete data • Knowledge: refined from information

  4. Why Data Mining? Rich Data, Poor Knowledge • Figure: Data (finance, economic, government, post, population) → Knowledge (life cycle, pattern, trends, concept, relation, model, association rules, sequence) → Decision Making (e-commerce, resource distribution, trade, business intelligence, e-science)

  5. Data Mining vs Knowledge Discovery • Data mining: extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data • Alternative names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

  6. Data Mining: A KDD Process • Data mining is the core of the knowledge discovery process. • Figure: Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation → Knowledge

  7. Data Warehouse Process • Metadata management • Data access • Systems integration

  8. Data Mining Approach to Data Warehouse Design: Macro Picture • Figure labels: desired star schema, mapping rules, designed star schema • Attribute: name, key, width, type, NULL allowed • Numeric fields: maximum, minimum, average, standard deviation • Text fields: number of spaces, numerals used, average length

  9. Detailed picture

  10. Knowledge Representation • Production system • Frame • Semantic networks • First order logic • Ontology

  11. Production System • Rules: IF (conditions) THEN (conclusions) • Example: IF (animal has wings) AND (animal can fly) THEN (animal is a bird)
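
The IF/THEN rule above can be made executable with a small forward-chaining loop. This is a minimal sketch, not how MYCIN or any particular production system is implemented. Facts are assumed to be plain strings, and a rule fires when all of its conditions are already known facts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """IF all `conditions` hold THEN add `conclusion` (a production rule)."""
    conditions: frozenset
    conclusion: str


def forward_chain(facts, rules):
    """Fire rules repeatedly until no rule adds a new fact; return all facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # A rule fires when every condition is already a known fact.
            if rule.conditions <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                changed = True
    return known


# The slide's example rule; representing facts as strings is an assumption.
RULES = [
    Rule(frozenset({"animal has wings", "animal can fly"}), "animal is a bird"),
]

print(sorted(forward_chain({"animal has wings", "animal can fly"}, RULES)))
```

With only "animal has wings" known, the rule never fires, because its IF part is not fully satisfied.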

  12. Production System: MYCIN rule syntax
      <rule> = IF <antecedent> THEN <action> (ELSE <action>)
      <antecedent> = AND <condition>
      <condition> = OR <condition> | <predicate> <associative-triple>
      <associative-triple> = <attribute> <object> <value>
      <action> = <consequent> | <procedure>
      <consequent> = <associative-triple> <certainty-factor>

  13. Frame Structure
      FRAME FRAME-NAME
        SLOT-NAME-1: ASPECT-11 ASPECT-VALUE-11
                     ASPECT-12 ASPECT-VALUE-12
                     ......
                     ASPECT-1m ASPECT-VALUE-1m
        ......
        SLOT-NAME-n: ASPECT-n1 ASPECT-VALUE-n1
                     ASPECT-n2 ASPECT-VALUE-n2
                     ......
                     ASPECT-nm ASPECT-VALUE-nm
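
A frame maps naturally onto nested dictionaries, with each slot holding its aspect/aspect-value pairs. The sketch below assumes that representation. The frame `BIRD` and its slot and aspect names (`value`, `range`, `default`) are hypothetical. They only fill the FRAME-NAME / SLOT-NAME / ASPECT placeholders.

```python
def make_frame(name, slots):
    """A frame: a name plus slots, each slot mapping aspect names to values."""
    return {"name": name, "slots": slots}


def get_aspect(frame, slot, aspect, default=None):
    """Return the ASPECT-VALUE for (SLOT-NAME, ASPECT), or `default` if absent."""
    return frame["slots"].get(slot, {}).get(aspect, default)


# Hypothetical frame filling the FRAME-NAME / SLOT-NAME / ASPECT placeholders.
bird = make_frame("BIRD", {
    "covering": {"value": "feathers"},
    "legs": {"value": 2, "range": "integer"},
    "can-fly": {"default": True},
})

print(get_aspect(bird, "legs", "value"))       # 2
print(get_aspect(bird, "can-fly", "default"))  # True
```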

  14. Semantic Networks • Nodes: objects • Arcs: relationships
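
One simple way to store a semantic network is as labeled arcs between nodes. The sketch below makes several assumptions: the `SemanticNet` class, the node names, and the `is-a` and `has` labels are all hypothetical. `reachable` follows arcs with a single label transitively.

```python
from collections import defaultdict


class SemanticNet:
    """Nodes are objects; each arc is a labeled relationship between two nodes."""

    def __init__(self):
        self._arcs = defaultdict(set)  # (source node, label) -> target nodes

    def add(self, source, label, target):
        self._arcs[(source, label)].add(target)

    def targets(self, source, label):
        # .get avoids creating empty entries in the defaultdict
        return set(self._arcs.get((source, label), set()))

    def reachable(self, source, label):
        """All nodes reachable from `source` by following arcs with `label`."""
        seen, stack = set(), [source]
        while stack:
            for nxt in self.targets(stack.pop(), label):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


net = SemanticNet()
net.add("canary", "is-a", "bird")
net.add("bird", "is-a", "animal")
net.add("bird", "has", "wings")
print(sorted(net.reachable("canary", "is-a")))  # ['animal', 'bird']
```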

  15. First Order Logic • Student(John) • Teacher(Markus) • Father(x, y) • Father(y, z) • Grandfather(x, z) :- Father(x, y), Father(y, z) • If (animal has wings) and (animal can fly) then (animal is a bird)
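
The Grandfather clause can be evaluated directly over a set of Father facts by joining on the shared variable y. This is a minimal sketch in Python rather than a logic-programming engine, and the person names are hypothetical.

```python
def grandfathers(father):
    """Grandfather(x, z) :- Father(x, y), Father(y, z), evaluated as a join on y."""
    return {(x, z) for (x, y1) in father for (y2, z) in father if y1 == y2}


# Hypothetical facts: Father(Tom, Bob) and Father(Bob, Ann).
FATHER = {("Tom", "Bob"), ("Bob", "Ann")}
print(grandfathers(FATHER))  # {('Tom', 'Ann')}
```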

  16. Ontology Semantic Web: • Ontology • OWL • Ontology schema • Description Logic

  17. Outline • Introduction • Machine learning • Version space and bias • Decision tree learning • Ripper algorithm • Summary

  18. The Essence of Learning • Learning denotes changes in the system that are adaptive in the sense that they enable the system to do the same task or tasks drawn from the same population more efficiently and more effectively the next time. [Simon 1983] • Machine learning is the study of how to make machines acquire new knowledge and new skills and reorganize existing knowledge.

  19. The Essence of Learning • Figure: Environment → Learning Element → Knowledge Base → Performance Element, with feedback from the Performance Element to the Learning Element • The environment supplies the source information to the learning system. The level and quality of this information significantly affect the learning strategy.

  20. The Essence of Learning • The environment is the information source: • Database • Text • Web pages • Images • Video • Spatial data

  21. The Essence of Learning • The learning element uses this information to improve an explicit knowledge base, and the performance element uses the knowledge base to perform its task. Learning methods include: • Inductive learning • Analogical learning • Explanation-based learning • Genetic algorithms • Neural networks

  22. Paradigms for Machine Learning • The inductive paradigm: The most widely studied method for symbolic learning is inducing a general concept description from a sequence of instances of the concept and known counterexamples of the concept. The task is to build a concept description from which all the previous positive instances can be rederived by universal instantiation, but none of the previous negative instances can be rederived by the same process. • The analogical paradigm: Analogical reasoning is a strategy of inference that allows the transfer of knowledge from a known area into another area with similar properties.

  23. Paradigms for Machine Learning • The analytic paradigm: These methods attempt to formulate a generalization after analyzing a few instances in terms of the system's knowledge. Mainly deductive, rather than inductive, mechanisms are used for such learning. • The genetic paradigm: Genetic algorithms were inspired by a direct analogy to mutations in biological reproduction and Darwinian natural selection. In principle, genetic algorithms encode a parallel search through concept space, with each process attempting coarse-grain hill climbing. • The connectionist paradigm: Connectionist learning systems are also called "neural networks". Connectionist learning consists of readjusting weights in a fixed-topology network via specific learning algorithms.
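
The sketch below gives one concrete instance of "readjusting weights in a fixed-topology network": the classic perceptron rule applied to a single threshold unit. The slide does not name an algorithm. The perceptron, the learning rate, the epoch count, and the AND training set are all assumptions made for illustration.

```python
def step(x, w, b):
    """Threshold unit: output 1 if the weighted sum exceeds 0, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def train_perceptron(samples, epochs=20, lr=0.1):
    """Perceptron rule: the network's topology stays fixed; only weights change."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, target in samples:
            err = target - step(x, w, b)  # 0 when the output is already right
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b


# Hypothetical training set: the logical AND of two inputs.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND)
print([step(x, w, b) for x, _ in AND])  # [0, 0, 0, 1]
```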

  24. The Essence of Learning • The knowledge base contains predefined concepts, domain constraints, heuristic rules, and so on. • Knowledge representation • Knowledge consistency • Knowledge redundancy

  25. The Essence of Learning • The performance element: The learning element tries to improve the actions of the performance element. The performance element applies the knowledge to solve problems and evaluates the effects of learning.
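
The four components described on these slides (environment, learning element, knowledge base, performance element) can be wired together in a few lines. This sketch is deliberately trivial. The learning element simply memorizes the correct action (rote learning), and the environment is a hypothetical list of (situation, correct action) pairs. Feedback is modeled as "the performance element answered wrongly".

```python
class KnowledgeBase:
    """Explicit knowledge: here, just a situation -> action table."""
    def __init__(self):
        self.rules = {}


class PerformanceElement:
    """Uses the knowledge base to perform the task."""
    def __init__(self, kb):
        self.kb = kb

    def act(self, situation):
        return self.kb.rules.get(situation)  # None if nothing has been learned


class LearningElement:
    """Improves the knowledge base; deliberately trivial (rote memorization)."""
    def __init__(self, kb):
        self.kb = kb

    def learn(self, situation, correct_action):
        self.kb.rules[situation] = correct_action


def run(environment, kb):
    """Feed the environment's examples through the loop; return the error count."""
    performer, learner = PerformanceElement(kb), LearningElement(kb)
    errors = 0
    for situation, correct_action in environment:
        if performer.act(situation) != correct_action:  # feedback: a mistake
            errors += 1
            learner.learn(situation, correct_action)
    return errors


# Hypothetical environment: (situation, correct action) pairs.
ENV = [("red light", "stop"), ("green light", "go"), ("red light", "stop")]
kb = KnowledgeBase()
print(run(ENV, kb))  # 2: the repeated "red light" is already handled correctly
print(run(ENV, kb))  # 0: everything has been learned
```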

  26. On Concept • The term "concept" is a universal notion that reflects general, abstract, and essential features. For example, "triangle", "animal", and "computer" are all concepts; horse, tiger, bird, and so on are examples of the concept "animal". A concept has two aspects: extension and intension. • Intension: the set of attributes that reflect the essential features of a concept. • Extension: the set of examples that satisfy the definition of a concept. • Illustrated concepts: Fruit, Student

  27. Concept Description • In general, a concept can be described by the concept name and a list of attribute-value pairs, that is, (ConceptName (Attribute_1 Value_1) (Attribute_2 Value_2) … (Attribute_n Value_n)). • A concept description can also be represented in first-order logic: each attribute is a predicate, and the concept name and attribute value are its arguments, so the concept description becomes a formula in predicate calculus.
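
The step from the attribute-value form to predicate calculus is mechanical. In the sketch below, each attribute becomes a predicate whose arguments are the concept name and the attribute value. The function name `to_predicates` and the `apple` description are hypothetical.

```python
def to_predicates(concept, pairs):
    """(Concept (A1 V1) ... (An Vn))  ->  A1(Concept, V1) ∧ ... ∧ An(Concept, Vn)."""
    return " ∧ ".join(f"{attr}({concept}, {value})" for attr, value in pairs)


# Hypothetical concept description.
print(to_predicates("apple", [("color", "red"), ("shape", "round")]))
# color(apple, red) ∧ shape(apple, round)
```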

  28. Attribute Types • Nominal attribute: one that takes on a finite, unordered set of mutually exclusive values. • Linear attribute • Structured attribute

  29. Attribute Types • Nominal attribute: one that takes on a finite, unordered set of mutually exclusive values. • Examples: • Color: red, green, blue • Transport: airline, railway, ship

  30. Attribute Types • Linear attribute: its values form a totally ordered set. • Examples: • Age: 1, 2, …, 100 • Temperature: 20, 21, … • Distance: 1 km, 2 km, …

  31. Attribute Types • Structured attribute: its values are organized in a hierarchy. • Example: a tree structure (figure) with the nodes computer, hardware, software, CPU, memory, computing, control
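
For a structured attribute, one way to generalize two values is to climb the tree to their most specific common ancestor. The sketch below assumes one reading of the figure on this slide: computer → hardware, software; hardware → CPU, memory; CPU → computing, control. Those parent links are an assumption and are not confirmed by the original figure.

```python
# Assumed hierarchy (child -> parent), one reading of the slide's figure.
PARENT = {
    "hardware": "computer", "software": "computer",
    "CPU": "hardware", "memory": "hardware",
    "computing": "CPU", "control": "CPU",
}


def ancestors(value):
    """The value itself followed by each ancestor up to the root."""
    chain = [value]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def generalize(a, b):
    """Most specific common ancestor of two values of a structured attribute."""
    above_a = set(ancestors(a))
    for node in ancestors(b):
        if node in above_a:
            return node
    raise ValueError(f"{a!r} and {b!r} are not in the same hierarchy")


print(generalize("computing", "memory"))  # hardware
print(generalize("control", "software"))  # computer
```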

  32. Inductive Learning • From particular examples to a general conclusion, principle, or rule: apple → eat; tomato → eat; banana → eat; … ⇒ fruit → eat

  33. Inductive Learning • Given: • Premise statements: facts, specific observations, and intermediate generalizations that provide information about some objects, phenomena, processes, and so on. • Tentative inductive assertion: an a priori hypothesis held about the objects in the premise statements. • Background knowledge: general and domain-specific concepts for interpreting the premises, and inference rules relevant to the task of inference. • Find: an inductive assertion (hypothesis) that strongly or weakly implies the premise statements in the context of the background knowledge and satisfies the preference criterion.

  34. Inductive Learning • Simplest form: learn a function from examples • f is the target function • An example is a pair (x, f(x)) • Problem: find a hypothesis h such that h ≈ f, given a training set of examples • (This is a highly simplified model of real learning: it ignores prior knowledge and assumes the examples are given.)

  35–38. Inductive Learning Method • Construct/adjust h to agree with f on the training set • (h is consistent if it agrees with f on all examples) • E.g., curve fitting
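
The curve-fitting idea can be made concrete by comparing two hypotheses on the same training set: a least-squares straight line and the polynomial that interpolates every point. The data points below are hypothetical. The line is simpler but disagrees with f on the training set. The interpolating polynomial agrees with every example, so it is consistent in the sense defined above.

```python
def fit_line(xs, ys):
    """Least-squares straight line h(x) = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x: a + b * x


def interpolate(xs, ys):
    """Lagrange polynomial of degree len(xs) - 1 through every training point."""
    def h(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return h


def is_consistent(h, xs, ys, tol=1e-9):
    """h is consistent if it agrees with f on every training example."""
    return all(abs(h(x) - y) <= tol for x, y in zip(xs, ys))


# Hypothetical training set of (x, f(x)) pairs.
XS, YS = [0, 1, 2, 3], [1, 3, 2, 5]
print(is_consistent(fit_line(XS, YS), XS, YS))     # False
print(is_consistent(interpolate(XS, YS), XS, YS))  # True
```

Choosing between a simpler, inconsistent hypothesis and a more complex, consistent one is a preference of the kind the slides later call bias.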

  39. Best-Hypothesis • Positive example ⇒ generalize • Negative example ⇒ specialize • Drawbacks: every change must be checked against all previous examples, and the search may have to backtrack

  40. Outline • Introduction • Machine learning • Version space and bias • Decision tree learning • Ripper algorithm • Summary

  41. Hypothesis Space • Concept description • Extension: the set of examples that the hypothesis predicts to satisfy the concept • Bias: any preference for one hypothesis over another

  42. Training Examples for Enjoy Sport

  | Sky | Temp | Humidity | Wind | Water | Forecast | EnjoySport |
  |---|---|---|---|---|---|---|
  | Sunny | Warm | Normal | Strong | Warm | Same | YES |
  | Sunny | Warm | High | Strong | Warm | Same | YES |
  | Rainy | Cold | High | Strong | Warm | Change | NO |
  | Sunny | Warm | High | Strong | Cool | Change | YES |

  What is the general concept?

  43. The more_general_than_or_equal_to relation • Definition: Let $h_j$ and $h_k$ be boolean-valued functions defined over $X$. Then $h_j$ is more_general_than_or_equal_to $h_k$ (written $h_j \geq_g h_k$) iff $(\forall x \in X)\,[(h_k(x) = 1) \rightarrow (h_j(x) = 1)]$. • In our case, the most general hypothesis (that every day is a positive example) is represented by $\langle ?, ?, ?, ?, ?, ? \rangle$, and the most specific possible hypothesis (that no day is a positive example) is represented by $\langle \emptyset, \emptyset, \emptyset, \emptyset, \emptyset, \emptyset \rangle$.
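
Each hypothesis here is a conjunction of attribute constraints, so the more_general_than_or_equal_to test can be checked attribute by attribute instead of over all of X. The sketch below assumes that representation: `"?"` accepts any value, a specific string accepts only that value, and `None` stands in for the slide's ∅ (accepts nothing).

```python
EMPTY = None  # stands for the slide's ∅ constraint (accepts no value)


def matches(h, x):
    """h(x) = 1 iff every constraint is '?' or equals the instance's value."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    """hj >=g hk: every instance accepted by hk is also accepted by hj."""
    if EMPTY in hk:  # hk accepts nothing, so the implication holds vacuously
        return True
    return all(a == "?" or a == b for a, b in zip(hj, hk))


MOST_GENERAL = ("?",) * 6
MOST_SPECIFIC = (EMPTY,) * 6
h1 = ("Sunny", "?", "?", "Strong", "?", "?")
h2 = ("Sunny", "?", "?", "?", "?", "?")
print(more_general_or_equal(h2, h1), more_general_or_equal(h1, h2))  # True False
```

The relation is only a partial order. Two hypotheses such as ⟨Sunny, ?, ?, ?, ?, ?⟩ and ⟨?, Warm, ?, ?, ?, ?⟩ are incomparable under it.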

  44. Example of the Ordering of Hypotheses

  45. Version Space Search

  46. Version Space Example

  47. Representing Version Space • The general boundary $G$ of version space $VS_{H,E}$ is the set of its maximally general members. • The specific boundary $S$ of version space $VS_{H,E}$ is the set of its maximally specific members. • Every member of the version space lies between these boundaries: $VS_{H,E} = \{h \in H \mid (\exists s \in S)(\exists g \in G)\,(g \geq_g h \geq_g s)\}$, where $x \geq_g y$ means that $x$ is more general than or equal to $y$.
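
For a space as small as EnjoySport, the version space and its boundaries can be computed by brute force, which makes the definition above easy to check. This sketch makes two assumptions. First, each attribute's domain is just the set of values that occur in the four training examples. Second, hypotheses containing ∅ are skipped, since they cover no positive example and so can never be consistent.

```python
from itertools import product

# The EnjoySport training examples from slide 42: (instance, is_positive).
DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
# Assumption: each attribute's domain is exactly the values seen in DATA.
DOMAINS = [sorted({x[i] for x, _ in DATA}) for i in range(6)]


def matches(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    # No ∅ constraints appear here, so an attribute-wise test suffices.
    return all(a == "?" or a == b for a, b in zip(hj, hk))


def version_space(data):
    """Every conjunctive hypothesis consistent with all training examples."""
    candidates = product(*[["?"] + d for d in DOMAINS])
    return [h for h in candidates
            if all(matches(h, x) == label for x, label in data)]


def boundaries(vs):
    """S = maximally specific members, G = maximally general members."""
    def strictly(a, b):
        return a != b and more_general_or_equal(a, b)
    S = [h for h in vs if not any(strictly(h, o) for o in vs)]
    G = [h for h in vs if not any(strictly(o, h) for o in vs)]
    return S, G


VS = version_space(DATA)
S, G = boundaries(VS)
# Every member lies between some g in G and some s in S.
assert all(any(more_general_or_equal(g, h) for g in G) and
           any(more_general_or_equal(h, s) for s in S) for h in VS)
print(len(VS), S, G)
```

Under these assumptions the version space has six members. S holds one hypothesis, G holds two, and every member lies between them.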

  48. Candidate-elimination algorithm • 1. Initialize H to be the whole space. Thus, the G set contains only the null description (the description with no constraints), and the S set is consistent with the first observed positive training instance. • 2. For each subsequent instance i: IF i is a positive instance, THEN retain in G only those generalizations that match i, and update S by generalizing its elements as little as possible so that they match i.

  49. Candidate-elimination algorithm • ELSE, IF i is a negative instance, THEN retain in S only those generalizations that do not match i, and update G by specializing its elements as little as possible so that they do not match i. • 3. Repeat step 2 until G = S and this is a singleton set. When this occurs, H has collapsed to include only a single concept. • 4. Output H.
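
The steps above translate fairly directly into code. This sketch handles conjunctive hypotheses only and follows the slides' outline, with `None` standing for ∅ until the first positive example arrives. Attribute domains are assumed to be the values seen in the EnjoySport training data.

```python
DATA = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
# Assumption: each attribute's domain is exactly the values seen in DATA.
DOMAINS = [sorted({x[i] for x, _ in DATA}) for i in range(6)]


def matches(h, x):
    """None (∅) never equals a value, so it rejects every instance."""
    return all(c == "?" or c == v for c, v in zip(h, x))


def more_general_or_equal(hj, hk):
    if None in hk:  # hk covers nothing: vacuously true
        return True
    return all(a == "?" or a == b for a, b in zip(hj, hk))


def generalize(s, x):
    """Minimal generalization of s that matches the positive instance x."""
    return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))


def specializations(g, x, domains):
    """Minimal specializations of g that reject the negative instance x."""
    for i, c in enumerate(g):
        if c == "?":
            for value in domains[i]:
                if value != x[i]:
                    yield g[:i] + (value,) + g[i + 1:]


def candidate_elimination(data, domains):
    n = len(domains)
    G = [("?",) * n]   # the null description: no constraints
    S = [(None,) * n]  # becomes the first positive instance when it arrives
    for x, positive in data:
        if positive:
            G = [g for g in G if matches(g, x)]
            S = [h for h in (s if matches(s, x) else generalize(s, x) for s in S)
                 if any(more_general_or_equal(g, h) for g in G)]
        else:
            S = [s for s in S if not matches(s, x)]
            new_G = []
            for g in G:
                if not matches(g, x):
                    new_G.append(g)
                else:
                    new_G.extend(h for h in specializations(g, x, domains)
                                 if any(more_general_or_equal(h, s) for s in S))
            new_G = list(dict.fromkeys(new_G))  # drop duplicates, keep order
            # Keep only the maximally general members of G.
            G = [g for g in new_G
                 if not any(o != g and more_general_or_equal(o, g) for o in new_G)]
    return S, G


S, G = candidate_elimination(DATA, DOMAINS)
print("S =", S)
print("G =", G)
```

On the four EnjoySport examples, this ends with S = {⟨Sunny, Warm, ?, Strong, ?, ?⟩} and G = {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}. Because G ≠ S, these four examples are not enough to reach the single-concept stopping condition of step 3.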

  50. Converging Boundaries of the G and S sets
