
Lattice Representation of Data

Lattice Representation of Data. Dr. Alex Pogel, Physical Science Laboratory, New Mexico State University. Basic idea: replace tabular representation by lattice representation in order to reveal hierarchical structure. Basic definitions. Information in the lattice.


Presentation Transcript


  1. Lattice Representation of Data Dr. Alex Pogel Physical Science Laboratory New Mexico State University

  2. Basic Idea: Replace tabular representation by lattice representation in order to reveal hierarchical structure. • Basic definitions • Information in the lattice • Carving up epidemiological data. Sources: Ganter & Wille, Formal Concept Analysis (FCA); Barwise & Seligman, Information Flow.

  3. Input data: The base data structure is a {0,1}-table with • a set G of objects (represented by rows), • a set M of attributes (represented by columns), and • an entry of 1 indicating that object g has attribute m.

  4. Input data, mathematically: Mathematically speaking, the table is a binary relation I from G to M, that is, a subset $I \subseteq G \times M$, interpreted as an indication of which objects g have which attributes m via $(g, m) \in I$.
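As a minimal sketch, the following Python turns a {0,1}-table into the relation I stored as a set of (g, m) pairs. The three objects, three attributes and the table entries are hypothetical toy values, not the animal context from the slides, and `relation_from_table` is an illustrative name.

```python
from typing import List, Set, Tuple

# Hypothetical toy {0,1}-table (not the slides' animal data).
G: List[str] = ["dog", "sparrow", "trout"]                       # objects = rows
M: List[str] = ["fourlegged", "warm-blooded", "lives in water"]  # attributes = columns
TABLE: List[List[int]] = [
    [1, 1, 0],  # dog
    [0, 1, 0],  # sparrow
    [0, 0, 1],  # trout
]


def relation_from_table(
    objects: List[str], attributes: List[str], table: List[List[int]]
) -> Set[Tuple[str, str]]:
    """Return I, a subset of G x M: (g, m) is in I exactly when the entry is 1."""
    return {
        (g, m)
        for g, row in zip(objects, table)
        for m, entry in zip(attributes, row)
        if entry == 1
    }


I = relation_from_table(G, M, TABLE)
print(("dog", "fourlegged") in I)      # True
print(("trout", "warm-blooded") in I)  # False
```

Storing I as a set of pairs mirrors the definition $I \subseteq G \times M$ directly; for large tables a bit matrix would be the more economical choice.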

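  5. Key Definitions: The notion of “formal concept” is based on natural mappings that arise from the binary relation I (interpret G and M as before): • to each subset H of G, we associate the set a(H) of all attributes that the objects in H have in common, $a\colon \mathcal{P}(G) \to \mathcal{P}(M)$, $a(H) = \{ m \in M : (g, m) \in I \text{ for all } g \in H \}$; • to each subset N of M, we associate the set o(N) of all objects that have every attribute in N, $o\colon \mathcal{P}(M) \to \mathcal{P}(G)$, $o(N) = \{ g \in G : (g, m) \in I \text{ for all } m \in N \}$.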
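A minimal Python sketch of the two mappings, assuming a small hypothetical context (not the slides' animal data) stored as a dict from each object to its set of attributes. The function names `a` and `o` follow the slide; `CONTEXT` and `ATTRIBUTES` are illustrative names.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical toy context: each object is mapped to the attributes it has,
# i.e. its row of the {0,1}-table stored as a set.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"fourlegged", "warm-blooded", "airbreather", "fur"}),
    "sparrow": frozenset({"warm-blooded", "airbreather"}),
    "frog": frozenset({"fourlegged", "airbreather", "lives in water"}),
    "trout": frozenset({"lives in water"}),
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())


def a(H: Iterable[str]) -> FrozenSet[str]:
    """a: P(G) -> P(M), the attributes shared by every object in H."""
    common = ATTRIBUTES  # a(empty set) = M: every attribute holds vacuously
    for g in H:
        common = common & CONTEXT[g]
    return common


def o(N: Iterable[str]) -> FrozenSet[str]:
    """o: P(M) -> P(G), the objects having every attribute in N."""
    required = frozenset(N)
    return frozenset(g for g, attrs in CONTEXT.items() if required <= attrs)


print(sorted(a({"dog", "sparrow"})))  # ['airbreather', 'warm-blooded']
print(sorted(o({"fourlegged"})))      # ['dog', 'frog']
```

Note the edge cases: a of the empty object set is all of M, and o of the empty attribute set is all of G.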

  6. Key Definitions: The attribute subsets N of M with a(o(N)) = N are the intents of formal concepts in FCA (a formal concept is the pair (o(N), N)), and they are called closed sets in mathematics, since a(o(–)) is a closure operator on the subsets of M. A formal concept can be identified geometrically within a data table by reshuffling rows and columns so that • object-attribute relations are maintained and • a maximal rectangle of 1s appears.
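The closure test and the "maximal rectangle" reading can be sketched in Python as follows. This assumes a small hypothetical context (not the slides' data); `closure`, `is_closed`, `concept_of` and `is_concept` are illustrative names, not an established API.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical toy context (not the slides' animal data): object -> attributes.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"fourlegged", "warm-blooded", "airbreather", "fur"}),
    "sparrow": frozenset({"warm-blooded", "airbreather"}),
    "frog": frozenset({"fourlegged", "airbreather", "lives in water"}),
    "trout": frozenset({"lives in water"}),
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())


def a(H: Iterable[str]) -> FrozenSet[str]:
    """Attributes shared by every object in H."""
    common = ATTRIBUTES
    for g in H:
        common = common & CONTEXT[g]
    return common


def o(N: Iterable[str]) -> FrozenSet[str]:
    """Objects having every attribute in N."""
    required = frozenset(N)
    return frozenset(g for g, attrs in CONTEXT.items() if required <= attrs)


def closure(N: Iterable[str]) -> FrozenSet[str]:
    """The closure operator a(o(N)) on subsets of M."""
    return a(o(N))


def is_closed(N: Iterable[str]) -> bool:
    """N is closed, i.e. the intent of a formal concept, iff a(o(N)) == N."""
    return closure(N) == frozenset(N)


def concept_of(N: Iterable[str]) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """The formal concept (extent, intent) generated by the attribute set N."""
    extent = o(N)
    return extent, a(extent)


def is_concept(A: Iterable[str], B: Iterable[str]) -> bool:
    """True iff A x B is a maximal rectangle of 1s.

    a(A) == B: the rows in A share exactly the columns in B, so A x B is all 1s
    and no column can be added; o(B) == A: no further row has all of B.
    """
    return a(A) == frozenset(B) and o(B) == frozenset(A)


print(sorted(closure({"fourlegged"})))  # ['airbreather', 'fourlegged']
print(is_closed({"fourlegged"}))        # False
extent, intent = concept_of({"fourlegged"})
print(sorted(extent), sorted(intent))   # ['dog', 'frog'] ['airbreather', 'fourlegged']
```

In this toy table, {fourlegged} is not closed because every fourlegged object is also an airbreather; its closure is the intent of the concept whose extent is {dog, frog}.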

  7. Animal Context

  8. Shuffling Reveals a Concept

  9. BIRD is the (formal) concept

  10. Closure System Arises: Taking all closed sets together, we obtain a closure system [aka a topped intersection structure, in Davey-Priestley], which is always a complete lattice [an ordered set in which every subset has both a supremum and an infimum in the set]. Examples of lattices: • ℝ with ≤ (a lattice, but not a complete one, since unbounded subsets have no supremum), • P(S) with inclusion, • any topology with inclusion, …
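A brute-force sketch of the closure system for a small hypothetical context (not the slides' data): it collects every closed set, checks that the family contains M and is closed under intersection, and defines meet and join under inclusion. Enumerating all 2^|M| subsets is only sensible for a handful of attributes; Ganter's NextClosure algorithm is the usual way to list closed sets efficiently, and it is not implemented here.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set

# Hypothetical toy context (not the slides' animal data): object -> attributes.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"fourlegged", "warm-blooded", "airbreather", "fur"}),
    "sparrow": frozenset({"warm-blooded", "airbreather"}),
    "frog": frozenset({"fourlegged", "airbreather", "lives in water"}),
    "trout": frozenset({"lives in water"}),
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())


def a(H: Iterable[str]) -> FrozenSet[str]:
    common = ATTRIBUTES
    for g in H:
        common = common & CONTEXT[g]
    return common


def o(N: Iterable[str]) -> FrozenSet[str]:
    required = frozenset(N)
    return frozenset(g for g, attrs in CONTEXT.items() if required <= attrs)


def closure(N: Iterable[str]) -> FrozenSet[str]:
    return a(o(N))


def all_closed_sets() -> Set[FrozenSet[str]]:
    """Every closure(X) is closed and every closed N equals closure(N), so
    collecting closure(X) over all subsets X of M yields exactly the closed sets."""
    attrs = sorted(ATTRIBUTES)
    return {
        closure(combo)
        for r in range(len(attrs) + 1)
        for combo in combinations(attrs, r)
    }


def meet(X: FrozenSet[str], Y: FrozenSet[str]) -> FrozenSet[str]:
    """Greatest lower bound under inclusion: intersections of closed sets are closed."""
    return X & Y


def join(X: FrozenSet[str], Y: FrozenSet[str]) -> FrozenSet[str]:
    """Least upper bound under inclusion: the smallest closed set containing X and Y."""
    return closure(X | Y)


CLOSED = all_closed_sets()
print(len(CLOSED))                                                # 8
print(ATTRIBUTES in CLOSED)                                       # True: topped
print(all(meet(X, Y) in CLOSED for X in CLOSED for Y in CLOSED))  # True
```

Here the closed sets are ordered by inclusion of attribute sets; the usual concept-lattice diagram draws larger intents lower, so it shows this order upside down.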

  11. Focus on attribute logic

  12. Full list: difficult to read and redundant. These are all implications that hold for the data with up to three attributes in their premise; 125 of them have positive support.
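A minimal sketch of testing and listing implications, assuming a small hypothetical context (not the slides' data). An implication A → B holds when every object having all of A also has all of B, which is the same as B ⊆ a(o(A)). The slides do not say how support is counted; this sketch assumes support is the fraction of objects that have the whole premise. Pairing each premise with its full closure is one natural way to build such a list, not necessarily the one used for the slide.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical toy context (not the slides' animal data): object -> attributes.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"fourlegged", "warm-blooded", "airbreather", "fur"}),
    "sparrow": frozenset({"warm-blooded", "airbreather"}),
    "frog": frozenset({"fourlegged", "airbreather", "lives in water"}),
    "trout": frozenset({"lives in water"}),
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())


def a(H: Iterable[str]) -> FrozenSet[str]:
    common = ATTRIBUTES
    for g in H:
        common = common & CONTEXT[g]
    return common


def o(N: Iterable[str]) -> FrozenSet[str]:
    required = frozenset(N)
    return frozenset(g for g, attrs in CONTEXT.items() if required <= attrs)


def closure(N: Iterable[str]) -> FrozenSet[str]:
    return a(o(N))


def holds(premise: Iterable[str], conclusion: Iterable[str]) -> bool:
    """A -> B holds iff B is contained in a(o(A))."""
    return frozenset(conclusion) <= closure(premise)


def support(premise: Iterable[str]) -> float:
    """Assumed convention: fraction of all objects having the whole premise."""
    return len(o(premise)) / len(CONTEXT)


def full_list(max_premise: int = 3) -> List[Tuple[FrozenSet[str], FrozenSet[str]]]:
    """Each premise (up to max_premise attributes) with everything it implies.

    Premises no object has imply all of M vacuously; they are dropped by
    keeping only rules with positive support.
    """
    rules = []
    for r in range(max_premise + 1):
        for combo in combinations(sorted(ATTRIBUTES), r):
            A = frozenset(combo)
            B = closure(A) - A
            if B and support(A) > 0:
                rules.append((A, B))
    return rules


print(holds({"fur"}, {"fourlegged", "warm-blooded"}))  # True
print(holds({"fourlegged"}, {"warm-blooded"}))         # False (the frog)
print(len(full_list()))
```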

  13. Duquenne-Guigues Basis: 20 implications generate the full list and serve as a basis (by analogy with linear algebra); they are ordered by support value.
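The Duquenne-Guigues (canonical) basis consists of the implications P → a(o(P)) \ P, one for each pseudo-intent P: a set P with P ≠ a(o(P)) such that every pseudo-intent Q properly contained in P satisfies a(o(Q)) ⊆ P. The sketch below applies that definition by brute force to a small hypothetical context (not the slides' data), so it is only practical for a handful of attributes. Sorting by the number of objects covering the premise is an assumed reading of "ordered by support value".

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Hypothetical toy context (not the slides' animal data): object -> attributes.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"fourlegged", "warm-blooded", "airbreather", "fur"}),
    "sparrow": frozenset({"warm-blooded", "airbreather"}),
    "frog": frozenset({"fourlegged", "airbreather", "lives in water"}),
    "trout": frozenset({"lives in water"}),
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())
Rule = Tuple[FrozenSet[str], FrozenSet[str]]


def a(H: Iterable[str]) -> FrozenSet[str]:
    common = ATTRIBUTES
    for g in H:
        common = common & CONTEXT[g]
    return common


def o(N: Iterable[str]) -> FrozenSet[str]:
    required = frozenset(N)
    return frozenset(g for g, attrs in CONTEXT.items() if required <= attrs)


def closure(N: Iterable[str]) -> FrozenSet[str]:
    return a(o(N))


def canonical_basis() -> List[Rule]:
    """Pseudo-intents by definition, visiting subsets by increasing size so that
    every pseudo-intent properly contained in P has already been found."""
    attrs = sorted(ATTRIBUTES)
    pseudo_intents: List[FrozenSet[str]] = []
    for r in range(len(attrs) + 1):
        for combo in combinations(attrs, r):
            P = frozenset(combo)
            if closure(P) == P:
                continue  # closed sets are intents, not pseudo-intents
            if all(closure(Q) <= P for Q in pseudo_intents if Q < P):
                pseudo_intents.append(P)
    basis = [(P, closure(P) - P) for P in pseudo_intents]
    # Highest support (most objects having the premise) first.
    return sorted(basis, key=lambda rule: len(o(rule[0])), reverse=True)


for premise, conclusion in canonical_basis():
    print(sorted(premise), "->", sorted(conclusion))
```

For this toy table the basis has five implications, for example fourlegged → airbreather and fur → fourlegged, warm-blooded, airbreather.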

  14. Full list, basis, and original data

  15. Implication Reads Upwards: at top right, warm-blooded implies airbreather. This is the 1st implication in the basis; its high support is indicated in lime green.

  16. A Subinterval of the Lattice: fourlegged implies airbreather; pet implies warm-blooded (iguana?); and fur implies fourlegged and warm-blooded (platypus?).
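Reading an implication upwards rests on a standard fact: m → n holds exactly when the attribute concept of m lies below the attribute concept of n, i.e. when every object with m also has n, so o({m}) ⊆ o({n}). A minimal sketch with a small hypothetical context (not the slides' data); `reads_upwards` and `implies_all` are illustrative names.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical toy context (not the slides' animal data): object -> attributes.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"fourlegged", "warm-blooded", "airbreather", "fur"}),
    "sparrow": frozenset({"warm-blooded", "airbreather"}),
    "frog": frozenset({"fourlegged", "airbreather", "lives in water"}),
    "trout": frozenset({"lives in water"}),
}


def o(N: Iterable[str]) -> FrozenSet[str]:
    """Objects having every attribute in N."""
    required = frozenset(N)
    return frozenset(g for g, attrs in CONTEXT.items() if required <= attrs)


def reads_upwards(m: str, n: str) -> bool:
    """True iff m -> n holds: the attribute concept of m is below that of n."""
    return o({m}) <= o({n})


def implies_all(m: str, ns: Iterable[str]) -> bool:
    """m implies several attributes iff it lies below each of their concepts."""
    return all(reads_upwards(m, n) for n in ns)


print(reads_upwards("warm-blooded", "airbreather"))             # True
print(reads_upwards("fourlegged", "warm-blooded"))              # False (the frog)
print(implies_all("fur", ["fourlegged", "warm-blooded"]))       # True
```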

  17. Original data preserved: animals 26 and 27 share the attributes “lives in water”, “is warm-blooded” and “is an airbreather”.

  18. Original data preserved

  19. Color-coded Support: the similarity in color between “livestock” and the concept node below it yields the association rule livestock implies fur, with 79% confidence and 11% support (bottom).
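Unlike an implication, an association rule need not hold for every object. A minimal sketch of the two usual measures, assuming confidence is the share of premise objects that also have the conclusion and support is the share of all objects having both. The slide does not say exactly how its 79% and 11% were counted, and the context below is a small hypothetical one, not the slides' data.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical toy context (not the slides' animal data): object -> attributes.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"fourlegged", "warm-blooded", "airbreather", "fur"}),
    "sparrow": frozenset({"warm-blooded", "airbreather"}),
    "frog": frozenset({"fourlegged", "airbreather", "lives in water"}),
    "trout": frozenset({"lives in water"}),
}


def o(N: Iterable[str]) -> FrozenSet[str]:
    """Objects having every attribute in N."""
    required = frozenset(N)
    return frozenset(g for g, attrs in CONTEXT.items() if required <= attrs)


def support(premise: Iterable[str], conclusion: Iterable[str]) -> float:
    """Fraction of all objects having every attribute of premise and conclusion."""
    both = frozenset(premise) | frozenset(conclusion)
    return len(o(both)) / len(CONTEXT)


def confidence(premise: Iterable[str], conclusion: Iterable[str]) -> float:
    """Fraction of the objects having the premise that also have the conclusion."""
    covered = o(premise)
    if not covered:
        raise ValueError("confidence is undefined when no object has the premise")
    both = frozenset(premise) | frozenset(conclusion)
    return len(o(both)) / len(covered)


print(confidence({"airbreather"}, {"fourlegged"}))  # 2 of the 3 airbreathers
print(support({"airbreather"}, {"fourlegged"}))     # 0.5 (2 of 4 objects)
print(confidence({"fur"}, {"fourlegged"}))          # 1.0: an implication that holds
```

An implication is the special case of an association rule with 100% confidence.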

  20. Visual Vocabulary: small subdiagrams (specifically, meet-subsemilattices) can be recognized as complex sentences.

  21. Three unordered attribute concepts. Note: the top element is really irrelevant, but adding it makes everything we’ll look at a lattice instead of just a meet semilattice (definition: an ordered set in which every nonempty finite subset has a meet, i.e. a greatest lower bound).

  22. Here’s the best-known outcome: no non-trivial implications.

  23. W over V: a & c b b c a

  24. Diamond in diamond: under condition c, a and b are equivalent.

  25. Convergence: any two imply the third.
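The three-attribute patterns above are statements about which implications hold, so they can be checked directly against a table. A minimal sketch, assuming a small hypothetical context (not the slides' data); `equivalent_under` and `any_two_imply_third` are illustrative names for the "diamond in diamond" and "convergence" readings.

```python
from typing import Dict, FrozenSet, Iterable

# Hypothetical toy context (not the slides' animal data): object -> attributes.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "dog": frozenset({"fourlegged", "warm-blooded", "airbreather", "fur"}),
    "sparrow": frozenset({"warm-blooded", "airbreather"}),
    "frog": frozenset({"fourlegged", "airbreather", "lives in water"}),
    "trout": frozenset({"lives in water"}),
}


def o(N: Iterable[str]) -> FrozenSet[str]:
    """Objects having every attribute in N."""
    required = frozenset(N)
    return frozenset(g for g, attrs in CONTEXT.items() if required <= attrs)


def holds(premise: Iterable[str], conclusion: Iterable[str]) -> bool:
    """A -> B holds iff every object having all of A also has all of B."""
    return all(frozenset(conclusion) <= CONTEXT[g] for g in o(premise))


def equivalent_under(cond: str, x: str, y: str) -> bool:
    """Diamond in diamond: under cond, x and y are equivalent."""
    return holds({cond, x}, {y}) and holds({cond, y}, {x})


def any_two_imply_third(x: str, y: str, z: str) -> bool:
    """Convergence: each pair of the three attributes implies the remaining one."""
    return holds({x, y}, {z}) and holds({x, z}, {y}) and holds({y, z}, {x})


print(any_two_imply_third("fourlegged", "warm-blooded", "fur"))          # True
print(equivalent_under("airbreather", "fourlegged", "warm-blooded"))     # False
```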

  26. Two Complex Sentences: So we can read that, for nocturnal animals and pets, the attributes fourlegged and warm-blooded are equivalent, and that the only implication among the attributes “nocturnal,” “fur” and “pet” is: pet and nocturnal imply fur.

  27. The Hague, Netherlands

  28. Before Freese improvement

  29. After Freese improvement

  30. Apparent Splits

  31. Eliminating Light Smokers

  32. Why no object names?

  33. Lung Cancer and Smoking: nearly half of these 30+ year smokers have lung cancer.

  34. Bird-keeping and Smoking: association rules involving bird-keeping and smoking.

  35. Limitations as a KDD Process: • Needs attention to data preparation • Needs more built-in verification of discovered rules • No domain-specific constructions (an advantage?) • Does not scale without clustering (universal?)

  36. Epidemiological Functions

| | Lung Cancer | No Lung Cancer |
|---|---|---|
| BirdKeep Yes | 33 | 34 |
| BirdKeep No | 16 | 64 |

Plan to add odds ratio calculation, via click: OR = (33 × 64) / (34 × 16) ≈ 3.9.
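The odds ratio on this slide is the cross-product ratio of the 2×2 table, with bird-keeping as the exposure and lung cancer as the outcome. A minimal sketch using the slide's counts; `odds_ratio` is an illustrative name, and no confidence interval is computed here.

```python
def odds_ratio(
    exposed_cases: int,
    exposed_noncases: int,
    unexposed_cases: int,
    unexposed_noncases: int,
) -> float:
    """Sample odds ratio (a * d) / (b * c) of a 2x2 exposure/outcome table."""
    if exposed_noncases == 0 or unexposed_cases == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (exposed_cases * unexposed_noncases) / (exposed_noncases * unexposed_cases)


# Counts from the slide: BirdKeep Yes = 33 / 34, BirdKeep No = 16 / 64.
print(round(odds_ratio(33, 34, 16, 64), 1))  # 3.9
```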

  37. Clustering for too large lattices

  38. Support for Improvement: Traditional diagram improvement algorithms are based solely on the order structure. We are now moving towards including support values in these algorithms. I will talk about this topic in detail in July, here at DIMACS, as part of the Applications of Lattice Theory workshop. END
