A Comparison of Graphical Techniques for the Display of Co-Occurrence Data

A Comparison of Graphical Techniques for the Display of Co-Occurrence Data. Jan W. Buzydlowski, Xia Lin, Howard D. White College of Information Science and Technology Drexel University Philadelphia, PA 19104 USA. Information Visualization.

Presentation Transcript


  1. A Comparison of Graphical Techniques for the Display of Co-Occurrence Data • Jan W. Buzydlowski, Xia Lin, Howard D. White • College of Information Science and Technology, Drexel University, Philadelphia, PA 19104 USA

  2. Information Visualization • (Data) Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way. [Cleveland, 1993] • (Information) Visualization has two aspects: structural modeling and graphic representation. [C. Chen, 1999] • data - model - display

  3. Visualization Overview • Model - Display • Co-Occurrence Model • 3 Graphical Displays • Data • Co-citation counts from the Institute for Scientific Information, Philadelphia, PA • Obtained from a 10-year Arts & Humanities Citation Index database given to Drexel by ISI for research purposes

  4. Co-Occurrence Model • Examples • Derivation • Metrics

  5. Co-Occurrence Data - Example 1 • Market Basket Analysis • a shopping cart holds items purchased • e.g., milk, bread, razor blades, newspaper • Over all the sales for one day • what items are purchased together • how can we arrange the items in the store • Pampers and beer on Thursdays...

  6. Co-Occurrence Data - Example 2 • Author Co-citation Analysis (ACA) • Bibliographic data on a given article holds, e.g., • title, keywords, abstract, citations to other documents • An article might cite, e.g.: • Plato, Aristotle, Smith, Brown • Over a given set of many citing articles • Count how many times each pair of authors was cited together • The resulting co-citation count indicates common intellectual interest

  7. Co-Occurrence Derivation • For a given data set (N = 4 unique terms) • Article 1: Plato, Aristotle, Smith • Article 2: Plato, Smith • Article 3: Plato, Aristotle, Smith, Brown • The following co-citations (C(4,2) = 6) are found
     COMBINATION           | COUNT | ARTICLES
     Plato and Smith       | 3     | 1, 2, 3
     Plato and Aristotle   | 2     | 1, 3
     Plato and Brown       | 1     | 3
     Aristotle and Smith   | 2     | 1, 3
     Aristotle and Brown   | 1     | 3
     Smith and Brown       | 1     | 3
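
The derivation on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the `articles` dictionary and the function name `cocitation_counts` are hypothetical, and the data are the three example articles from the slide.

```python
from collections import Counter
from itertools import combinations

# Citing articles from the slide: article number -> authors cited.
articles = {
    1: ["Plato", "Aristotle", "Smith"],
    2: ["Plato", "Smith"],
    3: ["Plato", "Aristotle", "Smith", "Brown"],
}

def cocitation_counts(articles):
    """Count, for each unordered author pair, the articles citing both."""
    counts = Counter()
    for cited in articles.values():
        # sorted(set(...)) gives every pair one canonical (alphabetical) order
        # and ignores an author cited twice in the same article.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

counts = cocitation_counts(articles)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs are keyed alphabetically, so "Plato and Smith" appears as `("Plato", "Smith")` and "Plato and Aristotle" as `("Aristotle", "Plato")`.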

  8. Co-Occurrence Measures • Raw counts • Additional information • Correlations • Replace each cell by the correlation between the corresponding pair of columns • Conditional Probability • Compute each cell by dividing the pair's co-occurrence count by the total occurrences of one of its terms
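
As a hedged illustration of the conditional-probability measure, the sketch below estimates P(b cited | a cited) as the co-occurrence count of the pair divided by the number of articles citing a. That reading of the slide is an assumption, and the names `occurs`, `together`, and `conditional` are hypothetical; the data are the three example articles from the derivation slide.

```python
from collections import Counter
from itertools import combinations

# Example data: the set of authors cited by each article.
articles = [
    {"Plato", "Aristotle", "Smith"},
    {"Plato", "Smith"},
    {"Plato", "Aristotle", "Smith", "Brown"},
]

# Number of articles citing each author.
occurs = Counter(author for cited in articles for author in cited)

# Number of articles citing both authors of each unordered pair.
together = Counter(
    frozenset(pair)
    for cited in articles
    for pair in combinations(sorted(cited), 2)
)

def conditional(b, a):
    """Estimate P(b is cited | a is cited) from the raw counts."""
    return together[frozenset((a, b))] / occurs[a]

print(conditional("Aristotle", "Plato"))  # Aristotle given Plato
print(conditional("Plato", "Brown"))      # Plato given Brown
```

Unlike raw counts, this measure is asymmetric: every article citing Brown also cites Plato, but only one of the three articles citing Plato cites Brown.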

  9. Co-Occurrence Structure - Example

  10. Graphical Techniques • Three Methodologies • Multi-dimensional scaling • Self-organizing maps • Pathfinder networks

  11. MDS

  12. MDS Methodology • Given original distances (similarities) estimate coordinates that could give those distances • The computed distances should correspond to the original distances • Stress • Added dimensions
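
The "Stress" and "Added dimensions" bullets can be illustrated with a small sketch. It uses one common form of Kruskal's stress-1, the square root of the sum of squared differences between map distances and original distances, divided by the sum of squared map distances. The slides do not say which stress formula was used, so this choice, the function name `stress`, and the toy data are all assumptions.

```python
import math

def stress(target, coords):
    """Stress-1 between target distances and a point configuration.

    target: dict mapping (i, j) index pairs to the original distances
    coords: list of coordinate tuples, one per item
    """
    num = den = 0.0
    for (i, j), delta in target.items():
        d = math.dist(coords[i], coords[j])  # distance in the map
        num += (d - delta) ** 2
        den += d ** 2
    return math.sqrt(num / den)

# Three items whose original distances form a 3-4-5 right triangle.
target = {(0, 1): 3.0, (0, 2): 5.0, (1, 2): 4.0}

line = [(0.0,), (3.0,), (5.0,)]               # 1-D map: pair (1, 2) is 2, not 4
plane = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]  # 2-D map reproduces all three

print(stress(target, line))
print(stress(target, plane))
```

The one-dimensional map cannot reproduce all three distances, so its stress is positive; adding a second dimension brings the stress to zero for this toy data.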

  13. SOM

  14. Self-Organizing Maps (SOMs) • Also known as Kohonen Maps • Based on Neural Networks • Related to wetware • robust techniques • If categories are known • supervised technique • backpropagation learning • If categories are sought • unsupervised technique • competitive learning

  15. SOMs • Given a 2-D grid of nodes • each node has N weights • each vector (row) has N terms • map each input vector to a node • Similar to vector quantization (VQ)

  16. SOMs Generation • nodes initially given random weights • randomly sample an input vector • row of co-occurrence matrix • with replacement • find the node closest to the vector • Euclidean distance • update node weights • node weight = node weight + gain term * (input vector - node weight) • update “neighborhood” • “cool” gain term and neighborhood • repeat…
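
The generation loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the grid size, the linear cooling schedule, the hard neighborhood cutoff, and the names `train_som` and `best_node` are all hypothetical, and the input rows are the raw co-citation counts from the Plato/Aristotle/Smith/Brown example with zeros on the diagonal.

```python
import math
import random

def best_node(nodes, vec):
    """Return the grid position whose weights are closest (Euclidean) to vec."""
    return min(nodes, key=lambda pos: math.dist(nodes[pos], vec))

def train_som(rows, width=4, height=4, steps=500,
              gain0=0.5, radius0=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(rows[0])
    # Each node on the 2-D grid starts with N random weights.
    nodes = {(x, y): [rng.random() for _ in range(dim)]
             for x in range(width) for y in range(height)}
    for t in range(steps):
        cool = 1.0 - t / steps           # "cool" the gain and the neighborhood
        gain = gain0 * cool
        radius = radius0 * cool
        vec = rng.choice(rows)           # sample a row with replacement
        winner = best_node(nodes, vec)
        for pos, weights in nodes.items():
            if math.dist(pos, winner) <= radius:   # winner and its neighbors
                for i in range(dim):
                    weights[i] += gain * (vec[i] - weights[i])
    return nodes

names = ["Plato", "Aristotle", "Smith", "Brown"]
rows = [[0, 2, 3, 1], [2, 0, 2, 1], [3, 2, 0, 1], [1, 1, 1, 0]]
nodes = train_som(rows)
placement = {name: best_node(nodes, row) for name, row in zip(names, rows)}
print(placement)
```

After training, each author label is plotted at the grid node returned by `best_node`; the fixed `seed` only makes the sketch repeatable.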

  17. PF Nets

  18. Pathfinder Networks • Uses graph notation • nodes = authors • edges = co-citation counts • Co-occurrence is a complete network (weighted, undirected) • Example: Plato - Smith (3), Plato - Aristotle (2), Aristotle - Smith (2)

  19. Pathfinder Networks Generation • Pathfinder Network is generated by varying the parameters: • distance (r) • triangle inequality (q)

  20. Pathfinder Distance • Uses Minkowski metric: $d = \left( \sum_i e_i^r \right)^{1/r}$ • Example • e1 = 3, e2 = 4 • r = 1 => d = 3 + 4 = 7 • Driving distance / ratio data • r = 2 => d = (9 + 16)^{1/2} = 5 • Euclidean distance • r (approaches) infinity => d = max(3, 4) = 4 • ordinal data • rank rather than value
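
The three cases on this slide can be checked with a short sketch. The function name `minkowski` is hypothetical; the infinite case is handled explicitly as the maximum edge weight, which is the limit of the formula as r grows.

```python
import math

def minkowski(edges, r):
    """Path length under the Minkowski r-metric for a list of edge weights."""
    if math.isinf(r):
        return max(edges)  # limit of the formula as r approaches infinity
    return sum(e ** r for e in edges) ** (1.0 / r)

path = [3, 4]
for r in (1, 2, math.inf):
    print(r, minkowski(path, r))
```

Because only the largest edge matters when r is infinite, the result depends on the rank order of the weights rather than their magnitudes, which is why that setting suits ordinal data.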

  21. Pathfinder Triangle Inequality • A required property of a metric definition: $d(i,j) \le d(i,k) + d(k,j)$ • But may not be justified • in personal judgments • If a is similar to b, and b is similar to c, there may be no transitive judgment of similarity from a to c • in set intersections • Even though Smith and Jones appear together 12 times, and Jones and Brown appear together 5 times, the overlap between Smith and Brown cannot be predicted

  22. Pathfinder Triangle Inequality • Defines q-triangular • check paths of length q to determine if inequality is met • minimum is 2 • maximum is n -1 • full compliance • the longer the length, the fewer the connections

  23. Pathfinder Example

  24. Pathfinder Network Creation • PFNet (r, q) • Examine all paths of length q or less. • Use the Minkowski metric with parameter r to compute path length. • If a path of lower weight than the direct edge between its endpoints is found, remove that edge.

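  25. Pathfinder - Example • Triangle: Smith - Jones = 5; the two edges through Brown = 4 and 3 • q = 2 • r = 1 => path through Brown = 7 > 5, so Smith - Jones is kept • r = 2 => path through Brown = 5, not less than 5, so Smith - Jones is kept • r = infinity => path through Brown = max(4, 3) = 4 < 5, so Smith - Jones is removed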
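
The Smith/Jones/Brown example can be reproduced with a small pruning sketch. It is an illustration under assumptions rather than the published Pathfinder algorithm: the function name `pfnet` and the dictionary layout are hypothetical, weights are treated as distances (as the slide example does), and Smith - Brown is arbitrarily given the weight 4 and Jones - Brown the weight 3, since the slide does not say which is which (the outcome for Smith - Jones is the same either way).

```python
import math

def pfnet(weights, r=math.inf, q=None):
    """Return the edges kept by a PFNet(r, q)-style pruning.

    weights: dict mapping frozenset({a, b}) -> distance-like edge weight
             for a complete undirected network.
    """
    nodes = sorted({n for edge in weights for n in edge})
    if q is None:
        q = len(nodes) - 1          # maximum path length
    use_max = math.isinf(r)

    # Work with w**r so that joining path segments is plain addition
    # (or max when r is infinite); no r-th roots are needed to compare.
    def cost(w):
        return w if use_max else w ** r

    def combine(a, b):
        return max(a, b) if use_max else a + b

    direct = {}
    for edge, w in weights.items():
        a, b = tuple(edge)
        direct[(a, b)] = direct[(b, a)] = cost(w)

    best = dict(direct)             # cheapest path using at most 1 edge
    for _ in range(q - 1):          # allow one more edge per round, up to q
        longer = dict(best)
        for (a, m), first in direct.items():
            for b in nodes:
                if b in (a, m) or (m, b) not in best:
                    continue
                c = combine(first, best[(m, b)])
                if c < longer.get((a, b), math.inf):
                    longer[(a, b)] = c
        best = longer

    kept = {}
    for edge, w in weights.items():
        a, b = tuple(edge)
        # Keep the edge unless some path of at most q edges is strictly cheaper.
        if direct[(a, b)] <= best[(a, b)] * (1 + 1e-12):
            kept[edge] = w
    return kept

example = {
    frozenset(("Smith", "Jones")): 5,
    frozenset(("Smith", "Brown")): 4,   # assumed assignment of the 4 and 3
    frozenset(("Jones", "Brown")): 3,
}
for r in (1, 2, math.inf):
    kept = pfnet(example, r=r, q=2)
    print(r, frozenset(("Smith", "Jones")) in kept)
```

With q = 2 the only alternative to the direct Smith - Jones edge is the two-edge path through Brown, so the loop prints whether the direct edge survives for each value of r.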

  26. Comparison of Techniques • MDS • Reduces dimensions / reveals clusters • 2D may be insufficient • measurement may not be Euclidean • SOM • robust • no guarantee of convergence/unique solution • Pathfinder • does not assume ratio data/triangle inequality • connections rather than positions are important • additional methodology needed for display

  27. Comparison of Techniques • Similarities • Spatial models • Differences • use of visual space • semantic meaning • as related to data • research in progress

  28. Graphical Display of Methodologies • MDS • assume that 2 dimensions are sufficient • x, y for each point already defined • SOM • grid defines the 2D surface • plot each label with the appropriate node • Pathfinder • only defines the nodes and links • need additional methodologies • Spring-embedder models • Kamada and Kawai (1989) • Fruchterman and Reingold (1991) • Davidson and Harel (1996)
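
To show why Pathfinder needs an additional layout step, here is a simplified spring-embedder sketch in the spirit of Fruchterman and Reingold (1991): all node pairs repel, linked nodes attract, and a cooling "temperature" caps each move. It is not the published algorithm in full, and the function name, parameter defaults, and cooling factor are assumptions chosen for illustration.

```python
import math
import random

def spring_layout(nodes, edges, size=1.0, iterations=100, seed=0):
    """Place nodes in a size x size frame with a force-directed heuristic."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = math.sqrt(size * size / len(nodes))   # ideal distance between nodes
    temp = size / 10                          # maximum move per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: force k^2 / d.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along each link: force d^2 / k.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node, limited by the temperature and the frame.
        for n in nodes:
            dx, dy = disp[n]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[n][0] = min(size, max(0.0, pos[n][0] + dx / d * step))
            pos[n][1] = min(size, max(0.0, pos[n][1] + dy / d * step))
        temp *= 0.95                          # cool down
    return pos

nodes = ["Plato", "Aristotle", "Smith", "Brown"]
edges = [("Plato", "Smith"), ("Plato", "Aristotle"), ("Plato", "Brown")]
layout = spring_layout(nodes, edges)
for name, (x, y) in layout.items():
    print(name, round(x, 3), round(y, 3))
```

The `edges` list here is a made-up set of links standing in for the output of a Pathfinder pruning; the resulting x, y pairs play the role that MDS coordinates or SOM grid positions play for the other two methods.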

  29. Graphical Comparison of Three Methods • Data • Institute for Scientific Information • Arts and Humanities Database (AHCI) • 1988 - 1997 • 1.26 million records • Example: • Given Plato, find related authors • Interface described in IV 2000 Paper • CSNA 2000 Paper • (Lin, Buzydlowski, White)

  30. 25 Authors Co-cited with Plato • PLATO (4928) • ARISTOTLE (1861) • PLUTARCH (838) • CICERO (699) • HOMER (627) • BIBLE (552) • EURIPIDES (515) • ARISTOPHANES (474) • XENOPHON (459) • AUGUSTINE (432) • HERODOTUS (425) • KANT-I (385) • AESCHYLUS (374) • SOPHOCLES (363) • THUCYDIDES (363) • OVID (334) • HESIOD (325) • DIOGENES-LAERTIUS (317) • HEIDEGGER-M (312) • DERRIDA-J (304) • PINDAR (292) • NIETZSCHE-F (278) • HEGEL-GWF (264) • VERGIL (259) • AQUINAS-T (255)

  31. 300 Pair-wise co-citations • 1: PLATO AND ARISTOTLE - 1940 docs • 2: PLATO AND PLUTARCH - 872 docs • . . . • 300: VERGIL AND AQUINAS-T - 38 docs

  32. Visualization allows for the revelation of intricate structure which cannot be absorbed in any other way...

  33. 2D MDS map of 25 authors co-cited with Plato

  34. PFNet of 25 authors co-cited with Plato (figure: network diagram with one labeled node for each of the 25 authors)

  35. Conclusion • Slides available at: • faculty.cis.drexel.edu/~jbuzydlo/ • janb@drexel.edu

  36. Bibliography • Chen, Chaomei, Information Visualization and Virtual Environments, 1999. • Cleveland, William S., Visualizing Data, Hobart Press, 1993. • Davidson, R, Harel, D, Drawing Graphs Nicely Using Simulated Annealing, ACM Transactions on Graphics, 15(4): 301-31 (1996). • Fruchterman, TMJ, Reingold, EM, Graph Drawing by Force-Directed Placement, Software Practice and Experience, 21: 1129-64 (1991). • Kamada, T, Kawai, S, An Algorithm for Drawing General Undirected Graphs, Information Processing Letters, 31(1): 7-15 (1989).
