
Graph and Topological Structure Mining on Scientific Articles

Fan Wang, Ruoming Jin, Gagan Agrawal and Helen Piontkivska, The Ohio State University and Kent State University. Presenter: Fan Wang, The Ohio State University.


Presentation Transcript


  1. Graph and Topological Structure Mining on Scientific Articles • Fan Wang, Ruoming Jin, Gagan Agrawal and Helen Piontkivska • The Ohio State University, Kent State University • Presenter: Fan Wang, The Ohio State University

  2. Outline • Introduction • Topological Structure Mining • Data Preprocessing and Graph Representations • Experiment Results and Pattern Analysis • Conclusion

  3. Introduction • Huge number of genes in the literature • Each associated with a targeted disease or functionality • Finding interactions among genes manually is • Time-consuming • Error-prone

  4. Introduction • Well-known relationships among chemokine ligands • Mining these relationships from literature documents • Mining frequent patterns from graph datasets • Graphs are a convenient representation • Extensive research on subgraph mining

  5. Introduction • Our Goal • Find commonly occurring interactions • Represent them visually • Capture the co-occurrence of scientific terms • A graph representation of each scientific document • Mine frequent topological structures

  6. Outline • Introduction • Topological Structure Mining • Data Preprocessing and Graph Representations • Experiment Results and Pattern Analysis • Conclusion

  7. Topological Structure Mining • Disadvantages of subgraph mining • Requires exact matching • Misses potential patterns • Instead: focus on the topological relationship • Incorporate approximate matching

  8. Topological Structure Mining [Figure: three graphs G, X and Y] • G is a subgraph of Y • X is a (0,3)-topological structure of Y
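The figure on this slide did not survive, but the relationship it illustrates can be sketched in code. The slides do not give the formal definition, so this is a minimal brute-force sketch under an assumed reading. Each edge (a, b) of the pattern X must correspond to a simple path from a to b in Y with between l and h intermediate vertices. Paths for different pattern edges may not share intermediate vertices or pass through other pattern vertices. The sketch also assumes keyword labels are unique within a graph (one node per keyword), so each pattern vertex maps to the same-named vertex of Y. The helper names (`make_graph`, `intermediate_paths`, `is_topological_structure`) are hypothetical, and this is not TSMiner.

```python
from typing import Dict, Iterator, List, Set, Tuple

# Undirected graph as an adjacency map: keyword -> set of neighbouring keywords.
Graph = Dict[str, Set[str]]


def make_graph(edges: List[Tuple[str, str]]) -> Graph:
    """Build an undirected graph from an edge list."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def intermediate_paths(g: Graph, src: str, dst: str, l: int, h: int,
                       forbidden: Set[str]) -> Iterator[List[str]]:
    """Yield the intermediate vertices of every simple src->dst path whose
    number of intermediate vertices lies in [l, h]; no intermediate vertex
    may belong to `forbidden`."""
    def extend(path: List[str]) -> Iterator[List[str]]:
        inner = len(path) - 1  # intermediate vertices collected so far
        for nxt in g.get(path[-1], set()):
            if nxt == dst:
                if l <= inner <= h:
                    yield path[1:]
            elif inner < h and nxt not in forbidden and nxt not in path:
                yield from extend(path + [nxt])
    yield from extend([src])


def is_topological_structure(x: Graph, y: Graph, l: int, h: int) -> bool:
    """Assumed reading: every edge of pattern x maps to an internally
    vertex-disjoint path in y with between l and h intermediate vertices."""
    if not set(x) <= set(y):
        return False  # some pattern keyword does not occur in y
    pattern_edges = sorted({(min(u, v), max(u, v)) for u in x for v in x[u]})

    def assign(i: int, used: Set[str]) -> bool:
        # Backtrack over path choices, one pattern edge at a time.
        if i == len(pattern_edges):
            return True
        a, b = pattern_edges[i]
        for inner in intermediate_paths(y, a, b, l, h, used):
            if assign(i + 1, used | set(inner)):
                return True
        return False

    # Pattern vertices themselves may never serve as intermediate vertices.
    return assign(0, set(x))


# Toy example (made-up graphs, not taken from the slides).
y = make_graph([("A", "P"), ("P", "Q"), ("Q", "R"), ("R", "B"), ("B", "C")])
x = make_graph([("A", "B"), ("B", "C")])
print(is_topological_structure(x, y, 0, 3))  # True: A-P-Q-R-B has 3 intermediates
print(is_topological_structure(x, y, 0, 0))  # False: x is not a subgraph of y
```

With l = h = 0 the check only accepts direct edges, which is the exact matching of subgraph mining. Raising h lets a pattern edge stand for a longer chain of keywords in the document graph.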

  9. Topological Structure Mining • Definition • Given a collection of graphs, two parameters l and h, and a threshold θ, an (l,h)-topological structure whose support is greater than or equal to θ is called a frequent topological structure. • In our KDD05 paper, we implemented TSMiner, an algorithm that finds the frequent topological structures in a set of graphs
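The frequency definition on this slide can be sketched as a support count followed by a threshold filter. It is a minimal sketch, not TSMiner. The containment test is passed in as a function, and for the toy run we plug in plain subgraph containment (exact matching); a topological containment test would be substituted for it. Candidate generation, the part a real miner must do efficiently, is not shown: the candidates are listed by hand. The sketch treats θ as an absolute count of graphs, which is an assumption; it could equally be a fraction of the collection. The function names are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

Edge = Tuple[str, str]
EdgeSet = FrozenSet[Edge]  # a graph or a pattern, as a set of undirected edges


def edge(u: str, v: str) -> Edge:
    """Canonical form of an undirected edge (endpoints in sorted order)."""
    return (u, v) if u <= v else (v, u)


def support(pattern: EdgeSet, graphs: Sequence[EdgeSet],
            contains: Callable[[EdgeSet, EdgeSet], bool]) -> int:
    """Number of graphs in the collection that contain the pattern."""
    return sum(1 for g in graphs if contains(g, pattern))


def frequent_patterns(candidates: Sequence[EdgeSet], graphs: Sequence[EdgeSet],
                      contains: Callable[[EdgeSet, EdgeSet], bool],
                      theta: int) -> List[EdgeSet]:
    """Keep the candidates whose support is at least theta (an absolute count)."""
    return [p for p in candidates if support(p, graphs, contains) >= theta]


def subgraph_contains(g: EdgeSet, p: EdgeSet) -> bool:
    """Exact matching: with one node per keyword, subgraph containment
    reduces to edge-set inclusion."""
    return p <= g


# Toy collection of three document graphs and three hand-picked candidates.
graphs = [
    frozenset({edge("A", "B"), edge("B", "C")}),
    frozenset({edge("A", "B"), edge("C", "D")}),
    frozenset({edge("B", "C")}),
]
candidates = [
    frozenset({edge("A", "B")}),
    frozenset({edge("B", "C")}),
    frozenset({edge("A", "B"), edge("B", "C")}),
]
print(frequent_patterns(candidates, graphs, subgraph_contains, theta=2))
```

Here the two single-edge candidates each occur in two graphs and survive θ = 2. The two-edge candidate occurs in only one graph and is dropped.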

  10. Our Work • Using topological structure mining • Challenges • How to create graphs? • What are the keywords? • How to insert edges into graphs?

  11. Outline • Introduction • Topological Structure Mining • Data Preprocessing and Graph Representations • Experiment Results and Pattern Analysis • Conclusion

  12. Data Preprocessing and Graph Representation • One graph for each document • Nodes are keywords of interest • Edges inserted based on the co-occurrence of keywords • Run the topological structure mining algorithm on the resulting graphs

  13. Data Preprocessing • Four dictionaries of keywords • Short Dictionary • 321 genes differentially expressed between prostate epithelial and stromal cells • Long Dictionary • 2600 human genes found in SuperArray's DNA microarray experiment • Confusion Dictionary • Gene names easily confused with ordinary words • GO Dictionary • GO terms (molecular function, biological process and cellular component)
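The slide lists the dictionaries but not how their terms are located in the text. This is a minimal sketch under stated assumptions. It tokenizes on letters, digits and hyphens, and uses a greedy longest match, so that multi-word GO terms win over their single-word prefixes. Matching is case-sensitive. The `exclude` parameter is a hypothetical hook showing one possible use of the confusion dictionary; the slides do not say how that dictionary is applied. All names are illustrative.

```python
import re
from typing import AbstractSet, Dict, Iterable, List, Tuple


def build_index(terms: Iterable[str]) -> Dict[Tuple[str, ...], str]:
    """Map each dictionary term, split into tokens, back to the term."""
    return {tuple(term.split()): term for term in terms}


def find_keywords(text: str, index: Dict[Tuple[str, ...], str],
                  exclude: AbstractSet[str] = frozenset()) -> List[Tuple[int, str]]:
    """Greedy longest-match lookup; returns (token position, term) pairs.
    Matching is case-sensitive, so upper-case gene symbols are not
    matched against lower-case ordinary words."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    longest = max((len(key) for key in index), default=0)
    found: List[Tuple[int, str]] = []
    i = 0
    while i < len(tokens):
        # Try the longest possible term first, e.g. "cell proliferation" before "cell".
        for n in range(min(longest, len(tokens) - i), 0, -1):
            term = index.get(tuple(tokens[i:i + n]))
            if term is not None:
                if term not in exclude:  # hypothetical confusion-dictionary filter
                    found.append((i, term))
                i += n
                break
        else:
            i += 1  # no dictionary term starts at this token
    return found


# Hypothetical mini-dictionary mixing gene symbols and GO-style terms.
index = build_index(["IGF1", "IGFBP3", "cell", "cell proliferation"])
text = "We measured IGFBP3 and IGF1 levels and cell proliferation in prostate cells."
print(find_keywords(text, index))
# [(2, 'IGFBP3'), (4, 'IGF1'), (7, 'cell proliferation')]
```

The token positions returned here are the kind of input a window-based edge rule needs. The set of matched terms becomes the node set of the document's graph.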

  14. Graph Representations • Edge Construction Methods • Sentence-based Method • Edge between two keywords that appear in the same sentence • Mutual Information Method • Edge between two keywords whose mutual information is greater than a threshold • Sliding Window Method • Edge between two keywords located within a sliding window of predefined size
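The three edge construction rules can be sketched as follows, under assumptions the slides do not state:

- A document is given as sentences of keyword occurrences, each occurrence being a (token position, keyword) pair.
- A sliding window of size `size` links two keywords whose positions differ by less than `size`, ignoring sentence boundaries.
- Mutual information is estimated as pointwise mutual information from sentence-level co-occurrence counts over the whole corpus.

The function names are hypothetical, and the code illustrates the rules rather than reproducing the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations
from typing import FrozenSet, List, Set, Tuple

Occurrence = Tuple[int, str]       # (token position in the document, keyword)
Document = List[List[Occurrence]]  # one list of keyword occurrences per sentence
Edge = FrozenSet[str]              # undirected edge between two keywords


def sentence_edges(doc: Document) -> Set[Edge]:
    """Link two distinct keywords that occur in the same sentence."""
    edges: Set[Edge] = set()
    for sent in doc:
        keywords = sorted({kw for _, kw in sent})
        edges.update(frozenset(pair) for pair in combinations(keywords, 2))
    return edges


def window_edges(doc: Document, size: int) -> Set[Edge]:
    """Link two distinct keywords whose positions differ by less than `size`
    tokens, i.e. that fit in one window; sentence boundaries are ignored."""
    occ = sorted(o for sent in doc for o in sent)
    edges: Set[Edge] = set()
    for i, (p1, k1) in enumerate(occ):
        for p2, k2 in occ[i + 1:]:
            if p2 - p1 >= size:
                break  # occurrences are sorted, so later ones are farther away
            if k1 != k2:
                edges.add(frozenset((k1, k2)))
    return edges


def mi_edges(doc: Document, corpus: List[Document], threshold: float) -> Set[Edge]:
    """Link two keywords of `doc` whose pointwise mutual information,
    estimated from sentence-level co-occurrence over `corpus`, exceeds `threshold`."""
    single: Counter = Counter()
    joint: Counter = Counter()
    n_sentences = 0
    for d in corpus:
        for sent in d:
            n_sentences += 1
            keywords = sorted({kw for _, kw in sent})
            single.update(keywords)
            joint.update(frozenset(pair) for pair in combinations(keywords, 2))
    present = sorted({kw for sent in doc for _, kw in sent})
    edges: Set[Edge] = set()
    for a, b in combinations(present, 2):
        both = joint[frozenset((a, b))]
        if both:  # PMI is undefined for pairs that never co-occur
            pmi = math.log(both * n_sentences / (single[a] * single[b]))
            if pmi > threshold:
                edges.add(frozenset((a, b)))
    return edges


# Toy data: keyword occurrences only, with made-up positions.
doc = [[(0, "A"), (3, "B")], [(6, "C"), (12, "A")]]
corpus = [doc, [[(0, "A"), (2, "B")]], [[(0, "C")]]]
print(sentence_edges(doc))
print(window_edges(doc, size=5))
print(mi_edges(doc, corpus, threshold=0.0))
```

In the toy document, the sentence rule links A with C because they share a sentence. A window of 5 tokens instead links B with C across the sentence boundary.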

  15. Outline • Introduction • Topological Structure Mining • Data Preprocessing and Graph Representations • Experiment Results and Pattern Analysis • Conclusion

  16. Experiment Results • Focus on articles containing at least one of 5 genes • CCL5, TF, IGF1, MYLK, IGFBP3 • Generate a graph for each article • Find frequent topological structures

  17. Three Edge Construction Methods

  18. Three Edge Construction Methods

  19. Three Edge Construction Methods

  20. Results • The sliding window method performs best • Largest number of frequent patterns • Best scalability • Topological structure mining finds more frequent patterns than subgraph mining • A large number of patterns does not imply high biological significance

  21. Pattern Analysis • Patterns that can ONLY be found by topological structure mining • Patterns that can ONLY be found by the sliding window method • Restoring nodes reveals interesting patterns

  22. Outline • Introduction • Topological Structure Mining • Data Preprocessing and Graph Representations • Experiment Results and Pattern Analysis • Conclusion

  23. Conclusion • The sliding window method performs best • The largest number of frequent patterns • The highest-quality frequent patterns • The topological structures found correspond well to known relationships • Topological structure mining is a valuable tool for biological researchers

  24. Three Edge Construction Methods • Interestingness of Edges • Count the number of distinct edges • Compute the average interestingness of the edges in all patterns found with each edge construction method
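The comparison on this slide can be sketched as a per-method summary. The slides do not define the interestingness of an edge, so the sketch takes it as a given scoring function; the scores below are made up. It also assumes the average is taken over the distinct edges that appear in any pattern found by a method. The names `edge_summary` and `edge_of` are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]      # undirected edge between two keywords
Pattern = FrozenSet[Edge]  # a frequent pattern, as a set of edges


def edge_summary(patterns: Iterable[Pattern],
                 interestingness: Callable[[Edge], float]) -> Tuple[int, float]:
    """Return (number of distinct edges, mean interestingness of those edges)
    over all patterns found by one edge construction method."""
    distinct: Set[Edge] = set()
    for p in patterns:
        distinct |= p
    if not distinct:
        return 0, 0.0
    return len(distinct), sum(interestingness(e) for e in distinct) / len(distinct)


def edge_of(a: str, b: str) -> Edge:
    return frozenset((a, b))


# Hypothetical per-edge scores; the slides do not define the measure.
scores: Dict[Edge, float] = {edge_of("A", "B"): 1.0, edge_of("B", "C"): 0.5,
                             edge_of("A", "C"): 0.0}
results = {
    "sentence": [frozenset({edge_of("A", "B"), edge_of("A", "C")})],
    "sliding window": [frozenset({edge_of("A", "B"), edge_of("B", "C")}),
                       frozenset({edge_of("A", "B")})],
}
for method, patterns in results.items():
    print(method, edge_summary(patterns, lambda edge: scores.get(edge, 0.0)))
# sentence (2, 0.5)
# sliding window (2, 0.75)
```

Counting distinct edges keeps an edge shared by many patterns from being counted repeatedly. A method therefore cannot raise its average simply by producing many overlapping patterns.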
