
Elma Akand*, Mike Bain, Mark Temple; *CSE, UNSW / School of Biomedical and Health Sciences, UWS

7th December 2010. The Sixth Australasian Ontology Workshop, Adelaide University of South Australia. A Visual Analytics Approach to Augmenting Formal Concepts with Relational Background Knowledge in a Biological Domain. Elma Akand*, Mike Bain, Mark Temple


Presentation Transcript


  1. 7th December 2010. The Sixth Australasian Ontology Workshop, Adelaide University of South Australia. A Visual Analytics Approach to Augmenting Formal Concepts with Relational Background Knowledge in a Biological Domain. Elma Akand*, Mike Bain, Mark Temple; *CSE, UNSW / School of Biomedical and Health Sciences, UWS

  2. Outline
     • Machine learning and data mining in bioinformatics
     • Domain ontologies in biomedical applications
     • Formal Concept Analysis
     • MCW algorithm (Mining Closed itemsets for Web apps)
     • BioLattice – a web-based browser
     • Experimental application: systems biology
       - Part 1: Concept ranking by gene interaction
       - Part 2: Relational learning of multiple-stress rules

  3. Machine learning & data mining in bioinformatics
     • Bioinformatics: “Bioinformatics is the study of information content and information flow in biological systems and processes” (Michael Liebman, 1995)
     • Machine learning & data mining
       - Can offer automatic knowledge acquisition
       - A process for discovering knowledge by analyzing data from different perspectives; can contribute greatly to building knowledge bases
     • Our work: focus on knowledge-based machine learning
     • Previous work: learning from ontologies
     • Current work: ontology construction by learning
     • Potential application areas: ontologies, central to eCommerce and eHealth
     • Current application area: systems biology (predicting gene function, data integration)

  4. Ontology
     • In philosophy: concerned with the nature and relations of being
     • In knowledge representation: the study of the categorization of things
       - Ontology: “specification of a conceptualization” (Gruber, 1993)
       - Conceptualization: “formalization of knowledge in declarative form” (Genesereth and Nilsson, 1987)
     [Figure: informal ontologies (natural language) vs. formal ontologies (first-order logic or a variant); upper ontologies (general) vs. domain ontologies (specific)]

  5. Gene Ontology
     [Figure: gene x annotated to GO terms a and b]
     Gene: x; concepts: a, b; relations: (i) x → a, (ii) x → b and (iii) b → a
     • Missing concepts and relations
     • One gene annotated with different GO terms, where one term is a specialization of the other
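The redundancy described on this slide, where a gene is annotated both to a term and to a more general ancestor of that term, can be checked mechanically. The following is a minimal Python sketch, assuming a toy is-a hierarchy stored as a child-to-parents dictionary and assuming b is the more specific of the two terms; the names `IS_A`, `ancestors` and `redundant_annotations` are hypothetical and not part of any GO library.

```python
from typing import Dict, Set

# Hypothetical toy is-a hierarchy: child term -> set of direct parent terms.
# Assumes term b is a specialization of term a, i.e. relation (iii) b -> a.
IS_A: Dict[str, Set[str]] = {"a": set(), "b": {"a"}}


def ancestors(term: str, is_a: Dict[str, Set[str]]) -> Set[str]:
    """Return every term reachable from `term` by following is-a edges upward."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, set()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, set()))
    return seen


def redundant_annotations(annotations: Set[str], is_a: Dict[str, Set[str]]) -> Set[str]:
    """Annotated terms that are already implied by a more specific annotated term."""
    implied: Set[str] = set()
    for term in annotations:
        implied |= ancestors(term, is_a)
    return annotations & implied


# Gene x is annotated to both a and b (relations (i) and (ii)).
print(redundant_annotations({"a", "b"}, IS_A))  # {'a'}
```

In a real GO setting the hierarchy would come from the ontology file and include more relation types than is-a; the sketch only illustrates the ancestor test.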

  6. Formal Concept Analysis (FCA)
     • Based on mathematical order theory (Rudolf Wille, early 1980s)
       - Derives conceptual structures from data
       - A method for data analysis, knowledge representation and information management
     • Components
       - Formal context, formal concept, concept lattice

  7. Formal concepts in a concept lattice
     [Figure: concept lattice with the concepts
       Top ({cats, gibbons, dogs, dolphins, humans, whales}, {})
       ({cats, gibbons, dogs}, {hair-covered})
       ({gibbons, dolphins, humans, whales}, {intelligent})
       ({dolphins, whales}, {intelligent, marine})
       ({cats, dogs}, {hair-covered, four-legged})
       ({gibbons, humans}, {intelligent, thumbed})
       ({gibbons}, {intelligent, hair-covered, thumbed})
       Bottom ({}, {intelligent, hair-covered, thumbed, marine, four-legged})]
     • Formal context: an n × m Boolean matrix with the n objects O as rows and the m attributes A as columns
     • Formal concept: a pair <X, Y> with X ⊆ A and Y ⊆ O that is closed under the Galois connection, i.e., Y is exactly the set of objects having all attributes in X, and X is exactly the set of attributes shared by all objects in Y
     • The concept lattice is loosely interpretable in ontology terms:
       - concept definitions and sub-concept relations (cf. T-box)
       - concept membership by objects (cf. A-box)
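To make these definitions concrete, the sketch below enumerates every formal concept of the animal example by brute force: it closes each subset of attributes with the two derivation operators. The object-to-attribute context is reconstructed from the concepts listed on the slide (an assumption, though it yields exactly those eight concepts). This naive approach is only an illustration and is not the MCW algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Object -> attributes, reconstructed (assumption) from the concepts on the slide.
CONTEXT: Dict[str, FrozenSet[str]] = {
    "cats": frozenset({"hair-covered", "four-legged"}),
    "dogs": frozenset({"hair-covered", "four-legged"}),
    "gibbons": frozenset({"hair-covered", "intelligent", "thumbed"}),
    "humans": frozenset({"intelligent", "thumbed"}),
    "dolphins": frozenset({"intelligent", "marine"}),
    "whales": frozenset({"intelligent", "marine"}),
}
ATTRIBUTES: FrozenSet[str] = frozenset().union(*CONTEXT.values())

Concept = Tuple[FrozenSet[str], FrozenSet[str]]  # (extent: objects, intent: attributes)


def extent(attrs: FrozenSet[str]) -> FrozenSet[str]:
    """Derivation operator: objects having every attribute in `attrs`."""
    return frozenset(obj for obj, has in CONTEXT.items() if attrs <= has)


def intent(objs: FrozenSet[str]) -> FrozenSet[str]:
    """Derivation operator: attributes shared by all of `objs` (all attributes if empty)."""
    shared = ATTRIBUTES
    for obj in objs:
        shared = shared & CONTEXT[obj]
    return shared


def formal_concepts() -> List[Concept]:
    """Close every attribute subset; feasible here because there are only 2**5 subsets."""
    found = set()
    for k in range(len(ATTRIBUTES) + 1):
        for subset in combinations(sorted(ATTRIBUTES), k):
            ext = extent(frozenset(subset))
            found.add((ext, intent(ext)))  # (ext, intent(ext)) is always a concept
    return sorted(found, key=lambda c: (-len(c[0]), sorted(c[0])))


for ext, itt in formal_concepts():
    print(sorted(ext), sorted(itt))
```

Enumerating all attribute subsets grows as 2^m, which is why the slides turn to closed itemset mining for realistic contexts.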

  8. FCA in data mining
     • FCA can be seen as a clustering technique in machine learning
       - Most of the work is in a propositional framework
     • In data mining, closed itemset mining is an efficient alternative to FCA
       - A frequent itemset X is closed if there exists no proper superset Y ⊃ X with support(Y) = support(X)
       - E.g., if X = {a,b,c,d}, Y = {a,b,c,d,e} and support(Y) = support(X), then X is not closed
     • Parameters to avoid building the entire lattice
       - Extent size must be greater than minsup
     • Existing closed itemset mining algorithms
       - Use data structures to speed up closed itemset mining
       - But may not build the lattice, or include extents
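As a sanity check on the definition of a closed itemset, the following brute-force sketch enumerates all frequent itemsets of a small database and keeps those with no proper superset of equal support. The six-transaction database over items A, C, D, T, W is assumed; it was chosen to be consistent with the tidsets quoted on slides 9 and 10. Frequency is taken here as support ≥ minsup, whereas the slide phrases the threshold as "greater than".

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

# Assumed toy database: transaction id -> items.
DB: Dict[int, FrozenSet[str]] = {
    1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
    4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT"),
}
ITEMS: List[str] = sorted(frozenset().union(*DB.values()))


def support(itemset: FrozenSet[str]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in DB.values() if itemset <= items)


def frequent_closed_itemsets(minsup: int) -> List[FrozenSet[str]]:
    """Brute force over all itemsets; keep frequent X with no proper superset of equal support."""
    frequent: Dict[FrozenSet[str], int] = {}
    for k in range(1, len(ITEMS) + 1):
        for combo in combinations(ITEMS, k):
            x = frozenset(combo)
            s = support(x)
            if s >= minsup:
                frequent[x] = s
    # A superset with equal support is itself frequent, so checking `frequent` suffices.
    closed = [x for x, s in frequent.items()
              if not any(x < y and s == sy for y, sy in frequent.items())]
    return sorted(closed, key=lambda x: (len(x), sorted(x)))


print(["".join(sorted(x)) for x in frequent_closed_itemsets(minsup=3)])
# ['C', 'CD', 'CT', 'CW', 'ACW', 'CDW', 'ACTW']
```

Here {T, A} is frequent (support 3) but not closed, because its superset {A, C, T, W} has the same support.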

  9. MCW algorithm (Mining Closed itemsets for Web apps)
     • Vertical data format
     • IT-tree (itemset-tidset tree) search space
       - Each node holds X × t(X), and all of its children share the prefix X
     • Pruning
       - 4 set-difference closure operators
     • Subsumption check
       - A look-up table records all attributes and their occurrences in closed concepts
     • Lattice
       - Concepts are added in general-to-specific order
     • Is {TA}{135} closed? i(135) = {TAWC} ≠ {TA}, so no: its closure is {TAWC}
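The closure test on this slide can be reproduced with the vertical format: t(X) intersects the tidsets of X's items, i(T) intersects the items of the transactions in T, and X is closed exactly when i(t(X)) = X. A minimal Python sketch over an assumed six-transaction database consistent with the tidsets on slides 9 and 10; `t`, `i` and `is_closed` are illustrative names, not MCW's actual interface.

```python
from functools import reduce
from typing import Dict, FrozenSet

# Assumed horizontal database: transaction id -> items (consistent with the slide's tidsets).
HORIZONTAL: Dict[int, FrozenSet[str]] = {
    1: frozenset("ACTW"), 2: frozenset("CDW"), 3: frozenset("ACTW"),
    4: frozenset("ACDW"), 5: frozenset("ACDTW"), 6: frozenset("CDT"),
}

# Vertical format: item -> tidset (ids of the transactions that contain the item).
VERTICAL: Dict[str, FrozenSet[int]] = {}
for tid, items in HORIZONTAL.items():
    for item in items:
        VERTICAL[item] = VERTICAL.get(item, frozenset()) | {tid}


def t(itemset: str) -> FrozenSet[int]:
    """t(X): intersect the tidsets of X's items (all tids for the empty itemset)."""
    return reduce(lambda acc, item: acc & VERTICAL[item], itemset, frozenset(HORIZONTAL))


def i(tids: FrozenSet[int]) -> FrozenSet[str]:
    """i(T): items common to every transaction in T (all items for empty T)."""
    return reduce(lambda acc, tid: acc & HORIZONTAL[tid], tids, frozenset(VERTICAL))


def is_closed(itemset: str) -> bool:
    """X is closed iff its closure i(t(X)) equals X."""
    return i(t(itemset)) == frozenset(itemset)


print(sorted(t("TA")), "".join(sorted(i(t("TA")))), is_closed("TA"))  # [1, 3, 5] ACTW False
```

Working on tidsets means support is simply the size of t(X), so no pass over the horizontal data is needed once the vertical index exists.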

  10. Closure operators, based on CHARM (Zaki, 2005)
     • {TA}{135} = {TW}{135} → {TAW}{135}
     • {D}{2456} ⊂ {C}{123456} → {DC}{2456}
     • {D}{2456} and {W}{12345} → {DW}{245}
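The three examples on this slide correspond to CHARM's tidset properties for combining two sibling IT-tree nodes. The sketch below only classifies a pair and returns the node the combination produces. It is a simplified illustration that works on plain tidsets rather than the set-difference form mentioned on slide 9, and it omits the tree bookkeeping and the minimum-support check (a combined node whose tidset falls below minsup would be discarded). `charm_combine` and `node` are hypothetical names.

```python
from typing import FrozenSet, Set, Tuple

Node = Tuple[FrozenSet[str], FrozenSet[int]]  # (itemset X, tidset t(X))


def charm_combine(xi: Node, xj: Node) -> Tuple[str, Node]:
    """Classify two sibling nodes by CHARM's four tidset properties.

    Returns the property and the node produced by combining them. Simplified:
    no tree updates, no sibling removal and no minimum-support check.
    """
    (items_i, tids_i), (items_j, tids_j) = xi, xj
    union = items_i | items_j
    if tids_i == tids_j:
        # Property 1: replace Xi by Xi ∪ Xj and remove Xj.
        return "equal", (union, tids_i)
    if tids_i < tids_j:
        # Property 2: replace Xi by Xi ∪ Xj; Xj is kept.
        return "subset", (union, tids_i)
    if tids_i > tids_j:
        # Property 3: remove Xj; add Xi ∪ Xj (with tidset t(Xj)) as a child of Xi.
        return "superset", (union, tids_j)
    # Property 4: incomparable tidsets; add Xi ∪ Xj with t(Xi) ∩ t(Xj) as a new child.
    return "incomparable", (union, tids_i & tids_j)


def node(items: str, tids: Set[int]) -> Node:
    """Build a node from an itemset string and a set of transaction ids."""
    return frozenset(items), frozenset(tids)


label, (x, tx) = charm_combine(node("D", {2, 4, 5, 6}), node("W", {1, 2, 3, 4, 5}))
print(label, "".join(sorted(x)), sorted(tx))  # incomparable DW [2, 4, 5]
```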

  11. Concept lattice as a visual analytics approach
     • Visual analytics
       - Combination of information visualization with machine learning and data analysis (Keim et al., 2008)
     • Visualization of the concept lattice
       - Provides an overview of the structure of the domain
       - Means for further data analysis, e.g., classification, clustering, implication discovery, rule learning
     • Previous work
       - Lattice navigation since Godin et al. (1993)
       - Browsable concept lattice, e.g., Kim & Compton (2004)
     • Our current work
       - Augmenting the concept lattice by integrating multiple sources of knowledge (Gene Ontology, protein interactions) for further analysis & machine learning

  12. Case study: Yeast systems biology

  13. Browsable concept lattice
     [Figure: screenshot of the browsable concept lattice, with an arrow labelled “more general”]

  14. Biological validation (1): synthetic lethality
     • Two genes A and B have a synthetic lethal interaction if the cell is viable when either gene is deleted individually, but cannot grow when both are deleted.
     • Our results show that 72 (119) concepts in the lattice are more likely than random chance to contain synthetic lethal pairs, at p < 0.01 (p < 0.05).
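The slide does not state which test produced these p-values. One common choice for this kind of enrichment question is a one-sided hypergeometric test; the sketch below assumes that the background population is all gene pairs, the "successes" are the known synthetic lethal pairs, and a concept "draws" all pairs within its extent. The function names and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, total: int, successes: int, draws: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population=total, successes, draws)."""
    upper = min(successes, draws)
    tail = sum(comb(successes, x) * comb(total - successes, draws - x)
               for x in range(max(k, 0), upper + 1))
    return tail / comb(total, draws)


def concept_sl_pvalue(extent_size: int, sl_in_concept: int,
                      n_genes: int, sl_total: int) -> float:
    """One-sided enrichment p-value for synthetic lethal (SL) pairs in a concept.

    Assumption: the background is all gene pairs among n_genes, of which sl_total
    are SL; the concept contributes all pairs within its extent.
    """
    total_pairs = comb(n_genes, 2)
    concept_pairs = comb(extent_size, 2)
    return hypergeom_sf(sl_in_concept, total_pairs, sl_total, concept_pairs)


# Hypothetical numbers, for illustration only.
print(concept_sl_pvalue(extent_size=6, sl_in_concept=4, n_genes=200, sl_total=150))
```

Testing many concepts at once would normally call for a multiple-testing correction; whether one was applied here is not stated.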

  15. Biological validation (2): ILP learning of concept definitions
     [Figure: Inductive Logic Programming combining biochemical pathway data, protein-protein interaction data, ontology data, transcription factor binding data (ChIP-chip) and microarray gene-expression data]
     First-order rule:
       concept(A) :- ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E).

  16. Example rule: RSM19 is required for the H2O2 response; RSM19, RSM22 and MRPS17 are in the “mitochondrial ribosomal small subunit” stable complex; and RSM22 and MRPS17 are bound by transcription factors under amino acid starvation.
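The first-order rule on slide 15 can be checked against ground facts by brute force. This sketch encodes hypothetical facts shaped after the example above. It assumes ppi(B, A, C) means that A and C interact within complex B, and the transcription factor names TF_1 and TF_2 are placeholders, since the slide does not name them. It only tests whether the rule covers a gene; it does not show how the ILP system learns the rule.

```python
from itertools import product
from typing import Set, Tuple

# Hypothetical ground facts modelled on the example rule; ppi(B, A, C) is read
# here as "A and C interact within complex B" (an assumption).
COMPLEX = "mito_ribosomal_small_subunit"  # placeholder identifier
PPI: Set[Tuple[str, str, str]] = {
    (COMPLEX, "RSM19", "RSM22"),
    (COMPLEX, "RSM19", "MRPS17"),
    (COMPLEX, "RSM22", "MRPS17"),
}
# tfbinds(D, C): transcription factor D binds gene C. TF_1 and TF_2 are placeholders.
TFBINDS: Set[Tuple[str, str]] = {("TF_1", "RSM22"), ("TF_2", "MRPS17")}


def covers(a: str) -> bool:
    """Brute-force check of
    concept(A) :- ppi(B,A,C), ppi(B,A,E), ppi(B,C,E), tfbinds(D,C), tfbinds(F,E).
    """
    bound_genes = {gene for _tf, gene in TFBINDS}  # genes with some tfbinds(_, gene)
    for (b1, a1, c), (b2, a2, e) in product(PPI, PPI):
        if a1 == a and a2 == a and b1 == b2:          # ppi(B,A,C) and ppi(B,A,E)
            if (b1, c, e) in PPI and c in bound_genes and e in bound_genes:
                return True                            # ppi(B,C,E), tfbinds(D,C), tfbinds(F,E)
    return False


print([g for g in ("RSM19", "RSM22", "MRPS17") if covers(g)])  # ['RSM19']
```

As in Prolog, distinct variables such as D and F may bind to the same transcription factor; the check only requires that some binding fact exists for each of C and E.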

  17. Conclusions
     • Many real-world domains are data-intensive
     • Machine learning and data mining applications are required to generate predictive and useful outputs
     • We focus on knowledge-based learning for comprehensibility, using ontologies
     • Formal concept analysis as a framework for ontology structure
     • Data mining techniques for efficient concept lattice generation
     • Visual analytics approach: browsable lattice with added background knowledge
     • Initial validation on a case study from yeast systems biology

  18. Future work
     • Investigate pseudo-intents to simplify the concept lattice
     • Investigate variants of concept lattice structures
       - e.g., the concept lattice of the inverse context
     • Add concept definitions to background knowledge in ILP
