William H. Hsu with Haipeng Guo, Rengakrishnan Subramanian, Ben Perry, and Julie A. Thornton

Bioinformatics and Machine Learning: Building Probabilistic Models of Gene Expression from Microarray Data. William H. Hsu with Haipeng Guo, Rengakrishnan Subramanian, Ben Perry, and Julie A. Thornton Department of Computing and Information Sciences Kansas State University


Presentation Transcript


  1. Bioinformatics and Machine Learning: Building Probabilistic Models of Gene Expression from Microarray Data
     William H. Hsu with Haipeng Guo, Rengakrishnan Subramanian, Ben Perry, and Julie A. Thornton
     Department of Computing and Information Sciences, Kansas State University
     Laboratory for Knowledge Discovery in Databases
     http://www.kddresearch.org/Groups/Bioinformatics

  2. Overview
     • Computer Science: What We Do
       • Software: operating systems, programming languages, software engineering, databases
       • Hardware: logic design, organization and architecture
       • Theory of Computation: algorithms, complexity, languages
       • Artificial Intelligence (AI): learning, reasoning, planning, agents
       • Computer Graphics, Geometry, and Vision
       • Computational Science and Engineering (CSE)
     • Artificial Intelligence (AI) – Fields of Study
       • Areas: learning, planning, vision, robotics
       • Applications in science, engineering, business, and defense
     • Computer Graphics – Some Current Projects and Fun Stuff
       • Computer-Aided Design (CAD) and Engineering (CAE)
       • Information Visualization
       • Computer-Generated Images (CGI) and Animation (CGA)
     • High-Performance Computing: Linux and Beowulf

  3. Information Retrieval (IR) and Text Mining: Commercial Applications
     Figure labels: 6500 news stories from the WWW in 1997; SPIRIX software; ThemeScapes (http://www.cartia.com)

  4. Visual Programming and Software Engineering

  5. Stages of Data Mining and Knowledge Discovery in Databases

  6. Knowledge Discovery in Databases (KDD) and Fraud Detection

  7. Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning [1]
     Diagram labels: Genetic Wrapper for Change of Representation and Inductive Bias Control; D: Training Data; [1] Genetic Algorithm; α: Candidate Representation; f(α): Representation Fitness; [2] Representation Evaluator for Learning Problems; D_train (Inductive Learning); D_val (Inference); Inference Specification; Optimized Representation

  8. Genetic Algorithms for Parameter Tuning in Bayesian Network Structure Learning [2]
     Diagram labels: [2] Representation Evaluator for Input Specifications; [A] Inductive Learning (Parameter Estimation from Training Data); D_train (Model Training); h: Hypothesis; [B] Validation (Measurement of Inferential Loss); D_val (Model Validation by Inference); Evidence Specification; α: Candidate Input Specification; f(α): Specification Fitness (Inferential Loss)
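
The wrapper loop on slides 7 and 8 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the candidate specification α is assumed to be a bit mask over inputs, the learner is a simple lookup table standing in for a Bayesian network, and validation error rate stands in for inferential loss. All function names and parameters (`genetic_wrapper`, `p_mut`, and so on) are hypothetical.

```python
import random
from collections import Counter, defaultdict

def project(row, mask):
    """Keep only the inputs that the candidate specification alpha switches on."""
    return tuple(v for v, keep in zip(row, mask) if keep)

def train(rows, labels, mask):
    """[A] Inductive learning on D_train: a lookup-table hypothesis h (stand-in learner)."""
    table = defaultdict(Counter)
    for row, label in zip(rows, labels):
        table[project(row, mask)][label] += 1
    default = Counter(labels).most_common(1)[0][0]  # fallback for unseen keys
    return table, default

def validation_loss(h, rows, labels, mask):
    """[B] Validation on D_val: error rate, a stand-in for inferential loss."""
    table, default = h
    errors = 0
    for row, label in zip(rows, labels):
        key = project(row, mask)
        predicted = table[key].most_common(1)[0][0] if key in table else default
        errors += predicted != label
    return errors / len(rows)

def fitness(mask, d_train, d_val):
    """f(alpha): higher is better, so negate the validation loss."""
    h = train(d_train[0], d_train[1], mask)
    return -validation_loss(h, d_val[0], d_val[1], mask)

def genetic_wrapper(d_train, d_val, n_inputs, pop_size=20, generations=15,
                    p_mut=0.1, seed=0):
    """[1] Genetic algorithm over bit masks; requires n_inputs >= 2."""
    rng = random.Random(seed)
    # Start from the full specification plus random candidates.
    pop = [(1,) * n_inputs]
    pop += [tuple(rng.randint(0, 1) for _ in range(n_inputs))
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(pop, key=lambda m: fitness(m, d_train, d_val), reverse=True)
        parents = ranked[: pop_size // 2]      # truncation selection; the best survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_inputs)   # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per position.
            children.append(tuple(bit ^ (rng.random() < p_mut) for bit in child))
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, d_train, d_val))

if __name__ == "__main__":
    data_rng = random.Random(1)
    rows = [tuple(data_rng.randint(0, 1) for _ in range(4)) for _ in range(200)]
    labels = [row[0] for row in rows]          # toy target: depends on input 0 only
    d_train, d_val = (rows[:150], labels[:150]), (rows[150:], labels[150:])
    best = genetic_wrapper(d_train, d_val, n_inputs=4)
    print(best, fitness(best, d_train, d_val))
```

Because the top half of each generation is carried over unchanged, the best fitness found never decreases from one generation to the next. Swapping the lookup table for a real structure learner would change only `train` and `validation_loss`.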

  9. Learning Environment
     Diagram labels: D: Microarray Data; [A] Structure Learning: G = (V, E), Graph Component of BN (nodes G1, G2, G3, G4, G5); [B] Parameter Estimation: B = (V, E, Θ), BN with Probabilities Θ (nodes G1, G2, G3, G4, G5); D_val (Model Validation by Inference); Specification Fitness (Inferential Loss)
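
To make step [B] on slide 9 concrete: once a graph is fixed, the probabilities attached to each node can be estimated from discretized data by counting. The sketch below assumes discrete states and uses Laplace smoothing, which is a choice made here rather than something the slides specify. The edge G1 -> G2, the state names, and the four data rows are made up for illustration.

```python
from collections import Counter
from itertools import product

def estimate_cpts(data, parents, states, alpha=1.0):
    """Estimate P(var | parents) for each node by smoothed relative frequencies.

    data    : list of dicts mapping variable name -> observed state
    parents : dict mapping each variable to a tuple of its parents (the edges E)
    states  : dict mapping each variable to its list of possible states
    alpha   : Laplace pseudo-count (an assumption of this sketch)
    """
    cpts = {}
    for var, pa in parents.items():
        # Count each (parent configuration, child state) pair seen in the data.
        counts = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        cpts[var] = {}
        # A node with no parents has exactly one (empty) parent configuration.
        for pa_vals in product(*(states[p] for p in pa)):
            total = (sum(counts[(pa_vals, s)] for s in states[var])
                     + alpha * len(states[var]))
            cpts[var][pa_vals] = {
                s: (counts[(pa_vals, s)] + alpha) / total for s in states[var]
            }
    return cpts

# Hypothetical two-gene fragment: G1 -> G2, each gene discretized to "lo"/"hi".
parents = {"G1": (), "G2": ("G1",)}
states = {"G1": ["lo", "hi"], "G2": ["lo", "hi"]}
data = [
    {"G1": "hi", "G2": "hi"},
    {"G1": "hi", "G2": "hi"},
    {"G1": "hi", "G2": "lo"},
    {"G1": "lo", "G2": "lo"},
]
theta = estimate_cpts(data, parents, states)
print(theta["G2"][("hi",)])
```

With `alpha=0` this reduces to plain maximum-likelihood counting, which fails with a division by zero for any parent configuration that never appears in the data; the pseudo-count avoids that.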

  10. Microarrays

  11. A Gene Network for Yeast [Friedman, Nachman, Linial, Pe’er, 2000]

  12. Components of A Microarray Experiment: Hybridization
      Diagram labels: Publication (e.g., PubMed); Experiment; Source (e.g., Taxonomy); Gene (e.g., GenBank); Sample; Hybridization; Array; Normalization/Discretization; Data
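
As an illustration of the Normalization/Discretization stage on slide 12, one simple scheme is to mean-center each gene's log-ratios and map them to three levels. The slides do not say which scheme was used, so the three-level coding, the `threshold` value, and the sample profile below are all assumptions of this sketch.

```python
from statistics import mean

def discretize(log_ratios, threshold=0.5):
    """Mean-center one gene's log-ratios, then map each value to -1, 0, or +1."""
    center = mean(log_ratios)
    levels = []
    for value in log_ratios:
        shifted = value - center
        if shifted < -threshold:
            levels.append(-1)   # under-expressed
        elif shifted > threshold:
            levels.append(1)    # over-expressed
        else:
            levels.append(0)    # close to the gene's average level
    return levels

profile = [-1.0, 0.0, 1.0]      # made-up log-ratios for one gene
print(discretize(profile))      # [-1, 0, 1]
```

Discrete levels like these are what a count-based parameter estimator for a Bayesian network consumes, at the cost of discarding the magnitude of each change.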

  13. Components of A Microarray Experiment: Computational Gene Expression Modeling
      Diagram labels: Computational Workflows (e.g., myGrid); Pathway & Network Learning Specification; Feature Selection Specification; Experimental Services & Metadata (MAGE-ML XML); Gene Expression Model; Data Preprocessing Specification; Parameter Learning Specification; Model Analysis Specification; Discretization Use Case; Data Mining Use Case; Validation (e.g., Bootstrap) Use Case

  14. DESCRIBER: An Experimental Intelligent Filter
      • Example Queries:
        • What experiments have found cell cycle-regulated metabolic pathways in Saccharomyces?
        • What codes and microarray data were used, and why?
      Diagram labels: Users of Scientific Document Repository; DESCRIBER; Learning and Inference Components; Historical Use Case & Query Data; Personalized Interface; New Queries; Domain-Specific Collaborative Filtering; Decision Support Models; Interface(s) to Distributed Repository; Domain-Specific Repositories; Experimental Data; Source Codes and Specifications; Data Models; Ontologies; Models

  15. DESCRIBER Overview
      Diagram labels (in extraction order): Module 2 Learning & Validation of Bayesian Network Models for Use Cases Estimation of Constraint Parameters Module 3 Graphical Models of Use Cases Historical Use Case & Query Data Module 4 Learning & Validation of Bayesian Network Models for MAGE Data & Codes Data Personalized Interface Module 5 MAGE Data Model User New Queries Module 1 Intelligent Collaborative Filtering Front-End Relational Models of MAGE Data Constrained Models of Use Cases

  16. DESCRIBER Collaborative Filtering Module
      Diagram labels: Module 1; Personalized Interface; New Query from User; Intelligent Collaborative Filtering Front-End; Response to User; Relational Models of (Domain-Specific) Data; Integrated Reasoning Component: XML Validator and Constraint Checker; Relational Probabilistic Model; Constraint Selector; Constraints on Repository Content; Constrained Models of Use Cases
