Using Bayesian Networks to Analyze Expression Data

Presentation Transcript


  1. Using Bayesian Networks to Analyze Expression Data. N. Friedman, M. Linial, I. Nachman, D. Pe’er. Hebrew University, Jerusalem.

  2. Central Dogma: Gene → (transcription) → mRNA → (translation) → Protein. Cells express different subsets of the genes in different tissues and under different conditions.

  3. Microarrays (aka “DNA chips”) • New technological breakthrough: • Measure RNA expression levels of thousands of genes in one experiment • Measure expression on a genomic scale • Opens up new experimental designs • Many major labs are using, or will use, this technology in the near future

  4. The Problem. Aij: the mRNA level of gene j in experiment i (matrix with experiments i as rows and genes j as columns). Goal: • Learn regulatory/metabolic networks • Identify causal sources of the biological phenomena of interest

  5. Analysis Approaches • Clustering of expression data • Groups together genes with similar expression patterns • Does not reveal structural relations between genes • Boolean networks • Deterministic models of the logical interactions between genes • Deterministic, impractical for real data

  6. Example: Cell-Cycle Data [Spellman et al.] (Figure: clusters of genes arranged by cell-cycle stage.)

  7. Our Approach • Characterize statistical relationships between expression patterns of different genes • Beyond pair-wise interactions • Many interactions are explained by intermediate factors • Regulation involves combined effects of several gene products. We build on the language of Bayesian networks.

  8. Network: Example. Noisy stochastic process. Example: Pedigree (figure nodes: Homer, Marge, Bart, Lisa, Maggie) • A node represents an individual’s genotype • Modeling assumptions: • Ancestors can affect descendants' genotypes only by passing genetic material through intermediate generations

  9. Network Structure. Generalizing to DAGs: • A child is conditionally independent of its non-descendants, given the value of its parents. Often a natural assumption for causal processes • if we believe that we capture the relevant state of each intermediate stage. (Figure: node X with its ancestor, parent, and descendant, and non-descendants Y1 and Y2.)

  10. Local Probabilities • Associated with each variable Xi is a conditional probability distribution P(Xi | Pai) • Discrete variables: multinomial distribution • Continuous variables: a choice of model, for example linear Gaussian. Example for an edge X → Y, table of P(Y | X): row X = 1: 0.9, 0.1; row X = 0: 0.3, 0.7.
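For the continuous case mentioned on slide 10, a linear Gaussian conditional distribution is commonly written as below. The symbols u_1, ..., u_k (values of the parents), a_0, ..., a_k (regression coefficients), and σ² (a variance that does not depend on the parents) are notation assumed here for illustration; they do not appear in the slides.

```latex
P(X \mid u_1, \dots, u_k) \sim \mathcal{N}\!\left(a_0 + \sum_{i=1}^{k} a_i u_i,\ \sigma^2\right)
```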

  11. Bayesian Network Semantics. Qualitative part (DAG specifies conditional independence statements) + Quantitative part (local probability models) = Unique joint distribution over the domain • Compact & efficient representation: • at most k parents: O(2^k n) vs. O(2^n) params • parameters pertain to local interactions. Chain rule: P(C,A,R,E,B) = P(B) * P(E|B) * P(R|E,B) * P(A|R,B,E) * P(C|A,R,B,E) versus network factorization: P(C,A,R,E,B) = P(B) * P(E) * P(R|E) * P(A|B,E) * P(C|A). (Figure: DAG over B, E, R, A, C.)
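The factorization on slide 11 can be checked numerically. The sketch below assumes all five variables are binary and uses made-up probability values (none of the numbers come from the slides). It builds the joint from the five local distributions and compares the number of free parameters with that of an unconstrained joint.

```python
from itertools import product

# Hypothetical local models for binary variables (illustrative numbers only).
P_B1 = 0.01                                   # P(B=1)
P_E1 = 0.02                                   # P(E=1)
P_R1_GIVEN_E = {0: 0.001, 1: 0.9}             # P(R=1 | E)
P_A1_GIVEN_BE = {(0, 0): 0.001, (0, 1): 0.29,
                 (1, 0): 0.94, (1, 1): 0.95}  # P(A=1 | B, E)
P_C1_GIVEN_A = {0: 0.05, 1: 0.7}              # P(C=1 | A)


def bern(p_one, value):
    """Probability of `value` (0 or 1) when P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(c, a, r, e, b):
    """P(C,A,R,E,B) = P(B) P(E) P(R|E) P(A|B,E) P(C|A)."""
    return (bern(P_B1, b) * bern(P_E1, e) * bern(P_R1_GIVEN_E[e], r)
            * bern(P_A1_GIVEN_BE[(b, e)], a) * bern(P_C1_GIVEN_A[a], c))


# The factored joint is a proper distribution: it sums to 1.
total = sum(joint(*values) for values in product([0, 1], repeat=5))

# Free parameters: one per parent configuration of each binary variable.
factored_params = 1 + 1 + 2 + 4 + 2   # B, E, R|E, A|B,E, C|A
full_joint_params = 2 ** 5 - 1        # unconstrained joint over 5 binary variables

print(total, factored_params, full_joint_params)
```

Even in this five-variable example, the factored form needs 10 parameters where the unconstrained joint needs 31; the gap grows exponentially with the number of variables.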

  12. Why Bayesian Networks? Bayesian Networks: • Flexible representation of dependency structure of multivariate distributions • Natural for modeling processes with local interactions Learning of Bayesian Networks • Can learn dependencies from observations • Handles stochastic processes: • “true” stochastic behavior • noise in measurements

  13. Modeling Biological Regulation. Variables of interest: • Expression levels of genes • Concentration levels of proteins • Exogenous variables: nutrient levels, metabolite levels, temperature • Phenotype information • … Bayesian Network Structure: • Capture dependencies among these variables

  14. Measured expression level of each gene ↔ Random variables. Gene interaction ↔ Probabilistic dependencies. Interactions are represented by a graph: • Each gene is represented by a node in the graph • Edges between the nodes represent direct dependency. (Figure: example graphs over genes A, B, and X.)

  15. More Complex Examples • Dependencies can be mediated through other nodes • Common effects can imply conditional dependence. (Figure: graphs over A, B, and C labeled “Intermediate gene” and “Common cause”.)
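The "intermediate gene" case can be illustrated with a small exact computation. The sketch below assumes a chain A → B → C over binary variables with made-up probabilities (the structure and numbers are illustrative, not taken from the slides). A and C are dependent on their own, but become independent once the intermediate variable B is known.

```python
from itertools import product

# Hypothetical chain A -> B -> C over binary variables (illustrative numbers).
P_A1 = 0.3                        # P(A=1)
P_B1_GIVEN_A = {0: 0.2, 1: 0.9}   # P(B=1 | A)
P_C1_GIVEN_B = {0: 0.1, 1: 0.8}   # P(C=1 | B)


def bern(p_one, value):
    """Probability of `value` (0 or 1) when P(value = 1) = p_one."""
    return p_one if value == 1 else 1.0 - p_one


def joint(a, b, c):
    """P(A,B,C) = P(A) P(B|A) P(C|B)."""
    return bern(P_A1, a) * bern(P_B1_GIVEN_A[a], b) * bern(P_C1_GIVEN_B[b], c)


def prob(**fixed):
    """Sum the joint over all assignments consistent with the fixed values."""
    total = 0.0
    for a, b, c in product([0, 1], repeat=3):
        assignment = {"a": a, "b": b, "c": c}
        if all(assignment[name] == value for name, value in fixed.items()):
            total += joint(a, b, c)
    return total


# Without knowing B, the value of A changes our belief about C.
p_c1_given_a1 = prob(a=1, c=1) / prob(a=1)
p_c1_given_a0 = prob(a=0, c=1) / prob(a=0)

# Once B is known, A adds no further information about C.
p_c1_given_a1_b1 = prob(a=1, b=1, c=1) / prob(a=1, b=1)
p_c1_given_a0_b1 = prob(a=0, b=1, c=1) / prob(a=0, b=1)

print(p_c1_given_a1, p_c1_given_a0)
print(p_c1_given_a1_b1, p_c1_given_a0_b1)
```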

  16. Outline of Our Approach: Expression data → Bayesian Network Learning Algorithm → learned network (figure: DAG over B, E, R, A, C). Use the learned network to make predictions about the structure of the interactions between genes.

  17. Learning With Many Variables. Sparse Candidate algorithm: an efficient heuristic search that relies on sparseness • Choose a candidate set of direct influences (candidate parents in the BN) for each gene • Find the optimal BN constrained to the candidates • Iteratively improve the candidate set
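The first step on slide 17, choosing a candidate set for each gene, might look like the sketch below. It assumes discretized expression values and ranks the other genes by empirical mutual information; that scoring choice, the function names, and the gene names are assumptions made for illustration, and the slides do not say which relevance measure is used. The constrained search and the iterative refinement steps are not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # (c/n) * log( (c/n) / ((count_x/n) * (count_y/n)) )
        mi += (c / n) * math.log(c * n / (count_x[x] * count_y[y]))
    return mi


def choose_candidates(data, k):
    """For each gene, keep the k other genes with the highest mutual information.

    data: dict mapping a gene name to its list of discretized expression values,
          one value per experiment (hypothetical layout).
    """
    candidates = {}
    for gene in data:
        others = [other for other in data if other != gene]
        others.sort(key=lambda o: mutual_information(data[gene], data[o]),
                    reverse=True)
        candidates[gene] = others[:k]
    return candidates


# Toy data with hypothetical gene names: g2 copies g1, g3 is unrelated to g1.
toy = {
    "g1": [0, 1, 0, 1, 0, 1, 0, 1],
    "g2": [0, 1, 0, 1, 0, 1, 0, 1],
    "g3": [0, 0, 1, 1, 0, 0, 1, 1],
}
toy_candidates = choose_candidates(toy, k=1)
print(toy_candidates["g1"])
```

Restricting each gene to a few candidate parents is what keeps the later structure search tractable when there are hundreds of variables.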

  18. Experiment. Data from Spellman et al. (Mol. Biol. of the Cell, 1998). • Contains 76 samples covering the whole yeast genome: • Different methods for synchronizing the cell cycle in yeast. • Time series at intervals of a few minutes (5-20 min). • Spellman et al. identified 800 cell-cycle regulated genes.

  19. Methods. Experiment 1: discretized data into 3 levels (-, 0, +), with thresholds at -0.5 and 0.5 on log(ratio to control) • Learn multinomial probabilities. Experiment 2: • Learn linear interactions (w/ Gaussian noise). No prior biological knowledge was used.
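The three-level discretization used in Experiment 1 can be sketched as follows. The slide gives the thresholds (-0.5 and 0.5) but not the base of the logarithm; base 2 is assumed here, and the function name and the -1/0/+1 encoding are illustrative choices.

```python
import math


def discretize(ratio_to_control, threshold=0.5):
    """Map an expression ratio to -1 (under-expressed), 0 (unchanged), or +1 (over-expressed).

    Assumption: the log is taken in base 2; the slide does not state the base.
    """
    log_ratio = math.log2(ratio_to_control)
    if log_ratio < -threshold:
        return -1
    if log_ratio > threshold:
        return 1
    return 0


levels = [discretize(r) for r in (0.5, 1.0, 1.2, 2.0)]
print(levels)
```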

  20. Network Learned

  21. Challenge: Statistical Significance. Sparse Data • Small number of samples • “Flat posterior”: many networks fit the data. Solution • Estimate confidence in network features • Two types of features • Markov neighbors: X directly interacts with Y • Order relations: X is an ancestor of Y

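  22. Confidence Estimates. Bootstrap approach [FGW, UAI99]: resample the data D into datasets D1, D2, ..., Dm; learn a network from each resampled dataset; estimate confidence in each feature from the m learned networks. (Figure: a network over B, E, R, A, C learned from each resampled dataset.)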
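The bootstrap procedure on slide 22 can be sketched as below. The estimate formula did not survive in the transcript; the sketch assumes the usual bootstrap estimate, in which the confidence of a feature is the fraction of resampled datasets whose learned model contains it. The `toy_learner` is a deliberately simple stand-in (it reports pairs of variables that agree in most samples) and is not a Bayesian network learner; any function that returns a set of features could be plugged in instead.

```python
import random
from collections import Counter


def bootstrap_confidence(samples, learn_features, m=100, seed=0):
    """Estimate feature confidence as the fraction of bootstrap runs containing it.

    samples:        list of observations (here, tuples of discrete values)
    learn_features: callable mapping a dataset to a set of hashable features
    m:              number of bootstrap resamples
    """
    rng = random.Random(seed)
    n = len(samples)
    counts = Counter()
    for _ in range(m):
        # Resample n observations with replacement.
        resampled = [samples[rng.randrange(n)] for _ in range(n)]
        counts.update(learn_features(resampled))
    return {feature: count / m for feature, count in counts.items()}


def toy_learner(samples, min_agreement=0.9):
    """Stand-in learner (NOT a BN learner): pairs of variables that mostly agree."""
    n_vars = len(samples[0])
    features = set()
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            agreement = sum(1 for s in samples if s[i] == s[j]) / len(samples)
            if agreement >= min_agreement:
                features.add((i, j))
    return features


# Toy data: variables 0 and 1 always agree; variable 2 is unrelated to them.
toy_samples = [(0, 0, 1), (1, 1, 1), (0, 0, 0), (1, 1, 0)] * 5
confidence = bootstrap_confidence(toy_samples, toy_learner, m=50, seed=1)
print(confidence)
```

Features that appear in nearly every resampled run are robust to the particular samples drawn, which is the property the confidence estimate is meant to capture when data are sparse.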

  23. Testing for Significance • We run our procedure on randomized data where we reshuffled the order of values for each gene. (Figure: Markov features with Gaussian models; number of features with confidence above t, plotted against t from 0 to 1, for random vs. real data.)

  24. Testing for Significance. (Figure: Markov features with multinomial models; number of features with confidence above t, plotted against t from 0 to 1, for random vs. real data.)
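The randomization used in the significance test above (reshuffling the order of values for each gene) can be sketched as follows. The matrix layout, with experiments as rows and genes as columns, and the function name are assumptions for illustration. Each gene keeps exactly the same set of values, but any dependence between genes is destroyed, so features that still receive high confidence on the shuffled data are likely spurious.

```python
import random


def shuffle_each_gene(matrix, seed=0):
    """Return a copy of the matrix with every column (gene) permuted independently.

    matrix: list of rows, one row per experiment, one column per gene.
    """
    rng = random.Random(seed)
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    shuffled_columns = []
    for j in range(n_cols):
        column = [matrix[i][j] for i in range(n_rows)]
        rng.shuffle(column)  # independent permutation for this gene
        shuffled_columns.append(column)
    # Transpose back to rows = experiments.
    return [[shuffled_columns[j][i] for j in range(n_cols)] for i in range(n_rows)]


original = [[1, 10], [2, 20], [3, 30], [4, 40]]
randomized = shuffle_each_gene(original, seed=42)
print(randomized)
```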

  25. Local Map

  26. Finding Key Genes. Key gene: a gene that precedes many other genes • YLR183C • MCD1: Mitotic Chromosome Determinant • RAD27: DNA repair protein • CLN2: role in cell cycle START • SRO4: involved in cellular polarization during budding • YOX1: Homeodomain protein that binds leu-tRNA gene • POL30: required for DNA replication and repair • YLR467W • CDC5 • MSH6: Homolog of the human GTBP protein • YML119W • CLN1: role in cell cycle START

  27. Strong Markov Relations

  28. Future Work • Finding suitable local distribution models • Temporal aspect - DBN • Correct handling of hidden variables • Can we recognize hidden causes of coordinated regulation events? • Incorporating prior knowledge • Incorporate large mass of biological knowledge, and insight from sequence/structure databases • Abstraction • Combine with cluster analysis
