1 / 19

The AMADEUS Motif Discovery Platform

The AMADEUS Motif Discovery Platform. C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University. Genome Research 2008. ApoSys workshop May ‘ 08. TF. TF. 5 ’. 3 ’. Gene. BS. BS. Promoter Analysis: Exteremely brief intro.

arva
Télécharger la présentation

The AMADEUS Motif Discovery Platform

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. TheAMADEUSMotif Discovery Platform C. Linhart, Y. Halperin, R. Shamir Tel-Aviv University Genome Research 2008 ApoSys workshop May ‘08

  2. TF TF 5’ 3’ Gene BS BS Promoter Analysis: Exteremely brief intro • Transcription is regulated primarily by transcription factors (TFs) – proteins that bind to DNA subsequences, called binding sites (BSs) • TFBSs are located mainly (not always!) in the gene’s promoter– the DNA sequence upstream the gene’s transcription start site (TSS) • TFs can promote or repress transcription TSS

  3. Promoter Analysis (cont.)TFBS models • The BSs of a particular TF share a common pattern, or motif, which is often modeled using: • Consensus string TASDAC (S={C,G} D={A,G,T}) • Position weight matrix (PWM / PSSM) > Threshold = 0.01: TACACC (0.06) TAGAGC (0.06) TACAAT (0.015) …

  4. Motifdiscovery Promoter Analysis (cont.): Typical pipeline Promotersequences Co-regulated gene set Cluster I Gene expressionmicroarrays Clustering Cluster II Cluster III Location analysis(ChIP-chip, …) Functional group(e.g., GO term)

  5. Promoter Analysis (cont.): Goals Reverse-engineer the transcriptional regulatory network = find the TFs (and their BSs) that regulate the studied biological process Input: A set of co-expressed genes Output:“Interesting” motif(s): • Known motifs: PRIMA, ROVER, … • Novel motifs: MEME, AlignACE, … • A group of co-occurring motifs = cis-regulatory module (CRM): MITRA, CREME, … AMADEUS

  6. Promoter Analysis: Status of motif discovery tools • Extant tools perform reasonably well for: • Finding known/novel motifs in organisms with short, simple promoters, e.g., yeast • Identifying some of the known motifs in complex species, e.g., TFs whose BSs are usually close to the TSS • … but often fail in other cases! • Each tool is custom-built for a specific target score, often parametric (i.e., assumes a BG model) or uses a small part of the genome as BG reference; Majority of tools can efficiently handle only dozens of genes • Comparison of tools: [Tompa et al. ’05]

  7. AMADEUS A Motif Algorithm for Detecting Enrichment in mUltiple Species • Research platform: • Extensible: add new algs, scores, motif models • Flexible: control params, algs, scores of execution • Experimental tool: • Sensitive: find subtle signals • Efficient: analyze many long sequences • Informative: show lots of info on motifs • User-friendly: nice GUI

  8. Main features: I/O • Input: • Type: target set / expression data • Multiple species / target-sets • Sequence region (promoter, 1st intron, 3’ UTR, …) • Output: • Non-redundant set of motifs • Rich info per output motif: • Graphical motif logo • Multiple scores & combined p-value • Similarity to known TFBS models • List of target genes • BS localization graph • Targets mean expression graph

  9. Main features: alg. • Algorithm: Multiple refinement phases: • Each phase receives best candidates of previous phase,and refines them (e.g., uses a more complex motif model) • First phases are simple and fast (e.g., try all k-mers); Last phases are more complex (e.g., optimize PWM using EM)

  10. Main features: scores • Motif scores: • User selects scores to use, a subset of: • Target-set: Over/under-representation: • Hypergeometric • GC-content+length binned binomial • Expression: • Enrichment of ranked expression (multiple conditions) (Not yet in the public version) • Global/spatial: • Localization • Strand-bias • Chromosomal preference • Scores are combined into a single p-value • Doesn’t assume specific models for distribution of BSs and/or expression values

  11. Main features: misc. • GUI: • Control all parameters • Save/load parameters from file • Save textual+graphical output to file • TFBS viewer • Other: • Ignore redundant sequences (with identical subsequence) • Applicable to multiple genome-scale promoter sequences • Bootstrapping: Empirical p-value estimation using random target sets / shuffled data • Execution modes: GUI , batch • Interoperability: Java application

  12. Case study:G2 & G2/M phases of human cell cycle [Whitfield et al. ’02] CHR(not in TRANSFAC) NF-Y Module: CHR and NF-Y motifs co-occur (Module was reported in [Linhart et al., ’05], [Tabach et al. ’05])

  13. Benchmark I:Yeast TF target sets [Harbison et al. ’04] Source: ChIP-chip [Harbison et al., ’04] Data: target-sets of 83 TFs with known BS motifs Average set size: 58 genes (=35 Kbps) Success rates: (for top 2 motifs of lengths 8 & 10)

  14. Performance on metazoan datasets • Results on 42 target-sets: • Collected from 29 publications • Based on high-throughput expr’s • Species: human, mouse, fly, worm • Sets: 26 TFs, 8 microRNAs • All have known motifs

  15. Global Analysis I:Localized human+mouse motifs • Input: • All human & mouse promoters (2 x ~20,000) • Region: -500…100 (w.r.t. TSS) • Total sequence length: ~26 Mbps • [No target-set / expression data] • Score: localization • Results: • Recovered known TFs: Sp1, NF-Y, GABP, TATA, Nrf-1, ATF/CREB, Myc, RFX1 • Recovered the splice donor site • Identified several novel motifs

  16. Global Analysis II:Chromosomal preference • Input: • All fly promoters (~14,000) • Region: -1000…200 (w.r.t. TSS) • Total sequence length: ~11 Mbps • [No target-set / expression data] • Score: chromosomal preference • Results: • DNA Replication Element Factor (DREF) on X chromosome

  17. Global Analysis II:Chromosomal preference (cont.) • Input: • All worm promoters (~18,000) • Region: -500…100 (w.r.t. TSS) • Total sequence length: 6.6 Mbps • [No target-set / expression data] • Score: chromosomal preference • Results: • Novel motif on chrom IV

  18. Summary • Developed Amadeus motif discovery platform: • Easy to use • Feature-rich, informative • Sensitive & efficient • Constructed a large, real-life, heterogeneousbenchmark for testing motif finding tools • Demonstrated various applications of motif discovery • http://acgt.cs.tau.ac.il/amadeus

  19. Acknowledgements Tel-Aviv University Chaim Linhart Yonit Halperin Ron Shamir The Hebrew University of Jerusalem Gidi Weber

More Related