
Mining of cis-Regulatory Motifs Associated with Tissue-Specific Alternative Splicing

Mining of cis-Regulatory Motifs Associated with Tissue-Specific Alternative Splicing. Jihye Kim, Bioinformatics Research Center. Outline: Background and Motivation; Association Rule Mining (ARM); Using ARM techniques to discover cis-regulatory elements involved in alternative splicing.





Presentation Transcript


  1. Mining of cis-Regulatory Motifs Associated with Tissue-Specific Alternative Splicing Jihye Kim Bioinformatics Research Center

  2. Outline • Background and Motivation • Association Rule Mining (ARM) • Use ARM techniques to discover cis-regulatory elements involved in alternative splicing • Conclusions and Future Directions

  3. Central Dogma of Molecular Biology [Figure: in the nucleus, a DNA gene (exons and introns) undergoes transcription to pre-mRNA; RNA splicing produces mature RNA, which is exported to the cytoplasm and translated into protein]

  4. Splicing • Introns are removed and flanking exons are concatenated [image from http://fig.cox.miami.edu/~cmallery/150/gene/c7.17.11.spliceosome.jpg]

  5. Alternative Splicing [Figure: one pre-mRNA giving rise to different mRNAs and proteins] • Over 70% of human genes show AS • Some genes express thousands of different mRNAs

  6. Biological Relevance of AS • Major mechanism for generating protein diversity • Important in gene regulation • Highly relevant to disease • 15% of disease-causing mutations affect splicing [Krawczak 1992] [Krawczak 1992] Krawczak, M., Reiss, J., and Cooper, D.N. 1992 Hum. Genet. 90: 41-54

  7. Types of Alternative Splicing Cassette Exon [Source from Cartegni et al. 2002]

  8. Regulation of AS • The spliceosome recognizes splice sites • Often, splicing factors bind to the intron or exon to enhance or repress splicing of the exon [Image from J.R. Sanford, et al., Cell Science at a Glance 117(26):6261]

  9. Cis-Regulatory Elements • Short sequences • ESE, ESS, ISE, ISS (exonic/intronic splicing enhancers and silencers) • Close to splice sites • Example AEDB entry: Species: Homo sapiens (human); Gene name: fibronectin EDA exon; Entry type: exon enhancer; Method: in vivo splicing assay; Sequence: GAAGAAGA; Sequence origin: exonic [Source: http://www.ebi.ac.uk/asd/aedb/] [Image from Z. Wang and C. Burge, RNA 2008]

  10. Investigating AS Regulation • Several computational methods: • Over-represented hexamers from brain-specific genes [Brudno 2001] • RESCUE-ESE found 10 motifs with enhancer activity [Fairbrother 2002] • Motif pairs by coCOA (compositionally orthogonalized co-occurrence analysis) [Friedman 2008] • Most methods • use only sequence data • focus on the effect of individual motifs [Brudno 2001] Brudno M., Gelfand M.S., et al., 2001 NAR 29(11):2338-2348 [Fairbrother 2002] Fairbrother W.G., et al., 2002 Science 297(5583):1007-13 [Friedman 2008] Friedman B.A., et al., 2008 Genome Res 18(10):1643-51

  11. Motivation • Often, AS is regulated by a combination of several binding factors • Exonic UAGG and GGGG motifs are required for skipping of the cassette exon of the glutamate NMDA R1 receptor [Han 2005] [Han 2005] K. Han, et al., PLoS Biol. 2005 3(5):e158

  12. Goal • Find motifs and motif combinations involved in AS: Motif => Exon exclusion; Motif, Motif => Exon exclusion • Association rules: unexpected relationships between two objects

  13. Association Rule Mining • Introduced by Agrawal et al. in 1993 • Initially used for market basket analysis • An association rule is a pattern stating that when X occurs, Y occurs with a certain probability • X: antecedent (left-hand side, lhs) • Y: consequent (right-hand side, rhs) • Goal: find all interesting rules X => Y

  14. ARM Example Cart 1 : Milk, Bread, Diaper, Beer, Jam, Banana Cart 2 : Beer, Nuts, Tissue, Diaper Cart 3 : Apple, Beer Cart 4 : Jam, Beer, Diaper Cart 5 : Bread, Butter, Tissue, Jam

  15. ARM Example (carts as in slide 14) • An unexpected rule: Beer => Diaper

  16. Rule Strength Measures • Given a rule X => Y: • Support = Pr(X∧Y) • Confidence = Pr(Y | X) • Lift = Pr(X∧Y) / (Pr(X)Pr(Y)) • Lift measures the dependency of lhs and rhs • Generally, lhs and rhs have positive dependency if lift > 1.0
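The three measures can be checked on the five toy carts with a short Python sketch. The function names are illustrative only (not from any ARM library); ratios are computed from integer counts so that, for example, 3/4 comes out as exactly 0.75.

```python
# Toy transactions: the five carts from the ARM example slides.
CARTS = [
    {"Milk", "Bread", "Diaper", "Beer", "Jam", "Banana"},
    {"Beer", "Nuts", "Tissue", "Diaper"},
    {"Apple", "Beer"},
    {"Jam", "Beer", "Diaper"},
    {"Bread", "Butter", "Tissue", "Jam"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Pr(X ∧ Y) when `itemset` is X ∪ Y."""
    return count(itemset, transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Pr(rhs | lhs); undefined (ZeroDivisionError) if lhs never occurs."""
    return count(set(lhs) | set(rhs), transactions) / count(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Pr(lhs ∧ rhs) / (Pr(lhs) Pr(rhs)), computed from counts."""
    n = len(transactions)
    joint = count(set(lhs) | set(rhs), transactions)
    return joint * n / (count(lhs, transactions) * count(rhs, transactions))


if __name__ == "__main__":
    print(support({"Beer", "Diaper"}, CARTS))       # 0.6
    print(confidence({"Beer"}, {"Diaper"}, CARTS))  # 0.75
    print(lift({"Beer"}, {"Diaper"}, CARTS))        # 1.25
```

A lift of 1.25 for Beer => Diaper indicates the positive dependency described above.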

  17. ARM Example (carts as in slide 14)

  18. ARM Example (carts as in slide 14) • Min supp = 0.5 • Min conf = 0.7 • Frequent itemset = itemset whose support ≥ 0.5

  19. ARM Example (carts as in slide 14) • Min supp = 0.5 • Min conf = 0.7 • Frequent itemsets (support)

  20. ARM Example (carts as in slide 14) • Min supp = 0.5 • Min conf = 0.7 • Frequent itemsets (support): Beer (0.8), Jam (0.6), Diaper (0.6), {Beer, Diaper} (0.6)

  21. ARM Example (carts as in slide 14) • Min supp = 0.5 • Min conf = 0.7 • Frequent itemsets: Beer (0.8), Jam (0.6), Diaper (0.6), {Beer, Diaper} (0.6) • Association rules (confidence)

  22. ARM Example (carts as in slide 14) • Min supp = 0.5 • Min conf = 0.7 • Frequent itemsets: Beer (0.8), Jam (0.6), Diaper (0.6), {Beer, Diaper} (0.6) • Association rules (confidence): Beer => Diaper (0.75), Diaper => Beer (1.0)

  23. Apriori Algorithm • Most popular ARM algorithm • Two steps: 1. Find all itemsets that satisfy min_supp (frequent itemsets) • Any subset of a frequent itemset is also frequent • Find all frequent 1-itemsets, then all frequent 2-itemsets, and so on 2. Generate rules • A => B is an association rule if Confidence(A => B) ≥ min_conf
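The two Apriori steps can be sketched in plain Python as below. This is a minimal, unoptimized illustration (one full pass over the transactions per candidate), not the implementation used in the study; `apriori` and `generate_rules` are hypothetical names.

```python
from itertools import combinations

# The five carts from the ARM example slides.
CARTS = [
    {"Milk", "Bread", "Diaper", "Beer", "Jam", "Banana"},
    {"Beer", "Nuts", "Tissue", "Diaper"},
    {"Apple", "Beer"},
    {"Jam", "Beer", "Diaper"},
    {"Bread", "Butter", "Tissue", "Jam"},
]


def apriori(transactions, min_supp):
    """Step 1: return {frozenset(itemset): count} for all itemsets with support >= min_supp."""
    min_count = min_supp * len(transactions)
    # Level 1 candidates: every distinct item.
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent = {}
    k = 1
    while candidates:
        # Count candidates and keep the frequent ones at this level.
        level = {}
        for c in candidates:
            cnt = sum(1 for t in transactions if c <= t)
            if cnt >= min_count:
                level[c] = cnt
        frequent.update(level)
        k += 1
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        ]
    return frequent


def generate_rules(frequent, min_conf):
    """Step 2: return (lhs, rhs, confidence) for every rule with confidence >= min_conf."""
    rules = []
    for itemset, cnt in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Any subset of a frequent itemset is frequent, so its count is known.
                conf = cnt / frequent[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules


if __name__ == "__main__":
    counts = apriori(CARTS, min_supp=0.5)
    for itemset, cnt in counts.items():
        print(sorted(itemset), cnt / len(CARTS))
    for lhs, rhs, conf in generate_rules(counts, min_conf=0.7):
        print(sorted(lhs), "=>", sorted(rhs), conf)
```

On the five carts this finds the frequent itemsets Beer, Diaper, Jam and {Beer, Diaper}, and the rules Beer => Diaper (0.75) and Diaper => Beer (1.0).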

  24. Association Rules of Motifs in AS • Beer => Diaper: shopping items purchased together in market basket data • Motif A => Motif B: a motif pair that jointly regulates alternative splicing

  25. Part I: Finding association rules of cis-regulatory elements involved in alternative splicing [Proceedings of the 45th annual southeast regional conference (ACM-SE) Winston-Salem, North Carolina pp. 232-237, 2007, BEST REGULAR PAPER]

  26. Dataset: AS Datasets in Mouse • Splice array [Pan 2004] with 6 probes • 3126 exon-skipping genes in mouse • %ASex: percentage of exon skipping in 10 tissues [Pan 2004] Pan, Q., et al., 2004 Mol Cell 16(6):929-942

  27. K-mers Around the Cassette Exon (items) • Pre-mRNA sequences • Transcripts from NCBI • BLAT used to align transcripts to the mouse genome • 200 bp from 7 regions around the cassette exon • 2565 genes in total • Items (6-mers): AAAAAA to TTTTTT in region 1 … 7
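One way to turn the extracted region sequences into region-tagged hexamer items (written like `4_TGAAGA` on later slides) is sketched below. The input layout, a dict from region number to sequence, is a hypothetical choice; the BLAT alignment and region extraction are not shown. Each region contributes at most 4^6 = 4,096 possible items.

```python
def region_kmers(region_seqs, k=6):
    """Return the set of region-prefixed k-mer items for one gene.

    `region_seqs` (hypothetical layout) maps a region number (1-7) to that
    region's sequence; each distinct k-mer is counted once per region.
    """
    items = set()
    for region, seq in region_seqs.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip k-mers containing N or any other non-ACGT symbol.
            if set(kmer) <= set("ACGT"):
                items.add(f"{region}_{kmer}")
    return items


if __name__ == "__main__":
    print(sorted(region_kmers({4: "AAAAAAT", 7: "acgtacg"})))
```

The item set returned for a gene is its transaction in the ARM setting of the next slide.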

  28. ARM for AS Motif Rules • Items: all possible hexamers (motifs) • Transactions: 2565 AS genes • Goal: find motif association rules in AS genes (e.g., AGGATA => TTAGCT) • Apriori algorithm [Agrawal 1993]: find all frequent hexamers, then generate hexamer rules [Agrawal 1993] Agrawal R., Imielinski T., Swami A.N., 1993 SIGMOD 22(2):207-216

  29. ARM Example • Seq 1: ACGATTAGG • Seq 2: GAATAGG • Seq 3: TGCAGG • Seq 4: GGATTAGG • Seq 5: CAGAT • Min support = 0.5 • Min confidence = 0.7

  30. ARM Example (sequences as in slide 29) • Min support = 0.5 • Min confidence = 0.7 • Frequent 3-mer sets (support): AGG (0.8), GAT (0.6), TAG (0.6), {AGG, TAG} (0.6)

  31. ARM Example (sequences as in slide 29) • Min support = 0.5 • Min confidence = 0.7 • Frequent 3-mer sets (support): AGG (0.8), GAT (0.6), TAG (0.6), {AGG, TAG} (0.6) • Rules (confidence): AGG => TAG (0.75), TAG => AGG (1.0)
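The sequence example can be reproduced by treating each sequence as a transaction of its distinct 3-mers, so a 3-mer that occurs twice in one sequence still counts once. This is a small sketch assuming that counting convention, which matches the numbers on the slide.

```python
# The five toy sequences from the ARM example.
SEQS = ["ACGATTAGG", "GAATAGG", "TGCAGG", "GGATTAGG", "CAGAT"]


def kmer_set(seq, k=3):
    """Distinct k-mers of `seq`; a k-mer counts once per sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


TRANSACTIONS = [kmer_set(s) for s in SEQS]
N = len(TRANSACTIONS)


def count(kmers):
    """Number of sequences that contain all of `kmers`."""
    return sum(1 for t in TRANSACTIONS if set(kmers) <= t)


# Frequent single 3-mers at min support 0.5.
FREQUENT = sorted(k for k in set().union(*TRANSACTIONS) if count({k}) >= 0.5 * N)

if __name__ == "__main__":
    print(FREQUENT)                                # ['AGG', 'GAT', 'TAG']
    print(count({"AGG", "TAG"}) / N)               # support of {AGG, TAG}: 0.6
    print(count({"AGG", "TAG"}) / count({"AGG"}))  # AGG => TAG: 0.75
    print(count({"AGG", "TAG"}) / count({"TAG"}))  # TAG => AGG: 1.0
```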

  32. Motif Association Rules from AS Genes [Figure: regions 1-7 around the cassette exon] • Frequent 6-mers (minsup = 0.05, 129 genes): 4_TGAAGA, 7_GAAGAA (ASF/SF2, SRp55), 6_TTTTCT, 6_AATAAA, … (6,000 6-mers) • Candidates of regulatory motifs • Association rules (minconf = 0.4): 4_AAAAAT => 4_TGAAGA, 4_AAAGGA => 4_AGAAGA, 4_GAAAAA => 4_AAGAAG, 4_CTGCCT => 4_CTGGAG, 4_AGGAAA => 4_AAGAAG, 4_AATAAA => 4_AAGAAG • Candidates of regulatory combinations for AS

  33. Clustering by AS Pattern in 10 Tissues • Hypothesis: motif combinations "cause" the AS profile • Cluster genes based on their AS profiles, using • Euclidean distance or correlation • Average linkage clustering • Frequent 6-mers in a cluster are motif candidates
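A dependency-free sketch of average-linkage clustering of genes by their %ASex profiles, with either 1 - Pearson correlation or Euclidean distance (`math.dist`). The naive merge loop is only meant to show the idea; for all 2,565 genes a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would be the more practical choice. The profiles below are hypothetical.

```python
import math


def correlation_distance(x, y):
    """1 - Pearson correlation; profiles must not be constant (zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)


def average_linkage(profiles, num_clusters, dist=correlation_distance):
    """Naive agglomerative clustering with average linkage.

    Starts with one cluster per gene and repeatedly merges the two clusters
    with the smallest mean pairwise distance. Returns lists of gene indices.
    """
    n = len(profiles)
    d = [[dist(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]

    def avg_dist(a, b):
        return sum(d[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > num_clusters:
        p, q = min(
            ((p, q) for p in range(len(clusters)) for q in range(p + 1, len(clusters))),
            key=lambda pq: avg_dist(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p].extend(clusters[q])  # merge q into p (p < q keeps index p valid)
        del clusters[q]
    return clusters


if __name__ == "__main__":
    # Hypothetical %ASex profiles of four genes over three tissues.
    profiles = [[10, 50, 90], [20, 60, 95], [90, 50, 10], [85, 40, 5]]
    print(average_linkage(profiles, 2))                  # correlation-based
    print(average_linkage(profiles, 2, dist=math.dist))  # Euclidean
```

Correlation groups genes whose skipping rises and falls across the same tissues regardless of level, while Euclidean distance also separates genes by absolute %ASex, which is one reason the clusters (and hence the motif candidates) depend on the choice.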

  34. Association Rules from Clusters [Figure: regions 1-7 around the cassette exon; correlation-based clusters] • 112 frequent hexamers (0-39 per cluster) • Lift(X => Y) > 2.0 • Comparison with genes outside the cluster (p-value < 2.13e-10) • Association rules are candidates for motif combinations underlying the corresponding AS pattern

  35. AS profile of Genes with a Motif Rule Example: 7_AGCAGC => 6_GCAGCC

  36. Summary • Motifs and motif association rules from a group of genes with similar AS patterns • Candidates of motif combinations • BUT: • Choosing the "right" threshold is difficult • Results depend on the clustering technique

  37. Part II: Mining of Cis-regulatory Motifs Associated with Tissue-specific Alternative Splicing by Discretization-Based Quantitative Association Rule Mining

  38. Quantitative Association Rule Mining • Mine numeric or quantitative data • Two methods : • Discretization (Binning methods, e.g., equi-width, equi-depth, distance-based) • Distribution-based

  39. Example Cart 1 : Liquor $21, Vegetables $20, Meat $12 Cart 2 : Liquor $7, Vegetables $70 Cart 3 : Liquor $86, Meat $59 Cart 4 : Liquor $29, Vegetables $3 Cart 5 : Liquor $98 Cart 6 : Liquor $33, Meat $16

  40. Discretization-based • Discretization of numeric attributes • Intuitive and popular • Sensitive to bin size
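The sensitivity to binning can be seen on the liquor amounts from slide 39 with two common schemes, equi-width and equi-depth. This is an illustrative sketch only; the slides do not say which scheme or how many bins would be used, and three bins here is an arbitrary choice.

```python
# Liquor spend per cart from the quantitative example (carts 1-6).
LIQUOR = [21, 7, 86, 29, 98, 33]


def equi_width(values, k):
    """Split [min, max] into k equal-width intervals; return a bin index per value.

    Assumes the values are not all equal (otherwise the width is zero).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum would fall into bin k, so clamp it into the last bin (k - 1).
    return [min(int((v - lo) / width), k - 1) for v in values]


def equi_depth(values, k):
    """Give each bin roughly len(values) / k values, by rank (ties may split)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


if __name__ == "__main__":
    print(equi_width(LIQUOR, 3))  # [0, 0, 2, 0, 2, 0]: the middle bin is empty
    print(equi_depth(LIQUOR, 3))  # [0, 0, 2, 1, 2, 1]: two carts per bin
```

The same six amounts end up in different groups under the two schemes; for example, carts 4 and 6 ($29, $33) share the low bin with $7 and $21 in equi-width binning but form their own bin in equi-depth binning.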

  41. AS Profile Items • Use quartiles to convert numeric %ASex values into categorical AS profile items • BrainLow: the first %ASex quartile in brain • BrainHigh: the last %ASex quartile in brain
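The quartile step can be sketched with Python's `statistics.quantiles`. Several details are assumptions here: "Low" means at or below the first cut point, "High" means at or above the third, genes in between get no item, and the default `"exclusive"` quantile method is used (other quartile conventions can shift the cut points slightly).

```python
from statistics import quantiles


def as_profile_items(asex_by_gene, tissue):
    """Map genes to '<tissue>Low' (at or below Q1) or '<tissue>High' (at or above Q3).

    Genes between the two cut points get no AS profile item for this tissue.
    """
    q1, _, q3 = quantiles(asex_by_gene.values(), n=4)
    items = {}
    for gene, value in asex_by_gene.items():
        if value <= q1:
            items[gene] = f"{tissue}Low"
        elif value >= q3:
            items[gene] = f"{tissue}High"
    return items


if __name__ == "__main__":
    # Hypothetical %ASex values of eight genes in brain.
    brain = {f"g{i}": 10 * i for i in range(8)}
    print(as_profile_items(brain, "Brain"))
```

For the hypothetical values 0, 10, …, 70 the cut points are 12.5 and 57.5, so g0 and g1 become BrainLow and g6 and g7 become BrainHigh.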

  42. Finding Motifs Involved in Tissue-Specific AS • Items: • hexamers in gene regions • exon skipping rates in tissues • Transactions: • 2565 genes from Pan's data set • Goal: find associations between hexamers and exon skipping rates, e.g., AGGATA in cassette exon => high exon skipping in brain
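Mixing hexamer items and AS profile items in one transaction per gene lets ordinary ARM measures score rules such as "hexamer => BrainHigh". The sketch below restricts itself to single-hexamer rules whose right-hand side is an AS profile item. The gene data, the naming convention for profile items, and the use of region prefix 4 for the cassette exon are all hypothetical.

```python
def rule_stats(transactions, lhs, rhs):
    """Support, confidence and lift of lhs => rhs over a list of item sets."""
    n = len(transactions)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    supp = n_both / n
    conf = n_both / n_lhs if n_lhs else 0.0
    lift = n_both * n / (n_lhs * n_rhs) if n_lhs and n_rhs else 0.0
    return supp, conf, lift


def is_profile_item(item):
    """Hypothetical convention: AS profile items end in 'Low' or 'High'."""
    return item.endswith(("Low", "High"))


def motif_to_profile_rules(transactions, min_supp, min_conf):
    """Single-hexamer rules whose rhs is an AS profile item."""
    items = set().union(*transactions)
    motifs = sorted(i for i in items if not is_profile_item(i))
    profiles = sorted(i for i in items if is_profile_item(i))
    rules = []
    for m in motifs:
        for p in profiles:
            supp, conf, lift = rule_stats(transactions, {m}, {p})
            if supp >= min_supp and conf >= min_conf:
                rules.append((m, p, supp, conf, lift))
    return rules


# Hypothetical toy genes: hexamer items plus quartile-based AS profile items.
GENES = {
    "G1": {"4_AGGATA", "7_GAAGAA", "BrainHigh"},
    "G2": {"4_AGGATA", "BrainHigh", "HeartLow"},
    "G3": {"4_AGGATA", "HeartHigh"},
    "G4": {"1_TTTTCT", "BrainLow"},
    "G5": {"7_GAAGAA", "BrainLow", "HeartHigh"},
}

if __name__ == "__main__":
    for rule in motif_to_profile_rules(list(GENES.values()), 0.4, 0.6):
        print(rule)
```

With min support 0.4 and min confidence 0.6, only 4_AGGATA => BrainHigh survives on this toy data (support 0.4, confidence 2/3, lift 5/3).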

  43. Tissue-Specific AS Motif Combinations • 1464 association rules found in total • 204 complex rules • lhs: combinations of 113 frequent hexamers • rhs: AS profile items in tissues • All rules have lift > 1.9 • 117 rules show motif combinations in different regions

  44. AS profile of Motif • 1260 simple rules with 806 hexamers

  45. {5_TTTTTA, 7_AGAGGA} => {HeartHigh} [Figure: regions 1-7 around the cassette exon]

  46. AS Profile of Motif Combinations [Figure: regions 1-7 around the cassette exon]

  47. Part III: Mining of Cis-regulatory Motifs Associated with Tissue-specific Alternative Splicing by Distribution-Based Quantitative Association Rule Mining [J. Kim, S. Zhao, B. Howard, S. Heber, LNBI 5542, pp. 260-271, 2009]

  48. Distribution-based QARM • Proposed by Aumann and Lindell • Example: Diaper => Liquor: mean = $12/week (overall mean = $7/week) • Association between a subset of a database and its "extraordinary" behavior • Statistical tests are used to define "extraordinary" behavior

  49. Our Data • Heptamers: categorical items • Exon skipping rates: quantitative items • G1: 1_ACTGGAG, …, 7_TTTTCGA, 43 (Brain), …, 78 (Testis) • G2: 1_AAGCTTG, …, 7_TCTTAAA, 22 (Brain), …, 54 (Testis) • G3: 1_AGGCCAA, …, 7_TGAATTT, 4 (Brain), …, 13 (Testis) • G4: 1_ATATTTT, …, 7_TTTTCGA, 89 (Brain), …, 100 (Testis) • …

  50. Our Goal • Mining of "heptamer(s) => exon skipping rate" rules • Mean of exon skipping rates • T-test for extraordinary exon skipping rates • E.g., 4_TTGCGAC => mean(Brain) = 80 (overall mean(Brain) = 30)
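A distribution-based rule "heptamer => mean %ASex in a tissue" can be screened by comparing genes that contain the heptamer against genes that do not. The slides only say "T-test"; Welch's unequal-variance t statistic and the with/without split used here are our assumptions, and the gene records are hypothetical. A p-value could then be obtained, for example, with `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group needs at least two values, and not both variances may be zero.
    """
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def heptamer_rule(genes, heptamer, tissue):
    """Return (mean %ASex with heptamer, overall mean, t, df) for one tissue."""
    with_h = [g["asex"][tissue] for g in genes if heptamer in g["motifs"]]
    without = [g["asex"][tissue] for g in genes if heptamer not in g["motifs"]]
    t, df = welch_t(with_h, without)
    return mean(with_h), mean(with_h + without), t, df


# Hypothetical genes: region-tagged heptamers and %ASex in brain.
GENES = [
    {"motifs": {"4_TTGCGAC"}, "asex": {"Brain": 78}},
    {"motifs": {"4_TTGCGAC", "1_ACTGGAG"}, "asex": {"Brain": 82}},
    {"motifs": {"4_TTGCGAC"}, "asex": {"Brain": 80}},
    {"motifs": {"1_ACTGGAG"}, "asex": {"Brain": 20}},
    {"motifs": set(), "asex": {"Brain": 25}},
    {"motifs": {"7_TTTTCGA"}, "asex": {"Brain": 15}},
]

if __name__ == "__main__":
    print(heptamer_rule(GENES, "4_TTGCGAC", "Brain"))
```

On this toy data the genes with 4_TTGCGAC average 80 against an overall mean of 50, and the large t statistic (about 19) flags the rule as "extraordinary".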
