Join us for an engaging seminar series on Promoter Prediction and RNA Structure at Iowa State University. The first seminar features Coralie Lashbrook discussing "Laser Capture Microdissection-Facilitated Transcriptional Profiling of Abscission Zones in Arabidopsis" on Monday, October 24, at 12:10 PM in 101 Ind. Ed. II. Mark your calendars for the next seminar on November 14, where Douglas Brutlag will present "Discovering Transcription Factor Binding Sites." For questions or further information, contact David Dobbs at ddobbs@iastate.edu.
10/24/05: Promoter Prediction; RNA Structure & Function Prediction. D Dobbs ISU - BCB 444/544X: Promoter Prediction
Announcements • Seminar (Mon Oct 24) (several additional seminars listed in email sent to class): 12:10 PM, IG Faculty Seminar in 101 Ind Ed II, "Laser capture microdissection-facilitated transcriptional profiling of abscission zones in Arabidopsis", Coralie Lashbrook, EEOB http://www.bb.iastate.edu/%7Emarit/GEN691.html • Mark your calendars: 1:10 PM Nov 14, Baker Seminar in Howe Hall Auditorium, "Discovering transcription factor binding sites", Douglas Brutlag, Dept of Biochemistry & Medicine, Stanford University School of Medicine
Announcements • 544 Semester Projects • Thanks to all who sent already! • Others: Information needed today! • ddobbs@iastate.edu • Briefly describe: • Your background & current grad research • Is there a problem related to your research you would like to learn more about & develop as project for this course? or • What would your ‘dream’ project be?
Announcements • Exam 2 - this Friday • Posted Online: Exam 2 Study Guide; 544 Reading Assignment (2 papers) • Office Hours: David Mon 1-2 PM in 209 Atanasoff; Drena Tues 10-11 AM in 106 MBB; Michael - none this week • Thurs: No Lab - Extra Office Hrs instead: David 1-3 PM in 209 Atanasoff; Drena 1-3 PM in 106 MBB
Announcements • Updated PPTs & PDFs for Gene Prediction lectures (covered on Exam 2) will be posted today (changes are minor) • Is everyone on BCB 444/544 mailing list? Auditors?
Promoter Prediction & RNA Structure/Function Prediction • Mon: Quite a few more words re: Gene prediction; Promoter prediction • Wed: RNA structure & function; RNA structure prediction; 2° & 3° structure prediction; miRNA & target prediction • Thurs: No Lab • Fri: Exam 2
Reading Assignment - previous • Mount Bioinformatics • Chp 9 Gene Prediction & Regulation • pp 361-401 • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html • * Brown Genomes 2 (NCBI textbooks online) • Sect 9 Overview: Assembly of Transcription Initiation Complex • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.chapter.7002 • Sect 9.1-9.3 DNA binding proteins, Transcription initiation • http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.section.7016 • * NOTE: Don’t worry about the details!! • See Study Guide for Exam 2 re: Sections covered
Optional - but very helpful reading: (that's a hint!) • Zhang MQ (2002) Computational prediction of eukaryotic protein-coding genes. Nat Rev Genet 3:698-709 http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html • Wasserman WW & Sandelin A (2004) Applied bioinformatics for identification of regulatory elements. Nat Rev Genet 5:276-287 http://proxy.lib.iastate.edu:2103/nrg/journal/v5/n4/full/nrg1315_fs.html Check this out: http://www.phylofoot.org/NRG_testcases/
Reading Assignment (for Wed) • Mount Bioinformatics • Chp 8 Prediction of RNA Secondary Structure • pp. 327-355 • Ck Errata: http://www.bioinformaticsonline.org/help/errata2.html • Cates (Online) RNA Secondary Structure Prediction Module • http://cnx.rice.edu/content/m11065/latest/
Review last lecture: Gene Prediction (formerly Gene Prediction - 3) • Overview of steps & strategies • Algorithms • Gene prediction software
Predicting Genes - Basic steps: • Obtain genomic DNA sequence • Translate in all 6 reading frames • Compare with protein sequence database • Also perform database similarity search with EST & cDNA databases, if available • Use gene prediction programs to locate genes • Analyze gene regulatory sequences • Note: Several important details missing above: 1. Mask to "remove" repetitive elements (ALUs, etc.) 2. Perform database search on translated DNA (BlastX, TFasta) 3. Use several programs to predict genes (GenScan, GeneMark.hmm) 4. Translate putative ORFs and search for functional motifs (Blocks, Motifs, etc.) & regulatory sequences
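The "translate in all 6 reading frames" step can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code and a plain A/C/G/T sequence; the function names (`reverse_complement`, `translate`, `six_frame_translation`) are made up for this example and are not from any of the tools named on the slide.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(dna):
    """Map frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

Each of the six protein strings would then be compared against a protein database, which is what BlastX does internally on the translated query.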
Gene prediction flowchart Fig 5.15 Baxevanis & Ouellette 2005
Overview of gene prediction strategies • What sequence signals can be used? • Transcription: TF binding sites, promoter, initiation site, terminator • Processing signals: splice donor/acceptors, polyA signal • Translation: start (AUG = Met) & stop (UGA, UAA, UAG) • ORFs, codon usage • What other types of information can be used? • cDNAs & ESTs (pairwise alignment) • homology (sequence comparison, BLAST)
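As a small illustration of using the translation signals above, the following sketch scans the three forward reading frames of a DNA string for open reading frames that begin with ATG and end at the first in-frame stop codon (TAA, TAG, TGA in DNA). The function name `find_orfs` and the `min_codons` cutoff are assumptions for this example; real gene finders combine many more signals (splice sites, codon usage, both strands).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start/end are 0-based, end-exclusive, and the stop codon is included.
    min_codons counts the codons before the stop (including ATG).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGGG"
    for start, end, frame in find_orfs(seq):
        print(frame, start, end, seq[start:end])
```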
Examples of gene prediction software • Similarity-based or Comparative • BLAST • SGP2 (extension of GeneID) • Ab initio = "from the beginning" • GeneID - (used in lab last week) • GENSCAN - (used in lab last week) • GeneMark.hmm - (should try this!) • Combined "evidence-based" • GeneSeqer (Brendel et al., ISU) • BEST? GENSCAN, GeneMark.hmm, GeneSeqer, but depends on organism & specific task
Annotated lists of gene prediction software • URLs from Mount Chp 9, available online: Table 9.1 http://www.bioinformaticsonline.org/links/ch_09_t_1.html • from Pevsner Chps 14 & 16: http://www.bioinfbook.org/chapt14.htm - prokaryotic; http://www.bioinfbook.org/chapt16.htm - eukaryotic • Table in Zhang Nat Rev Genet article: http://proxy.lib.iastate.edu:2103/nrg/journal/v3/n9/full/nrg890_fs.html • Another list: Kozar, Stanford http://cmgm.stanford.edu/classes/genefind/ • Performance Evaluation? Guigó, Barcelona (& sites above) http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html
Gene prediction: Eukaryotes vs prokaryotes • Gene prediction is easier in microbial genomes • Methods? Previously, mostly HMM-based; now, similarity-based methods because so many genomes are available (see Mount Fig 9.7, E. coli gene) • Many microbial genomes have been fully sequenced & whole-genome "gene structure" and "gene function" annotations are available, e.g., GeneMark.hmm, TIGR Comprehensive Microbial Resource (CMR), NCBI Microbial Genomes
UCSC Browser view of 1000 kb region (Human URO-D gene) Fig 5.10 Baxevanis & Ouellette 2005
GeneSeqer - Brendel et al. Spliced Alignment Algorithm http://deepc2.psi.iastate.edu/cgi-bin/gs.cgi [Figure: intron flanked by splice sites, GT at the donor and AG at the acceptor] • Perform pairwise alignment with large gaps in one sequence (due to introns) • Align genomic DNA with cDNA, ESTs, protein sequences • Score semi-conserved sequences at splice junctions (using a Bayesian model) • Score coding constraints in translated exons (using a Bayesian model) Brendel et al (2004) Bioinformatics 20: 1157 Brendel 2005
Brendel - Spliced Alignment I: Compare with cDNA or EST probes [Figure: genomic DNA with start & stop codons, aligned to mRNA: Cap, 5’-UTR, start codon, stop codon, 3’-UTR, Poly(A)] Brendel 2005
Brendel - Spliced Alignment II: Compare with protein probes [Figure: genomic DNA with start & stop codons, aligned to protein] Brendel 2005
Splice Site Detection • Do DNA sequences surrounding splice "consensus" sequences contribute to splicing signal? YES • Information Content I_i: • Extent of Splice Signal Window: • i: ith position in sequence • Ī: avg information content over all positions >20 nt from splice site • σ_Ī: avg sample standard deviation of Ī Brendel 2005
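A per-position information content profile can be sketched as follows. The formula used is the common Shannon-based definition for DNA, I_i = 2 - H_i bits, where H_i is the entropy of the base frequencies at position i; treat this as an assumption, since the slide text does not spell out the exact expression. The function name and the toy aligned sites are hypothetical.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Return I_i = 2 - H_i (bits) for each column of equal-length DNA sites.

    Assumes a uniform background over A, C, G, T and applies no
    small-sample correction.
    """
    length = len(aligned_sites[0])
    profile = []
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned_sites)
        total = sum(counts.values())
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        profile.append(2.0 - entropy)
    return profile


if __name__ == "__main__":
    # Toy donor-site windows: variable first base, invariant "GGT" after it.
    sites = ["AGGT", "CGGT", "GGGT", "TGGT"]
    print(information_content(sites))
```

Positions whose information content stands clearly above the background average Ī would be counted as part of the splice signal window.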
Information content vs position [Figure panels: Human T2_GT, Human T2_AG] Which sequences are exons & which are introns? How can you tell? Brendel et al (2004) Bioinformatics 20: 1157 Brendel 2005
Bayesian Splice Site Prediction • Let S = s_{-l} s_{-l+1} s_{-l+2} … s_{-1} GT s_1 s_2 s_3 … s_r • where H indexes the hypotheses of GT or AG at: - True site in reading phase 1, 2, or 0 - False within-exon site in reading phase 1, 2, or 0 - False within-intron site Brendel et al (2004) Bioinformatics 20: 1157 Brendel 2005
Bayes Factor as Decision Criterion • H0: H=T • 2-class model: • 7-class model: Brendel et al (2004) Bioinformatics 20: 1157 Brendel 2005
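For orientation, the generic textbook forms of the posterior over hypotheses and of the Bayes factor for H0: H=T can be written as below. The notation is assumed for this sketch ($\mathcal{H}$ is the set of hypotheses, $T$ a true site, $\bar{T}$ the pooled false-site alternative); it is not necessarily the exact expression used in Brendel et al (2004).

```latex
P(H = h \mid S) \;=\;
  \frac{P(S \mid H = h)\, P(H = h)}
       {\sum_{h' \in \mathcal{H}} P(S \mid H = h')\, P(H = h')}
\qquad
\mathrm{BF} \;=\; \frac{P(S \mid H = T)}{P(S \mid H = \bar{T})}
  \;=\; \frac{P(H = T \mid S) \,/\, P(H = \bar{T} \mid S)}
             {P(H = T) \,/\, P(H = \bar{T})}
```

In a model with more than two classes, $P(S \mid H = \bar{T})$ is a prior-weighted mixture over the individual false-site classes.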
Markov Model for Spliced Alignment [State diagram: exon states e_n, e_n+1 and intron states i_n, i_n+1; transition labels: PG, (1-PG)(1-PD(n+1)), (1-PG)PD(n+1), PA(n)PG, 1-PA(n)] Brendel 2005
Evaluation of Splice Site Prediction
                   Actual True    Actual False
Predicted True     TP             FP              PP = TP+FP
Predicted False    FN             TN              PN = FN+TN
                   AP = TP+FN     AN = FP+TN
• Sensitivity (= Coverage): • Specificity: • Misclassification rates: • Normalized specificity: Brendel 2005
Performance? [Figure panels, each with Sn on the axis: Human GT site, Human AG site, A. thaliana GT site, A. thaliana AG site] • Note: these are not ROC curves (plots of Sn vs (1-Sp)) • But plots such as these (& ROCs) are much better than using a "single number" to compare different methods • Both types of plots illustrate the trade-off: Sn vs Sp Brendel 2005
Evaluation of Splice Site Prediction: What do measures really mean? Fig 5.11 Baxevanis & Ouellette 2005
Careful: different definitions for "Specificity"
                   Actual True    Actual False
Predicted True     TP             FP              PP = TP+FP
Predicted False    FN             TN              PN = FN+TN
                   AP = TP+FN     AN = FP+TN
• Brendel definitions: • Sensitivity (= Coverage): • Specificity: • cf. Guigó definitions: • Sn: Sensitivity = TP/(TP+FN) • Sp: Specificity = TN/(TN+FP) = Sp- • AC: Approximate Coefficient = 0.5 x ((TP/(TP+FN)) + (TP/(TP+FP)) + (TN/(TN+FP)) + (TN/(TN+FN))) - 1 • Other measures? Predictive Values, Correlation Coefficient
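The measures listed above can be computed directly from the four confusion-matrix counts. The sketch below follows the formulas as written on the slide (Sn, Sp = TN/(TN+FP), and AC); it also reports TP/(TP+FP), the positive predictive value, which some authors call "specificity" instead. The function name `evaluate` and the example counts are made up for illustration.

```python
def evaluate(tp, fp, fn, tn):
    """Return Sn, Sp, predictive values, and AC from confusion-matrix counts."""
    sn = tp / (tp + fn)    # sensitivity: fraction of actual sites found
    sp = tn / (tn + fp)    # specificity: fraction of non-sites rejected
    ppv = tp / (tp + fp)   # positive predictive value
    npv = tn / (tn + fn)   # negative predictive value
    ac = 0.5 * (sn + ppv + sp + npv) - 1  # approximate coefficient
    return {"Sn": sn, "Sp": sp, "PPV": ppv, "NPV": npv, "AC": ac}


if __name__ == "__main__":
    # Hypothetical counts: 10 real sites, 90 non-sites.
    for name, value in evaluate(tp=8, fp=2, fn=2, tn=88).items():
        print(f"{name}: {value:.3f}")
```

With rare true sites, Sp = TN/(TN+FP) stays close to 1 even when many predictions are wrong, which is one reason the two "specificity" definitions should not be mixed when comparing methods.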
Best measures for comparing different methods? • ROC curves (Receiver Operating Characteristic?!!) • http://www.anaesthetist.com/mnm/stats/roc/ • "The Magnificent ROC" - has fun applets & quotes: • "There is no statistical test, however intuitive and simple, which will not be abused by medical researchers" • Correlation Coefficient • Matthews correlation coefficient (MCC) • MCC = 1 for a perfect prediction • 0 for a completely random assignment • -1 for a "perfectly incorrect" prediction Do not memorize this!
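The three reference values quoted for MCC can be checked with a short sketch. It uses the standard formula MCC = (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); returning 0 when the denominator is zero is a convention chosen for this example, and the function name `mcc` is hypothetical.

```python
import math


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention for degenerate tables (an assumption here)
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(mcc(10, 0, 0, 10))   # every prediction correct
    print(mcc(5, 5, 5, 5))     # predictions unrelated to the truth
    print(mcc(0, 10, 10, 0))   # every prediction wrong
```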
Performance of GeneSeqer vs other methods? • Comparison with ab initio gene prediction (e.g., GENSCAN) • Depends on: • Availability of ESTs • Availability of protein homologs • Other Performance Evaluations? Guigó http://www1.imim.es/courses/SeqAnalysis/GeneIdentification/Evaluation.html Brendel 2005
GeneSeqer vs GENSCAN (Exon prediction) [Plot: Exon (Sn + Sp) / 2 (y-axis, 0.00-1.00) vs target protein alignment score (x-axis, 0-100); curves: GeneSeqer, NAP, GENSCAN] GENSCAN - Burge, MIT Brendel 2005
GeneSeqer vs GENSCAN (Intron prediction) [Plot: Intron (Sn + Sp) / 2 (y-axis, 0.00-1.00) vs target protein alignment score (x-axis, 0-100); curves: GeneSeqer, NAP, GENSCAN] GENSCAN - Burge, MIT Brendel 2005
Other Resources • Current Protocols in Bioinformatics • http://www.4ulr.com/products/currentprotocols/bioinformatics.html • Finding Genes • 4.1 An Overview of Gene Identification: Approaches, Strategies, and Considerations • 4.2 Using MZEF To Find Internal Coding Exons • 4.3 Using GENEID to Identify Genes • 4.4 Using GlimmerM to Find Genes in Eukaryotic Genomes • 4.5 Prokaryotic Gene Prediction Using GeneMark and GeneMark.hmm • 4.6 Eukaryotic Gene Prediction Using GeneMark.hmm • 4.7 Application of FirstEF to Find Promoters and First Exons in the Human Genome • 4.8 Using TWINSCAN to Predict Gene Structures in Genomic DNA Sequences • 4.9 GrailEXP and Genome Analysis Pipeline for Genome Annotation • 4.10 Using RepeatMasker to Identify Repetitive Elements in Genomic Sequences
New Today: Promoter Prediction • A few more words about Gene prediction • Predicting regulatory regions (focus on promoters) • Brief review promoters & enhancers • Predicting in eukaryotes vs prokaryotes • Introduction to RNA • Structure & function
Predicting Promoters • What signals are there? • Algorithms • Promoter prediction software
What signals are there? Simple ones in prokaryotes. Brown Fig 9.17, BIOS Scientific Publishers Ltd, 1999
Prokaryotic promoters • RNA polymerase complex recognizes promoter sequences located very close to & on 5’ side (“upstream”) of initiation site • RNA polymerase complex binds directly to these, with no requirement for “transcription factors” • Prokaryotic promoter sequences are highly conserved • -10 region • -35 region
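Because the -35 and -10 regions are so well conserved, a naive promoter scan is easy to sketch. The example below assumes the textbook E. coli sigma-70 consensus hexamers (TTGACA at -35, TATAAT at -10) separated by a 16-18 bp spacer, and allows a few mismatches per box; the consensus strings, spacer range, mismatch limit, and function names are assumptions for illustration, not taken from the slide. Real tools use weight matrices or HMMs rather than mismatch counts.

```python
MINUS_35 = "TTGACA"   # assumed sigma-70 -35 consensus
MINUS_10 = "TATAAT"   # assumed sigma-70 -10 consensus


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_promoters(dna, max_mismatches=2, spacers=(16, 17, 18)):
    """Return (pos35, pos10, spacer, box35, box10) for candidate promoters.

    max_mismatches applies to each hexamer separately; positions are 0-based.
    """
    dna = dna.upper()
    hits = []
    for i in range(len(dna) - 5):
        box35 = dna[i:i + 6]
        if hamming(box35, MINUS_35) > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box10 = dna[j:j + 6]
            if len(box10) == 6 and hamming(box10, MINUS_10) <= max_mismatches:
                hits.append((i, j, spacer, box35, box10))
    return hits


if __name__ == "__main__":
    toy = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GGGG"
    print(scan_promoters(toy, max_mismatches=0))
```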
What signals are there? Complex ones in eukaryotes! Fig 9.13 Mount 2004
Simpler view of complex promoters in eukaryotes: Fig 5.12 Baxevanis & Ouellette 2005
Eukaryotic genes are transcribed by 3 different RNA polymerases, which recognize different types of promoters & enhancers. Brown Fig 9.18, BIOS Scientific Publishers Ltd, 1999
Eukaryotic promoters & enhancers • Promoters located “relatively” close to initiation site (but can be located within gene, rather than upstream!) • Enhancers also required for regulated transcription (these control expression in specific cell types, developmental stages, in response to environment) • RNA polymerase complexes do not specifically recognize promoter sequences directly • Transcription factors bind first and serve as “landmarks” for recognition by RNA polymerase complexes
Eukaryotic transcription factors • Transcription factors (TFs) are DNA binding proteins that also interact with RNA polymerase complex to activate or repress transcription • TFs contain characteristic “DNA binding motifs” http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=genomes.table.7039 • TFs recognize specific short DNA sequence motifs, “transcription factor binding sites” • Several databases for these, e.g. TRANSFAC http://www.generegulation.com/cgibin/pub/databases/transfac
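Short binding-site motifs like these are commonly searched for with a position weight matrix (PWM). The sketch below builds a log-odds PWM from a handful of aligned example sites and slides it along a sequence; the sites are toy data invented for this example (not taken from TRANSFAC), and the pseudocount, uniform background, threshold, and function names are all assumptions.

```python
import math


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Return a list of {base: log2-odds score} dicts, one per motif column."""
    total = len(sites) + 4 * pseudocount
    pwm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(dna, pwm, threshold):
    """Return (position, window, score) for windows scoring >= threshold."""
    dna = dna.upper()
    width = len(pwm)
    hits = []
    for i in range(len(dna) - width + 1):
        window = dna[i:i + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguity codes
        score = sum(pwm[k][base] for k, base in enumerate(window))
        if score >= threshold:
            hits.append((i, window, round(score, 2)))
    return hits


if __name__ == "__main__":
    toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGAGTCA"]  # hypothetical sites
    pwm = build_pwm(toy_sites)
    print(scan("CCCTGACTCAGGG", pwm, threshold=5.0))
```

Because such motifs are short, a threshold low enough to find real sites also yields many chance matches in a genome, which is part of why promoter prediction in eukaryotes is hard.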
Zinc finger-containing transcription factors • Common in eukaryotic proteins • Estimated 1% of mammalian genes encode zinc-finger proteins • In C. elegans, there are 500! • Can be used as highly specific DNA binding modules • Potentially valuable tools for directed genome modification (esp. in plants) & human gene therapy Brown Fig 9.12 BIOS Scientific Publishers Ltd, 1999
Global alignment of human & mouse obese gene promoters (200 bp upstream from TSS) Fig 5.14 Baxevanis & Ouellette 2005