PROMoter SCanning/ANalysis tool

PROMoter SCanning/ANalysis tool. Goal. Creating a tool to analyse a set of putative promoter sequences and recognize known and unknown promoters, with built-in scoring system. Sequences to be PromScAnned. Sequences from Sergei Denissov, Molecular Biology (NCMLS)

Presentation Transcript


  1. PROMoter SCanning/ANalysis tool

  2. Goal • Create a tool that analyses a set of putative promoter sequences and recognizes known and unknown promoters, with a built-in scoring system

  3. Sequences to be PromScAnned • Sequences from Sergei Denissov, Molecular Biology (NCMLS) • Obtained by cloning chromatin from human U2-OS cells that was highly enriched by double immunoprecipitation with anti-TBP antibodies

  4. Main database: BLAT • BLAT: BLAST-Like Alignment Tool • Aligns the input sequence to the human genome • Connected to several databases, such as: • mRNAs • ESTs • RepeatMasker • RefSeq • GenScan • TwinScan • UniGene • CpG Islands

  5. BLAT Human Genome Browser

  6. BLAT method (1) • Align the sequence with BLAT and collect the alignment info • For each BLAT hit, collect additional info from the connected databases: • mRNAs • ESTs • RepeatMasker • CpG Islands • RefSeq Genes
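As a rough illustration of the "get alignment info" step, the sketch below reads one line of BLAT's PSL output (the standard 21-column, tab-separated format) into a small record holding the fields later steps need. The `BlatHit` class and `parse_psl_line` function are hypothetical names; the presentation does not say how PromScan stores its hits.

```python
from dataclasses import dataclass


@dataclass
class BlatHit:
    query: str
    chrom: str
    strand: str
    t_start: int            # 0-based start on the genome
    t_end: int              # end on the genome (half-open)
    matches: int
    mismatches: int
    block_sizes: list[int]  # length of each aligned block
    t_starts: list[int]     # genomic start of each aligned block


def _int_list(field: str) -> list[int]:
    # PSL list columns are comma-separated with a trailing comma, e.g. "200,282,"
    return [int(x) for x in field.split(",") if x]


def parse_psl_line(line: str) -> BlatHit:
    """Parse one data line of BLAT PSL output into a BlatHit."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    return BlatHit(
        query=f[9],            # qName
        chrom=f[13],           # tName
        strand=f[8],
        t_start=int(f[15]),
        t_end=int(f[16]),
        matches=int(f[0]),
        mismatches=int(f[1]),
        block_sizes=_int_list(f[18]),
        t_starts=_int_list(f[20]),
    )
```

Header lines that some BLAT output contains would have to be skipped before calling the parser.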

  7. BLAT method (2) • Additional info is gathered for four different windows: • 1 kb to the left + query itself (close promoters) • 1 kb to the right + query itself (close promoters) • 20 kb to the left + query itself (distant promoters) • 20 kb to the right + query itself (distant promoters) • The 1 kb and 20 kb distances can be adjusted through the interface
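The four windows can be sketched as simple coordinate arithmetic around a hit. This assumes "left" and "right" mean lower and higher genomic (plus-strand) coordinates, which the slide does not state. The function and key names are hypothetical; the 1 kb and 20 kb defaults come from the slide.

```python
def flank_windows(t_start: int, t_end: int,
                  near: int = 1_000, far: int = 20_000) -> dict:
    """Return the four (start, end) windows, 0-based and half-open,
    in which extra annotation is collected for one BLAT hit."""
    # Every window covers a flank plus the query itself.
    def left(dist: int) -> tuple:
        return (max(0, t_start - dist), t_end)  # clipped at the chromosome start

    def right(dist: int) -> tuple:
        # Not clipped at the chromosome end, because the chromosome size is not passed in.
        return (t_start, t_end + dist)

    return {
        "near_left": left(near),    # close promoters
        "near_right": right(near),
        "far_left": left(far),      # distant promoters
        "far_right": right(far),
    }
```

Passing other values for `near` and `far` corresponds to changing the distances through the interface.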

  8. mRNAs • GenBank human mRNAs are aligned against the genome using the BLAT program. When a single mRNA aligns in multiple places, the alignment with the highest base identity is found, and only alignments whose base identity is within 1% of the best are kept. Alignments must also have at least 95% base identity to be kept.
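The two filtering rules can be sketched as follows. The sketch assumes each alignment arrives as a hypothetical `(mrna_id, location, identity_percent)` tuple with identity already computed, and reads "within 1% of the best" as "within one percentage point", which is an interpretation rather than something the slide specifies.

```python
from collections import defaultdict


def filter_mrna_alignments(alignments, min_identity=95.0, near_best=1.0):
    """Keep alignments close to each mRNA's best hit and above an absolute floor.

    alignments: iterable of (mrna_id, location, identity_percent) tuples.
    """
    by_mrna = defaultdict(list)
    for mrna_id, location, identity in alignments:
        by_mrna[mrna_id].append((location, identity))

    kept = []
    for mrna_id, hits in by_mrna.items():
        best = max(identity for _, identity in hits)
        for location, identity in hits:
            # Rule 1: near the best alignment of this mRNA.
            # Rule 2: at least the minimum base identity.
            if identity >= best - near_best and identity >= min_identity:
                kept.append((mrna_id, location, identity))
    return kept
```

A similar filter with `min_identity=98.0` would match the RefSeq threshold on slide 12, although the slides do not say whether RefSeq uses the same near-best rule.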

  9. ESTs • This track shows alignments between human Expressed Sequence Tags (ESTs) in GenBank and the genome. • Expressed sequence tags are single-read sequences (typically about 500 bases) that usually represent fragments of transcribed genes. Aligning regions (usually exons) are shown as black boxes, connected by lines for the gaps between them (usually spliced-out introns).
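The boxes-and-lines picture corresponds to a list of aligned blocks. Given PSL-style block sizes and genomic block starts, the gaps between consecutive blocks (the likely introns of a spliced EST) are easy to compute. This is a hypothetical helper for illustration, not part of the described tool.

```python
def block_gaps(block_sizes: list[int], t_starts: list[int]) -> list[tuple[int, int]]:
    """Return (start, end) genomic gaps between consecutive aligned blocks.

    For a spliced EST these gaps usually correspond to introns.
    """
    if len(block_sizes) != len(t_starts):
        raise ValueError("block_sizes and t_starts must have the same length")
    gaps = []
    for i in range(len(t_starts) - 1):
        gap_start = t_starts[i] + block_sizes[i]  # end of block i
        gap_end = t_starts[i + 1]                 # start of block i + 1
        if gap_end > gap_start:                   # ignore blocks that touch
            gaps.append((gap_start, gap_end))
    return gaps
```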

  10. RepeatMasker • Created by Arian Smit's RepeatMasker program, which uses the RepBase library of repeats from the Genetic Information Research Institute • RepBase is a database of repetitive DNA sequence elements found in a variety of eukaryotic organisms, including mammals, fish, insects, nematodes, and plants. • Repeat classes: SINE, LINE, LTR, DNA, Simple, Low Complexity, Satellite, tRNA, other
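The scoring method later in the presentation does not treat all repeat classes alike. It splits RepeatMasker hits into three groups: tRNA, LTR and "rest". A minimal sketch of that grouping, assuming the class names are the plain labels listed on this slide (the function names and bucket labels are hypothetical):

```python
from collections import Counter

# Repeat classes that get their own scoring weight; everything else is "rest".
SCORED_CLASSES = {"tRNA": "RMSK tRNA", "LTR": "RMSK LTR"}


def rmsk_category(rep_class: str) -> str:
    """Map a RepeatMasker repeat class to a scoring bucket."""
    return SCORED_CLASSES.get(rep_class, "RMSK rest")


def bucket_counts(rep_classes) -> Counter:
    """Count how many repeats fall into each scoring bucket."""
    return Counter(rmsk_category(c) for c in rep_classes)
```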

  11. CpG Islands • CpG: a C immediately followed by a G (the "p" stands for the phosphate linking them) • Particularly common near transcription start sites, and may be associated with promoter regions • Normally, in vertebrates: CG -> C is methylated -> methylated C is deaminated -> TG • CpGs are therefore relatively rare, unless there is a selective pressure to keep them, or: • a region is not methylated for some reason, perhaps having to do with the regulation of gene expression. • CpG islands are regions where CpGs are present at significantly higher levels than is typical for the genome as a whole.
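"Significantly higher levels" is usually measured with the observed/expected CpG ratio, (number of CpG × length) / (number of C × number of G), together with the GC content. The thresholds below (length ≥ 200 bp, GC ≥ 50%, observed/expected ≥ 0.6) are the commonly cited Gardiner-Garden and Frommer criteria. They are an assumption here and are not necessarily the criteria used by the CpG Islands track that PromScan queries.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    s = seq.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    c = s.count("C")
    g = s.count("G")
    cg = s.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    gc_fraction = (c + g) / n
    obs_exp = cg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """Apply assumed Gardiner-Garden/Frommer-style thresholds to one sequence."""
    gc, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc >= min_gc and obs_exp >= min_obs_exp
```

A real island finder would slide a window along the genome. This sketch only scores one sequence as a whole.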

  12. RefSeq Genes • The RefSeq Genes track shows known protein-coding genes taken from mRNA reference sequences compiled at LocusLink. • RefSeq mRNAs are aligned against the genome using the BLAT program. When a single mRNA aligns in multiple places, only the best alignments are kept. The alignments must also have at least 98% sequence identity to be kept.

  13. Scoring Method (1) For each BLAT hit the Score is: Σ (length(mRNA)/distance(mRNA))*sw(mRNA) + Σ (length(EST)/distance(EST))*sw(EST) + Σ (length(RMSK tRNA)/distance(RMSK tRNA))*sw(RMSK tRNA) + Σ (length(RMSK LTR)/distance(RMSK LTR))*sw(RMSK LTR) + Σ (length(RMSK rest)/distance(RMSK rest))*sw(RMSK rest) + Σ (length(CpG)/distance(CpG))*sw(CpG) + Σ (length(RefSeq Genes)/distance(RefSeq Genes))*sw(RefSeq Genes) (sw(X) = scoring weight of feature type X; each Σ runs over all features of that type found for the hit)

  14. Scoring Method (2) • Scoring weight: reflects the reliability of the analyzed data, i.e. how much evidence a feature type gives that the sequence is a promoter • Adjustable through the interface; defaults: • mRNAs: 4 • ESTs: 3 • RepeatMasker tRNA: 3 • RepeatMasker LTR: 2 • RepeatMasker rest: 1 • CpG Islands: 2 • RefSeq Genes: 0
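Putting the formula from slide 13 together with the default weights gives the sketch below. It assumes each feature found for a hit has already been reduced to a hypothetical `(category, length_bp, distance_bp)` tuple. The slides do not define how distance is measured or what happens when a feature overlaps the query (distance 0). Clamping the distance to at least 1 bp is an assumption made only to avoid division by zero.

```python
from typing import Iterable, Mapping

# Default scoring weights from the slide; adjustable in the interface.
DEFAULT_WEIGHTS = {
    "mRNA": 4, "EST": 3, "RMSK tRNA": 3, "RMSK LTR": 2,
    "RMSK rest": 1, "CpG": 2, "RefSeq": 0,
}


def promscan_score(features: Iterable[tuple[str, float, float]],
                   weights: Mapping[str, float] = DEFAULT_WEIGHTS,
                   min_distance: float = 1.0) -> float:
    """Sum (length / distance) * weight over all features found for one hit."""
    score = 0.0
    for category, length, distance in features:
        d = max(distance, min_distance)  # assumption: overlapping features count as 1 bp away
        score += (length / d) * weights[category]
    return score
```

For example, a 2,000 bp mRNA 500 bp away contributes (2000 / 500) × 4 = 16. Because of the division, long features close to the query dominate the score.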

  15. DBTSS (1) Additional info from DBTSS: DataBase of Transcriptional Start Sites • Most cDNAs lack precise information about their 5’ termini. • Oligo-capping method -> full-length cDNAs. • Of about 284,687 5' end sequences obtained, 155,304 have been matched to cDNA sequences of known genes (8,996 genes) and are presented in DBTSS

  16. DBTSS (2) • Each sequence was mapped onto the human draft genome sequence to identify its transcriptional start site • Overall Score: BLAT Score * DBTSS Score

  17. PromScan Query Interface http://www.cmbi.kun.nl/~timhulse/promscan

  18. Output (1): Header • Excel format; plain-text (tab-separated) output is also possible
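The plain-text alternative to Excel can be produced with Python's standard `csv` module and a tab delimiter. The column names in the example are hypothetical, since the presentation does not list the actual report columns.

```python
import csv
import io


def write_tsv(rows, columns, out) -> None:
    """Write a list of dicts as tab-separated text with a header row."""
    writer = csv.DictWriter(out, fieldnames=columns,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Usage with hypothetical columns:
buf = io.StringIO()
write_tsv([{"query": "seq1", "chrom": "chr1", "score": 20.0}],
          ["query", "chrom", "score"], buf)
```

Tab-separated text opens directly in Excel, so a single writer can serve both output formats.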

  19. Output (2): Sequence Report

  20. Output (3): Overall Report • Multiple hits are sorted from high to low score; the higher the score, the more likely it is that the input sequence is a promoter.
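Combining the overall score from slide 16 (BLAT score × DBTSS score) with the high-to-low sorting of the overall report gives a short ranking step. Hits are assumed to be dicts with hypothetical `blat_score` and `dbtss_score` keys. The slides do not describe the scale of the DBTSS score.

```python
def rank_hits(hits: list[dict]) -> list[dict]:
    """Attach an overall score to each hit and sort from highest to lowest."""
    ranked = [dict(h, overall=h["blat_score"] * h["dbtss_score"]) for h in hits]
    ranked.sort(key=lambda h: h["overall"], reverse=True)
    return ranked
```

The input dicts are copied rather than modified, so the original hit list stays unchanged.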

  21. Suggestions please!