Work Presentation Novel RNA genes in A. thaliana

Work Presentation: Novel RNA genes in A. thaliana. Gaurav Moghe, Oct 2008-Nov 2008. Source: Nature (Commentary on ENCODE). Starting databases: Putative Unique Transcripts (PUTs) and Expressed Sequence Tags (ESTs). ESTs vs PUTs.


Presentation Transcript


  1. Work Presentation: Novel RNA genes in A. thaliana. Gaurav Moghe, Oct 2008-Nov 2008

  2. Source: Nature (Commentary on ENCODE)

  3. Starting databases • Putative Unique Transcripts (PUTs) • Expressed Sequence Tags (ESTs)

  4. ESTs vs PUTs • 42% of the total EST sequences in GenBank assembled into PUTs • 82% of the ESTs can be mapped to a unique genomic region vs 72% of the PUTs

  5. Filtering pipeline (flowchart; numbers are the PUT counts shown beside each step)
     • Download PUT sequences: ~324,000
     • Map them to the genome using GMAP: 236,011
     • Map to AT RNA genes, protein-coding regions, other AT features. Yes? 551. No? 3630
     • BLASTn against all known CDS sequences + GeneWise to confirm alignment on translated CDS sequences. No match? 2023
     • BLASTx against all known proteins to verify absence of any protein in the sequences. No match? 1849, 1739
     • BLASTn against Repetitive Sequence Database. No match? 1453
     • Coding Index to double-verify absence of protein-like sequences: 1260

  6. Filtering pipeline flowchart (same content as slide 5)
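The filtering steps in the flowchart can be read as a cascade in which each step keeps only the PUTs that pass and the surviving count is recorded. The following is a minimal sketch of that idea, not the pipeline actually used: the record layout, the flag names (`mapped`, `cds_hit`, and so on) and the helper `make_put` are hypothetical, and the sketch assumes GMAP, BLAST, GeneWise and the coding-index score have already been run and reduced to booleans.

```python
def run_cascade(puts, steps):
    """Apply each (label, keep) step in order.

    Returns the surviving records and a list of (label, count) pairs,
    one per step, so the drop at every filter is visible.
    """
    survivors = list(puts)
    counts = [("input", len(survivors))]
    for label, keep in steps:
        survivors = [p for p in survivors if keep(p)]
        counts.append((label, len(survivors)))
    return survivors, counts


# Step order follows the flowchart; the flag names are illustrative assumptions.
STEPS = [
    ("maps to genome (GMAP)", lambda p: p["mapped"]),
    ("outside annotated features", lambda p: not p["overlaps_annotation"]),
    ("no CDS hit (BLASTn + GeneWise)", lambda p: not p["cds_hit"]),
    ("no protein hit (BLASTx)", lambda p: not p["protein_hit"]),
    ("no repeat hit (BLASTn)", lambda p: not p["repeat_hit"]),
    ("not coding-like (coding index)", lambda p: not p["coding_like"]),
]


def make_put(name, **flags):
    """Build a toy PUT record that passes every filter unless a flag says otherwise."""
    record = {
        "id": name,
        "mapped": True,
        "overlaps_annotation": False,
        "cds_hit": False,
        "protein_hit": False,
        "repeat_hit": False,
        "coding_like": False,
    }
    record.update(flags)
    return record


# Toy input: made-up records, not data from the slides.
toy = [
    make_put("put1"),
    make_put("put2", mapped=False),
    make_put("put3", protein_hit=True),
    make_put("put4", repeat_hit=True),
]

novel, counts = run_cascade(toy, STEPS)
for label, n in counts:
    print(f"{n:>4}  {label}")
```

Keeping the per-step counts alongside the survivors makes it easy to compare a rerun (for example against a different BLAST database) with the numbers on the slide.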

  7. Issues
     • PUT sequences are not of very good quality. Use the sequence of the genomic region where these PUTs map. Use EST sequences?
     • BLAST against a database does not give all hits. BLAST against a different database, of a different size.
     • PUTs extremely close to genes may be part of extended UTR regions. Remove the ones that lie very close to a gene. Check the directions of the other PUTs.
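The last issue, dropping PUTs that sit too close to an annotated gene, can be sketched as an interval-distance filter. This is only an illustration under stated assumptions: the slides give no cutoff, so `MIN_GAP_BP` is a hypothetical value, and the dictionary layout for PUTs and genes is invented for the example.

```python
MIN_GAP_BP = 500  # hypothetical cutoff; the slides do not state one


def gap(a_start, a_end, b_start, b_end):
    """Distance in bp between the nearest ends of two intervals (0 if they overlap)."""
    return max(0, b_start - a_end, a_start - b_end)


def nearest_gene_gap(put, genes):
    """Smallest gap between a PUT and any gene on the same chromosome, or None."""
    gaps = [
        gap(put["start"], put["end"], g["start"], g["end"])
        for g in genes
        if g["chrom"] == put["chrom"]
    ]
    return min(gaps) if gaps else None


def far_from_genes(puts, genes, min_gap=MIN_GAP_BP):
    """Keep PUTs whose nearest gene is at least min_gap bp away (or absent)."""
    kept = []
    for p in puts:
        d = nearest_gene_gap(p, genes)
        if d is None or d >= min_gap:
            kept.append(p)
    return kept


# Toy coordinates, not real annotation.
genes = [{"chrom": "Chr1", "start": 1000, "end": 2000}]
puts = [
    {"id": "a", "chrom": "Chr1", "start": 2100, "end": 2300},  # 100 bp from the gene
    {"id": "b", "chrom": "Chr1", "start": 3000, "end": 3200},  # 1000 bp from the gene
    {"id": "c", "chrom": "Chr2", "start": 50, "end": 400},     # no gene on Chr2
]

kept = far_from_genes(puts, genes)
print([p["id"] for p in kept])
```

A PUT removed here is not necessarily part of a UTR; the distance filter is a conservative first pass, and strand information for the remaining PUTs would still need to be checked separately.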

  8. What if… • A sequence passes through all filters… but still encodes a protein?

  9. Issues
     • Most of these PUTs do not show conservation. Does that mean they are non-functional?
     • Most of these PUTs do not seem to have an RNA-like secondary structure. Does that mean they are not RNA genes?

  10. Plans for the next month • Get the final list of novel PUTs • Assign them directionality and estimate assembly error rates using EST mapping • Conservation • Secondary structure
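One way the directionality step might work is a majority vote over the strands of the ESTs that map to each PUT, with the fraction of dissenting ESTs as a rough disagreement measure. The slides do not say how directionality or assembly error would be computed, so the function `assign_strand`, its tie rule and the use of disagreement as an error proxy are all assumptions made for this sketch.

```python
from collections import Counter


def assign_strand(est_strands):
    """Majority-vote strand for one PUT from the strands of its mapped ESTs.

    Returns (strand, disagreement), where disagreement is the fraction of
    ESTs with a known strand that contradict the majority. Returns
    (None, None) when no EST has a known strand and (None, 0.5) on a tie.
    """
    votes = Counter(s for s in est_strands if s in ("+", "-"))
    if not votes:
        return None, None
    total = sum(votes.values())
    top, n_top = votes.most_common(1)[0]
    if n_top * 2 == total:
        # Equal support for both strands: leave the PUT unassigned.
        return None, 0.5
    return top, (total - n_top) / total


# Toy strand calls, not real mapping results; "?" marks an unknown strand.
examples = {
    "put1": ["+", "+", "+", "-"],
    "put2": ["+", "-"],
    "put3": [],
    "put4": ["-", "?", "-"],
}
for name, strands in examples.items():
    print(name, assign_strand(strands))
```

PUTs with high disagreement would be candidates for misassembly (for instance, ESTs from both strands merged into one PUT), which is why the fraction is returned rather than only the winning strand.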
