
Kyle Tretina with a team led by Dr. Pattle P. Pun in collaboration with Mr. Ross Leung of CUHK

Analysis of the Positively Selected and Non-Positively Selected Non-Protein Coding Sequences of Chromosome 16. Kyle Tretina with a team led by Dr. Pattle P. Pun in collaboration with Mr. Ross Leung of CUHK. Introduction: Story of Evolutionary History.


Presentation Transcript


  1. Analysis of the Positively Selected and Non-Positively Selected Non-Protein Coding Sequences of Chromosome 16 Kyle Tretina with a team led by Dr. Pattle P. Pun in collaboration with Mr. Ross Leung of CUHK

  2. Introduction: Story of Evolutionary History • Story: increasing organismal complexity as evolution proceeds: Bacteria < Fish < Primate < Human

  3. WHY? • “But little Mouse, you are not alone, In proving foresight may be in vain: The best laid schemes of mice and men Go often askew, And leave us nothing but grief and pain, For promised joy!” –Robert Burns (1785)

  4. Genetics • Central Dogma: DNA → RNA → Protein • Complexity ~ Number of Genes? • Humans ~30,000 • Flies ~14,000

  5. G-Value Paradox

  6. Complexity (K) ~ Gene Number (N)? • Relationship? • proportional: K ~ N • polynomial: K ~ N^a • exponential: K ~ a^N • factorial: K ~ N! • Jean-Michel Claverie: ON/OFF states • 2^30,000 / 2^14,000 ≈ 3 × 10^4816
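
The ON/OFF figure on this slide can be checked with logarithms, since the numbers themselves are far too large for floating point. The sketch below assumes each gene is either ON or OFF, so an organism with N genes has 2^N possible expression states. The function name on_off_ratio is illustrative and not from the presentation.

```python
import math

def on_off_ratio(genes_a, genes_b):
    """Return (mantissa, exponent) such that
    2**genes_a / 2**genes_b == mantissa * 10**exponent."""
    # log10(2**a / 2**b) = (a - b) * log10(2)
    log10_ratio = (genes_a - genes_b) * math.log10(2)
    exponent = math.floor(log10_ratio)
    mantissa = 10 ** (log10_ratio - exponent)
    return mantissa, exponent

# Gene counts taken from the slides: humans ~30,000, flies ~14,000.
mantissa, exponent = on_off_ratio(30_000, 14_000)
print(f"{mantissa:.1f} x 10^{exponent}")
```

With the gene counts from the earlier slide, this reproduces the ratio of roughly 3 × 10^4816 quoted above.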

  7. Goal • Determine the role of non-coding DNA in gene regulation by looking at the functions of non-coding SNPs that are positively selected or non-positively selected on chromosome 16

  8. Definitions • SNP: single nucleotide polymorphism • Variable between populations • Importance likely due to stability of variation • Selection: the phenomenon in which only the organisms best adapted to their environment tend to survive and produce progeny • Gene-selection algorithm and neutral selection theory (wrench)

  9. Methods Overview • HapMap Database Selection Data → List of Chr16 SNPs • UCSC Genome Database Mirror → SNP flanking sequence • TRANSFAC → related transcription factor data for each SNP flanking sequence • PReMod → confirm results

  10. HapMap Phase I Data • HapMap Project: an international effort to identify and catalog genetic similarities and differences in human beings (Haplotype Maps), also includes: • Selection Data → List of Chr16 SNPs • ~25,000 non-positively selected • ~5,000 positively selected

  11. UCSC Genome Browser • Genome.UCSC.edu: a website containing several reference sequences and tools for visual and computational analysis • Methods: • Enter each rsID (SNP identifier) from the list • Note intersecting sequences • Copy/paste sequences

  12. UCSC Genome Browser Mirror • Efficiency • ~70 seq/hr for 1.5 yrs = ~1/3 of sequences gathered • 2 hrs • Online instructions, but complicated data structure • Henry Ford: 1.1 million lines of source code • Many thanks to Dr. Hayward (Wheaton College CS Faculty)
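
To illustrate what automating the flanking-sequence step might look like once sequence data is available locally, here is a minimal sketch. It is not the team's script: the function name, the 1-based coordinate convention, and the flank width are all assumptions, and the real mirror stores its data in a more complicated structure than a plain string.

```python
def flanking_sequence(chrom_seq, snp_pos, flank=25):
    """Return (left flank, SNP base, right flank) around a SNP.

    chrom_seq: chromosome sequence as a string (assumed already loaded).
    snp_pos:   1-based SNP position (an assumed convention).
    flank:     bases to keep on each side (hypothetical default).
    """
    if not 1 <= snp_pos <= len(chrom_seq):
        raise ValueError("SNP position lies outside the sequence")
    idx = snp_pos - 1                             # convert to 0-based index
    start = max(0, idx - flank)                   # clip at the sequence start
    end = min(len(chrom_seq), idx + flank + 1)    # clip at the sequence end
    return chrom_seq[start:idx], chrom_seq[idx], chrom_seq[idx + 1:end]

# Toy sequence for illustration only, not real chromosome 16 data.
left, base, right = flanking_sequence("ACGTACGTAC", snp_pos=5, flank=2)
print(left, base, right)
```

Looping a function like this over the full SNP list is the kind of step that replaces entering each identifier by hand.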

  13. Sequences Collected • Graph 1. The distributions of the positively selected SNPs used in the study across human chromosome 16 • Graph 2. The distributions of the non-positively selected SNPs used in the study across human chromosome 16

  14. TRANSFAC • TRANSFAC: a relational database, available via the web as six flat files including various data concerning transcription factors, DNA-binding sites, and target genes • Automation at CUHK

  15. PReMod • PReMod: a new database of genome-wide cis-regulatory module (CRM) predictions for both the human and the mouse genomes • Enter ranges for SNP sequences • Look for same pattern as TRANSFAC

  16. Analysis • MySQL Tables • Programmed Scripts: • Word Patterns: e.g. keywords, recurring identifiers • Unique Entries • Progress Statistics • Overlap between non-positively selected and positively selected SNPs
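
The kinds of scripts listed on this slide (unique entries, overlap between the two SNP groups, keyword patterns) can be sketched in a few lines. This is an illustration under assumptions, not the project's code: the analysis itself used MySQL tables, and the rsIDs, annotation strings, and function names below are made up.

```python
from collections import Counter

def summarize_snp_lists(positive, non_positive):
    """Count unique SNP IDs per group and list IDs present in both."""
    pos_ids = set(positive)
    non_ids = set(non_positive)
    return {
        "unique_positive": len(pos_ids),
        "unique_non_positive": len(non_ids),
        "overlap": sorted(pos_ids & non_ids),
    }

def keyword_counts(annotations, keywords):
    """Count how many annotation strings mention each keyword (case-insensitive)."""
    counts = Counter()
    for text in annotations:
        lowered = text.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts

# Hypothetical identifiers and annotations, for illustration only.
positive = ["rs0001", "rs0002", "rs0003", "rs0002"]
non_positive = ["rs0003", "rs0004"]
summary = summarize_snp_lists(positive, non_positive)

annotations = ["Binding site for factor SP1", "sp1 and AP-1 motifs", "no match"]
hits = keyword_counts(annotations, ["SP1", "AP-1"])
print(summary, dict(hits))
```

Sets make duplicate removal and overlap a single operation each, which is the same job a SELECT DISTINCT or a join would do in the MySQL tables.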

  17. Results Table 1. A summary of the manual SNP flanking sequence gathering from the UCSC Genome Browser

  18. Results

  19. Conclusions • Data not all in yet • Possible implications: • Central Dogma Biology: information flow • Quantification of Genetic Natural Selection • Views of Complexity of Humans • Lesson Learned: value of bioinformatics • High-volume data requires computational analysis, not manual

  20. Acknowledgements • Many thanks to Dr. Pun, for letting me get involved in this project, for his vision and mentorship. • Special thanks to Dr. Hayward, for putting in extra hours unpaid so that a student can follow his dreams of graduate school. • Thanks to our collaborators at the Chinese University of Hong Kong, Dr. Tsui and Mr. Leung, for accessing the TRANSFAC database for us, and for being flexible to the demands of our project. • The most thanks to God, for blessing me with the opportunity to work hard and learn. I pray that I might always be able to do these two things earnestly and voraciously.
