
Sequence Analysis


blythe

Presentation Transcript


  1. Sequence Analysis

  2. Today • How to retrieve a DNA sequence? • How to search for other related DNA sequences? • How to search for its protein sequence? • How to determine what is known about this sequence biologically?

  3. Gene structure • Genes contain introns and exons. • Introns are transcribed into RNA but are then removed by splicing, i.e., they are non-coding regions. • Exons are the coding regions and are present in mRNA. [Diagram: a gene with Exon1, E2, E3 and Intron1, I2, and the resulting mRNA]
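The intron/exon relationship on this slide can be sketched in a few lines of Python. This is only an illustration: the toy sequence, the exon coordinates, and the function name `splice` are made up for the example, and the result is kept in the DNA alphabet (as a cDNA would be) rather than converted to RNA.

```python
from typing import List, Tuple


def splice(gene: str, exons: List[Tuple[int, int]]) -> str:
    """Join exon intervals (0-based, end-exclusive) in order, dropping introns."""
    return "".join(gene[start:end] for start, end in sorted(exons))


# Toy gene: three exons (upper case) separated by two introns (lower case).
gene = "ATGGCC" + "gtaagt" + "TTTGAC" + "gtcag" + "TAA"
exons = [(0, 6), (12, 18), (23, 26)]  # hypothetical exon coordinates

mature = splice(gene, exons)
print(mature)  # ATGGCCTTTGACTAA
```

Everything in lower case (the introns) is absent from the spliced product, which is why a cDNA sequence is shorter than the genomic sequence of the same gene.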

  4. Types of DNA sequence • Genomic • Contains both genes and non-genic regions • Genes have both introns and exons • cDNA (complementary DNA) • Sequence corresponds to genes that are expressed. • Sequence contains only the

  5. What could you do with genomic sequence? • What about with cDNA sequence?

  6. What is an EST? • Expressed sequence tag. • Part or all of a cDNA that has been sequenced.

  7. What is NCBI? • National Center for Biotechnology Information • Part of the National Library of Medicine, NIH • Created in 1988 to develop information systems for molecular biology. • Provides data retrieval systems and computational resources.

  8. Database Resources • Database retrieval tools • BLAST family of sequence-similarity search programs. • Resources for gene-level sequences • Resources for genome-scale analysis

  9. Database Retrieval Tools • Entrez – for DNA and protein sequences • PubMed Central – for literature • Taxonomy – organisms and associated sequences • LocusLink – provides links from sequence information to map and other information.
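Entrez can also be queried from a script. The sketch below assumes NCBI's E-utilities `efetch` endpoint (not mentioned on the slides) and only builds the request URL and parses FASTA text; it does not contact the network. The accession `ACCESSION_PLACEHOLDER` and the helper names `efetch_fasta_url` and `parse_fasta` are hypothetical.

```python
from typing import Dict
from urllib.parse import urlencode

# Assumption: the E-utilities efetch endpoint is used as the programmatic
# route into Entrez; the slides only describe the web interface.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build a URL that asks efetch for one record in FASTA format."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, joining wrapped lines."""
    records: Dict[str, list] = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


# Placeholder accession: substitute a real one found through an Entrez search.
url = efetch_fasta_url("ACCESSION_PLACEHOLDER")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the decoded response to `parse_fasta` would give the sequence as a plain string.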

  10. BLAST family • Basic local alignment search tool • Sequence similarity search against various databases in GenBank

  11. BLAST • Pairwise alignment. • Each alignment has a statistical significance (e-value). • Accounts for amino acid sequence • Outputs a list of matches including start, stop, score, and e-value.
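To make "local pairwise alignment" concrete, here is a minimal Smith-Waterman scoring sketch. Note that this is not how BLAST itself works: BLAST uses a faster seed-and-extend heuristic and then attaches an E-value to each hit. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) are arbitrary assumptions chosen for the example.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(local_alignment_score("ACGT", "ACGT"))         # 8
print(local_alignment_score("TTACGTTT", "GGACGGG"))  # 6 (shared "ACG")
```

A higher score means a better local match; BLAST converts such scores into E-values so that hits against databases of different sizes can be compared.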

  12. 5 BLAST Programs (query vs. database) • BLASTN – Nucleotide vs. nucleotide • BLASTP – Protein vs. protein • BLASTX – Translated nucleotide vs. protein • TBLASTN – Protein vs. translated nucleotide • TBLASTX – Translated nucleotide vs. translated nucleotide.
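The "translated" programs compare protein sequences derived from all six reading frames of a nucleotide sequence. The sketch below shows what a six-frame translation looks like, assuming the standard genetic code; the function names are hypothetical and this is not BLAST's own implementation.

```python
from itertools import product
from typing import Dict

# Standard genetic code, codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops are '*'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> Dict[str, str]:
    """Translate the three forward and three reverse-strand reading frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames


print(six_frames("ATGGCCTTTGACTAA")["+1"])  # MAFD*
```

BLASTX applies this kind of translation to the query, TBLASTN applies it to the database, and TBLASTX applies it to both.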

  13. Genome-Scale Analysis • Entrez Genomes – taxonomic, genome or chromosome view of the current sequence data for an organism. • COGs – List of orthologous protein groups from completely sequenced organisms. • Retroviral genotyping tools – Important in viral genetic diversity, tracking outbreaks, and vaccine development.

  14. Genome-Scale Analysis • Eukaryotic Genomic Resources – location of Plant Genomes Central with information from various plant genome projects. • Map Viewer – Displays genome assemblies using chromosome map views.

  15. Genome-Scale Analysis • Human-Mouse Homology Maps – List of genes in homologous segments. • Cancer Chromosome Aberration Project – List of recurrent chromosome aberrations associated with cancer.

  16. Gene Expression/Phenotype • OMIM – Catalog of human genes and genetic disorders including phenotypes and polymorphism information. • Gene Expression Omnibus (GEO) – Data repository and retrieval system for expression data from all sources.

  17. MMDB, CDDB, CDART • Molecular Modeling Database • Conserved Domain Database • Conserved Domain Architecture Retrieval Tool – Identifies conserved domains and displays their structure.

  18. Sequence Analysis References • Korf, Yandell, and Bedell. 2003. An Essential Guide to the Basic Local Alignment Search Tool: BLAST. O’Reilly & Associates, Sebastopol, CA. • Markel and Leon. 2003. Sequence Analysis in a Nutshell: A Guide to Common Tools and Databases. O’Reilly & Associates, Sebastopol, CA.

  19. Sequence Analysis References • Baxevanis and Ouellette. 2001. Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley Interscience, New York. • Mount. 2000. Bioinformatics: Sequence and Genome Analysis. Cold Spring Harbor Laboratory, New York.

  20. What can you do with the sequence? • Gene prediction • Motif identification • Promoter identification • Survey gene expression across tissues • Full length gene isolation • Identify mutations (SNP, InDel)

  21. InDel • Insertions/Deletions • Usually small • Can use the same protocols and equipment as for SSR analysis, or can be separated on a capillary system using fluorescently labeled material.
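In sequence data, InDels show up as gaps in an alignment. A minimal sketch, assuming two sequences that have already been aligned with `-` as the gap character; the function name `find_indels` and the example alignment are made up for illustration.

```python
from typing import List, Tuple


def find_indels(ref_aln: str, alt_aln: str) -> List[Tuple[int, str, int]]:
    """Report (alignment column, kind, length) for each gap run.

    A gap in the reference row is an insertion in the other sequence;
    a gap in the other row is a deletion relative to the reference.
    """
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned sequences must have equal length")
    events = []
    i = 0
    while i < len(ref_aln):
        if ref_aln[i] == "-" or alt_aln[i] == "-":
            kind = "insertion" if ref_aln[i] == "-" else "deletion"
            gap_row = ref_aln if kind == "insertion" else alt_aln
            start = i
            while i < len(gap_row) and gap_row[i] == "-":
                i += 1
            events.append((start, kind, i - start))
        else:
            i += 1
    return events


print(find_indels("ACGT--ACGTAC", "ACGTTTAC--AC"))
# [(4, 'insertion', 2), (8, 'deletion', 2)]
```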

  22. Single Nucleotide Polymorphism • SNP • Single base-pair change in the DNA sequence of two alleles. • Best done with high quality sequence and confirmed in multiple lines or multiple experiments.
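As a rough illustration of the definition, candidate SNPs between two alleles can be listed by comparing equal-length, already aligned sequences position by position. The function name `find_snps` and the example alleles are hypothetical; as the slide notes, real candidates should come from high-quality sequence and be confirmed in multiple lines or experiments.

```python
from typing import List, Tuple


def find_snps(allele_a: str, allele_b: str) -> List[Tuple[int, str, str]]:
    """Return (0-based position, base in A, base in B) for each single-base difference."""
    if len(allele_a) != len(allele_b):
        raise ValueError("alleles must be aligned to the same length")
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(allele_a, allele_b))
        # Only count differences between unambiguous bases (skip gaps and N).
        if x != y and x in "ACGT" and y in "ACGT"
    ]


print(find_snps("ACGTACGT", "ACGAACGT"))  # [(3, 'T', 'A')]
```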

  23. SNP popularity • Difficult to identify human disease loci by other methods. • Most abundant class of polymorphisms in many species. • Ease of use for genotyping, i.e., they can be automated easily.

  24. What can you do with ESTs? • Gene expression analysis • Colinearity studies • Protein prediction • SNP identification • Genetic mapping

  25. Today • How to retrieve a DNA sequence? • How to search for other related DNA sequences? • How to search for its protein sequence? • How to determine what is known about this sequence biologically?

  26. Using adh as an example • Find adh1 sequence in corn. • Find related sequences. • Determine its function in corn. • Find adh in human. • Find related sequences. • Determine its function in human.
