Bioinformatics

Bioinformatics. Outline of Presentation Overview of Bioinformatics Introduce methods to access HGP. Goals of HGP. Sequence Human Genome Sequence Genomes from other Species Prokaryotes: Eubacteria & Archaebacteria Eukaryotes: Yeast-Nematode-Fruit Fly-Plant-Humans.

nalani

Presentation Transcript


  1. Bioinformatics • Outline of Presentation • Overview of Bioinformatics • Introduce methods to access the Human Genome Project (HGP)

  2. Goals of HGP • Sequence Human Genome • Sequence Genomes from other Species • Prokaryotes: Eubacteria & Archaebacteria • Eukaryotes: Yeast-Nematode-Fruit Fly-Plant-Humans

  3. Bioinformatics = Computational Genomics • New discipline to utilize information from HGP • Name is not determined • Represents the third major shift in Biology • 1st Systematics; indexing and naming species • 2nd Molecular Biology (1953); understanding DNA/RNA/Protein

  4. -omics is the “new suffix” • Genomics • Proteomics • Pharmacogenomics • Subpopulations that benefit from a drug • Toxicogenomics • Subpopulations that have adverse reactions to a drug • Vaccinomics • Derive epitopes of virulent antigens from MHC Ag processing

  5. & the Latest • A new speciality: ecogenomics • “Horizontal gene transfer is not a fundamental force in microbial evolution, but it is a fundamental force in the evolution of particular loci” • John Paul, marine ecologist • University of South Florida.

  6. Goal of Bioinformatics • Interpret the language (grammar) of DNA • DNA to RNA to Protein • Similarity, Alignment, & Homology • Predict Protein Structure/Function from DNA • Homology Structures of Proteins (tree of life-evolution)

  7. Strategies • Search for DNA matches • 4 bases (ATCG); the code is degenerate • multiple codons per amino acid • Search for Protein (amino acids) • 20 amino acids; less flexibility • Compare Proteins or DNA to each other • Start with DNA or Proteins, then convert
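
The degeneracy mentioned in slide 7 can be made concrete by counting how many codons encode each amino acid. The sketch below is an illustration only: it assumes the standard genetic code (NCBI translation table 1), and the names `CODON_TABLE`, `codons_per_amino_acid` and `synonymous_codons` are made up for this example.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated
# in the order TTT, TTC, TTA, TTG, TCT, ... (third base changes fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_per_amino_acid():
    """Count how many of the 64 codons map to each amino acid ('*' = stop)."""
    return Counter(CODON_TABLE.values())


def synonymous_codons(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


if __name__ == "__main__":
    for aa, n in sorted(codons_per_amino_acid().items()):
        print(aa, n, " ".join(synonymous_codons(aa)))
```

Because several codons collapse onto one amino acid, a protein-level comparison can still match two DNA sequences that differ at synonymous positions, which is one reason to convert DNA to protein before searching.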

  8. Homology Studies • Proteins that share: • significant sequences • functional groups (families-Prosite database) • Have Common Ancestor • Only part of protein important • Signature, Families, Motifs, Conserved Sequence

  9. Pearson, WR., Protein sequence comparison and Protein evolution. 11/99

  10. Major Databases • NCBI: National Center for Biotechnology Information: • BLAST, OMIM, PUBMED, TAXONOMY, STRUCTURES • PDB: Protein Data Bank at Brookhaven National Laboratory • 3-d visualization of protein structures • ExPASy: Swiss Institute of Bioinformatics • Protein and Enzyme function and homology • KEGG: Kyoto Encyclopedia of Genes and Genomes • Metabolic maps and functions of enzymes • TIGR: The Institute for Genomic Research Microbial Database • Microbial gene maps

  11. National Center for Biotechnology Information (NCBI) • www.ncbi.nlm.nih.gov • BLAST (Basic Local Alignment Search Tool); sequence similarities • Entrez; Nucleotide or protein retrieval • OMIM (Online Mendelian Inheritance in Man); information on genetic disorders • Mutations, DNA, Protein, & other links • PubMed; 9 million citations in MEDLINE • Taxonomy; names of all organisms that have at least one reported nucleotide or protein sequence • Structures; 3-d structures

  12. NCBI Menu

  13. BLAST Menu • 5 types of programs; 3 DNA/RNA queries, 2 amino acid queries • Databases-20 (general or specialized) • Filtered; leave checked (removes Alu sequences or highly repeated DNA) • FASTA format; 1st line starts with > followed by descriptive text, the following lines are bases (or 1-letter amino acid codes), no commas or periods, <80 characters per line. • Web or e-mail queries • Cruncher and Muncher: names of the computers
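
The FASTA rules in slide 13 (a `>` header line, then sequence lines shorter than 80 characters) can be sketched as a small writer and reader. This is a minimal illustration, not the parser BLAST itself uses; `format_fasta`, `parse_fasta` and the default width of 70 are choices made for this example.

```python
def format_fasta(header, sequence, width=70):
    """Return one FASTA record: '>header' followed by wrapped sequence lines."""
    if not 0 < width < 80:
        raise ValueError("width must be between 1 and 79 characters")
    # Drop any whitespace inside the sequence before wrapping it.
    seq = "".join(sequence.split())
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + header] + lines) + "\n"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first '>' line")
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    record = format_fasta("demo_sequence hypothetical example", "ACGT" * 50)
    print(record)
    print(parse_fasta(record))
```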

  14. BLAST Input Menu

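  15. Nucleotide Sequence Queries • BLASTN: nucleotide query against a nucleotide database • BLASTX: nucleotide query, translated in all 6 frames, against a protein database • TBLASTX: 6-frame translation of the query against a 6-frame translation of a nucleotide database • uses a lot of CPU time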
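
The 6-frame translation behind BLASTX and TBLASTX reads the query in three offsets on each strand. The sketch below shows the idea under stated assumptions: it uses the standard genetic code, marks incomplete or ambiguous codons as `X`, and the frame labels `+1` to `-3` and all function names are conventions chosen for this example rather than BLAST internals.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper().replace("U", "T")
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translation("ATGGCCTAA").items():
        print(frame, protein)
```

Each nucleotide query therefore produces six protein sequences to compare, which is why the translated searches cost more CPU time than BLASTN.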

  16. FASTA Format for Nucleic Acids • A --> adenosine • C --> cytidine • G --> guanine • T --> thymidine • U --> uridine • R --> G A (purine) • Y --> T C (pyrimidine) • K --> G T (keto) • M --> A C (amino) • S --> G C (strong) • W --> A T (weak) • B --> G T C • D --> G A T • H --> A C T • V --> G C A • N --> A G C T • - gap of indeterminate length
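
The ambiguity codes in slide 16 each stand for a set of bases, so a sequence containing them describes several concrete sequences at once. The following sketch expands such a sequence and tests a concrete sequence against an ambiguous pattern; the `IUPAC` dictionary mirrors the slide, while `expand_ambiguous` and `matches` are hypothetical helper names, and the gap character `-` is deliberately not handled.

```python
from itertools import product

# Nucleotide codes from the slide; U is treated as T for comparison purposes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_ambiguous(seq):
    """List every unambiguous sequence an IUPAC-coded sequence can represent.

    The list grows exponentially with the number of ambiguous positions,
    so this is only practical for short sequences such as primers.
    """
    choices = [IUPAC[ch] for ch in seq.upper()]
    return ["".join(bases) for bases in product(*choices)]


def matches(pattern, seq):
    """True if a concrete sequence is compatible with an ambiguous pattern."""
    if len(pattern) != len(seq):
        return False
    seq = seq.upper().replace("U", "T")
    return all(base in IUPAC[code] for code, base in zip(pattern.upper(), seq))


if __name__ == "__main__":
    print(expand_ambiguous("GAY"))
    print(matches("GGNCC", "GGTCC"))
```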

  17. Protein Sequence Queries • BLASTP: protein to protein • TBLASTN: protein to 6 frame translation • requires extensive CPU time • Goal: similarity, alignment, homology

  18. FASTA Format for Proteins • A alanine • B aspartate or asparagine • C cysteine • D aspartate • E glutamate • F phenylalanine • G glycine • H histidine • I isoleucine • K lysine • L leucine • M methionine • N asparagine • P proline • Q glutamine • R arginine • S serine • T threonine • U selenocysteine • V valine • W tryptophan • Y tyrosine • Z glutamate or glutamine • X any • * translation stop • - gap of indeterminate length

  19. BLAST Databases • 20 different databases to search • NR (non-redundant), MITO (mitochondria), MONTH (new records), YEAST, E. coli, ALU, VECTOR, etc. • NR: Nucleotide or Protein -non-redundant from all known data sources • default on BLAST

  20. BLAST Output • Graphical overlay of matched sequences • Sequence database link (gb = GenBank, emb = EMBL, dbj = DDBJ) • Name of sequence • Score (computational score of hits) • Statistical Probability (p <0.05 for significance) • see problems for example of output
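
The score and probability columns in the BLAST output are linked by Karlin-Altschul statistics: for a raw score S, the expected number of chance hits is E = K·m·n·e^(−λS), and the probability of at least one such hit is P = 1 − e^(−E). The sketch below only illustrates these formulas. The values used for `lam` and `k` are placeholders assumed for this example (BLAST prints the actual Lambda and K for each search in its report), and the function names are not part of any BLAST API.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits scoring at least raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def p_value(e_value):
    """Probability of seeing at least one chance hit, given the E-value."""
    return -math.expm1(-e_value)


if __name__ == "__main__":
    # Placeholder parameters, assumed for illustration only.
    lam, k = 0.267, 0.041
    query_len, db_len = 250, 50_000_000
    for raw in (40, 80, 120):
        e = expect_value(raw, query_len, db_len, lam, k)
        print(raw, round(bit_score(raw, lam, k), 1), e, p_value(e))
```

For small E the two numbers are nearly equal, so a cutoff such as p < 0.05 corresponds to roughly E < 0.05.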

  21. Random Sequence

  22. BLAST Output Part 1. of 5. Click for results

  23. BLAST Output Part 2. of 5

  24. BLAST Output Part 3. of 5.

  25. BLAST Output Part 4. of 5.

  26. BLAST Output Part 5. of 5.

  27. Advanced Subjects • Substitution Matrices; allow substitutions of amino acids • Masking; remove low complexity sequences • Filters; remove redundant sequences • Types of searches • DNA to DNA • DNA to Protein (6 frame translation) • Protein to Protein • Homology studies

  28. Substitution Matrices • For alignment of 2 proteins • A score is given to each amino acid substitution in the comparison. • Scores calculated by comparing distantly related proteins • Examples of Scores • Amino acids the same chemically • Glutamic Acid vs. Aspartic Acid (acid to acid) • Leucine vs. Isoleucine (hydrophobic to hydrophobic) • Large positive value (+4) • Small to large • Glycine to Tryptophan (small neutral to large hydrophobic) • Large negative value (-4) • Some substitutions will affect structure and would be detrimental
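
How a substitution matrix turns an alignment into a score can be shown with a toy example. The values below are not BLOSUM or PAM entries: they reuse the +4 and -4 figures from slide 28, and the identity score of 5 and the default mismatch score of -1 are assumptions made purely for illustration.

```python
# Toy scores taken from the slide's examples; every other value is assumed.
TOY_SCORES = {
    frozenset("ED"): 4,    # glutamic acid vs. aspartic acid (acid to acid)
    frozenset("LI"): 4,    # leucine vs. isoleucine (both hydrophobic)
    frozenset("GW"): -4,   # glycine vs. tryptophan (small vs. large)
}
MATCH_SCORE = 5        # assumed score for identical residues
DEFAULT_MISMATCH = -1  # assumed score for pairs not listed above


def pair_score(a, b):
    """Score one aligned residue pair; the lookup is symmetric in a and b."""
    if a == b:
        return MATCH_SCORE
    return TOY_SCORES.get(frozenset((a, b)), DEFAULT_MISMATCH)


def alignment_score(seq1, seq2):
    """Sum pair scores over an ungapped alignment of equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs sequences of equal length")
    return sum(pair_score(a, b) for a, b in zip(seq1, seq2))


if __name__ == "__main__":
    print(alignment_score("ELG", "DIG"))  # conservative substitutions
    print(alignment_score("ELG", "ELW"))  # one disruptive substitution
```

A real search would load a full 20x20 matrix such as BLOSUM62 and add gap penalties, but the summation step is the same.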

  29. Substitution Matrices; cont’d • BLOSUM62; default on BLAST Programs • PAM40 and other choices available • Low PAM number; detecting very strong but localized sequence similarities • High PAM number; detecting long but weak alignments between distantly related sequences • Use the higher PAM number for more distant relationships (BLOSUM numbering runs the other way: lower numbers suit more distant sequences)

  30. Partial PAM 120

  31. Masking • Low complexity regions represent locally biased amino acid composition; • sequences deviate from the random model used for statistical significance. • Low complexity matches are statistically significant but not biologically important. • Make up 25% or more of the residues in protein sequence databases

  32. Filtering • Reduce number of matches due to highly repetitive sequences • Low Complexity sequences • Poly A, Alu sequences (~10% of genome is Alu) • Don’t want these sequences reported; otherwise they would fill the whole report • Default in BLAST (SEG & XNU)
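
The idea behind masking low complexity regions can be sketched with a sliding window and Shannon entropy: windows dominated by one or two residues have low entropy and are replaced by `X`. This is a simplified stand-in and not the SEG or XNU algorithm that BLAST applies; the window length of 12, the threshold of 2.2 bits and the function names are all assumptions chosen for this example.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a string."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq, window=12, threshold=2.2, mask_char="X"):
    """Mask every window whose composition entropy falls below the threshold.

    Sequences shorter than the window are returned unchanged (upper-cased).
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    query = "ACDEFGHIKL" + "Q" * 20 + "MNPRSTVWY"
    print(query)
    print(mask_low_complexity(query))
```

Masked positions cannot seed or extend a hit, so a homopolymer run like the one above no longer produces large numbers of biologically meaningless matches.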

  33. 3-D Visualizations of Proteins • Software allows manipulation of structures; all freeware • plug-ins for browsers (Netscape and Explorer, need most current versions) • Backbone, wireframe, spacefilling, stereo, rotation, zoom • CHIME; best, tutorials written for biochemistry, organic chemistry, & inorganic chemistry • http://www.mdli.com/download/ • Cn3D; published by NIH-NCBI; allows overlay of proteins • http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.shtml

  34. Reverb-DNA Binding Complex (1aby.pdb) • Wireframe • Cartoon • Spacefilling
