Lab 1: Bioinformatics and Genomics Internet Resources
Summary
• Definition of bioinformatics: the many computational tools that help us understand biological data, especially the structure and function of genomic data (see also the much more definitive discussion at http://bioinformaticsweb.net/)
• Guided tour of NCBI, a major integrated repository of sequence data, analysis results, and tools
• Learn to use NCBI through Exercise 1.2 and the homework assignment
What is bioinformatics?
• Integrated use of computers and databases to store, analyze, and interpret biological data, especially high-throughput, high-dimensional data.
• Primarily thought of in the context of molecular genetic data (DNA sequence, mapping, expression, etc.).
• Is this really new (revolutionary), or just an evolutionary step in data processing? Breeders have used computation on large biological datasets for more than 50 years.
The amount of data has changed! We need to change how we view and communicate complex data: reduce it to human-brain-sized chunks without distorting the message.
Types of high-throughput biological data
• Systematic genome sequence and its annotation
• RNA and protein expression data
• Protein-protein interaction data
• Metabolomic data
• Scientific literature-mining software output
Types of high-throughput biological data
• “Simple” data and analysis:
  - Genome sequences
  - Protein domain information
• Both are static and not context-dependent
• “More complex” datasets and analysis:
  - Genome functional annotation
  - Expression patterns of protein and RNA
  - Protein function and activity
  - Physiological function at molecular/cellular/tissue levels
  - Phenotype
• These patterns are context-dependent and may change
• They are also interdependent: interactions are possible
Databanks in Molecular Biology and Genomics
• Many specialized databases; many are accessible from genome browsers
• The big three genome browsers: NCBI, ENSEMBL, UCSC
• ENSEMBL browser: http://www.ensembl.org/index.html
• UCSC genome browser: http://genome.ucsc.edu; click on “genome browser”
Databanks in Molecular Biology and Genomics
• NCBI Tour: http://www.ncbi.nlm.nih.gov/
Site map: Overview, Databases/Tools, Science Primer, Human Genome, Entrez* data model, Data Submission, Education
Databases/Tools: Nucleotide, Protein, Gene, HomoloGene, OMIM
Human Genome: Map Viewer
BLAST tools (next week)
NCBI Tour: Overview
Databases and Tools -> Literature DB -> PubMed, or -> OMIM -> search for Huntington's
Databases and Tools -> Tools for Data mining -> Entrez -> 2 searches:
a. Growth hormone
b. GH1
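The two Entrez searches above can also be expressed as URLs for NCBI's E-utilities ESearch service, which is a programmatic route to the same kind of text search. The sketch below only builds the URLs and does not contact NCBI. The choice of the `gene` database and the `retmax` default are assumptions made for illustration, not part of the lab exercise.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for a text query against one Entrez database.

    db     -- Entrez database name (assumed here: "gene")
    term   -- free-text query, e.g. a gene symbol or gene name
    retmax -- maximum number of record IDs to return (illustrative default)
    """
    query = urlencode({"db": db, "term": term, "retmax": retmax})
    return f"{EUTILS_BASE}/esearch.fcgi?{query}"


if __name__ == "__main__":
    # The two searches from the tour: a descriptive name and a gene symbol.
    for term in ("growth hormone", "GH1"):
        print(esearch_url("gene", term))
```

Pasting either printed URL into a web browser should return an XML list of matching record IDs. Comparing the two result counts shows how a broad name and a specific gene symbol behave differently as search terms.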
NCBI Tour: Overview
Human Genome Res. -> Guide to Online Resources -> Browse your genome -> chromosome 21 -> chromosome 11
GenBank Flat File
GenBank (http://www.ncbi.nlm.nih.gov/genbank/) is the databank that holds most of the primary sequence data, presented as flat files.
Flat file contents:
• Locus line (accession number, length, type, sub-directory, release date)
• Definition line
• Accession number
• Keywords
• Source
• References to sequence
• Features table
  - Source
  - Gene
  - CDS (coding sequence)
  - Miscellaneous features (sig_peptide, polyA_signal, polymorphism)
• Base count
• Sequence
XM_004915 is gone now. Try NM_214163.
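To make the flat-file layout above concrete, here is a minimal sketch that splits a record into its top-level sections using only the Python standard library. It relies on one property of the format: top-level keywords (LOCUS, DEFINITION, FEATURES, ORIGIN, ...) start in the first column, while continuation lines and feature keys are indented. The embedded record `SAMPLE_RECORD` is invented for illustration and is not a real GenBank entry. The function names are hypothetical, and a production parser such as Biopython's would handle many cases this sketch ignores.

```python
# A made-up, heavily shortened record used only to exercise the parser.
SAMPLE_RECORD = """\
LOCUS       DEMO0001                  24 bp    mRNA    linear   MAM 01-JAN-2000
DEFINITION  Hypothetical demo transcript, not a real GenBank entry.
ACCESSION   DEMO0001
KEYWORDS    .
SOURCE      Sus scrofa (pig)
  ORGANISM  Sus scrofa
FEATURES             Location/Qualifiers
     source          1..24
     gene            1..24
     CDS             1..24
ORIGIN
        1 atggctgcag gctcccggac ctcc
//
"""


def split_sections(record: str) -> dict:
    """Group the lines of one flat-file record under their top-level keyword."""
    sections = {}
    current = None
    for line in record.splitlines():
        if line.startswith("//"):  # end-of-record marker
            break
        if line and not line[0].isspace():
            # A keyword in column 1 opens a new section.
            current, _, rest = line.partition(" ")
            sections.setdefault(current, []).append(rest.strip())
        elif current is not None:
            # Indented lines belong to the section opened most recently.
            sections[current].append(line.rstrip())
    return sections


def feature_keys(sections: dict) -> list:
    """Return feature keys (source, gene, CDS, ...) in order of appearance."""
    keys = []
    for line in sections.get("FEATURES", [])[1:]:
        # Feature keys are indented exactly five spaces; qualifiers sit deeper.
        if line.startswith("     ") and len(line) > 5 and not line[5].isspace():
            keys.append(line.split()[0])
    return keys


def sequence(sections: dict) -> str:
    """Join the ORIGIN block into one string, dropping position numbers and spaces."""
    return "".join(ch for line in sections.get("ORIGIN", []) for ch in line if ch.isalpha())


if __name__ == "__main__":
    parsed = split_sections(SAMPLE_RECORD)
    print(parsed["LOCUS"][0].split())
    print(feature_keys(parsed))
    print(sequence(parsed))
```

Running the same functions on the text of a real record (for example the flat-file view of NM_214163 saved from the NCBI website) is a quick way to see which of the listed sections a given entry actually contains.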
Ensembl Gene View http://useast.ensembl.org/Homo_sapiens/Info/Index
Additional Bioinformatics Tool Overviews: UCSC Genome Browser
UCSC: http://genome.ucsc.edu/
- Emphasis is on genome tracks of data: at any locus or larger region, you can look at different sets of data to compare gene predictions and other features.
- I use UCSC only for a quick look at genes and the conservation of their flanking DNA sequence among species. Examples: IL1B, IGF1, HPRT, or HOXA5
- UCSC also integrates ENCODE data, which we will discuss later in functional genomics
What are you interested in?
Finding available information about:
• A gene's function?
• A gene's structure?
• A gene's location?
• A gene's expression pattern?
• Published papers on the gene?
• Similar genes (and evolutionary relationships) in other species?
• Context of the gene in the genome (its “neighborhood”)?
Examples of bioinformatic analyses
Starting material:
• Accession number: from a publication on the gene of interest; use GenBank nucleotide or Entrez search
• Gene or protein name: text search of the entire NCBI website at Entrez (covers GenBank, PubMed, OMIM, UniGene, etc.)
• Disease (human): text search of OMIM
• Sequence data: begin with the sequence of your clone and compare it against the sequence information available at the NCBI website (BLAST against GenBank nr/nt, human genome + transcript) to identify the sequence if possible. If it is already cloned and sequenced, you can determine the human and mouse location (Gene, UniGene, MapViewer, etc.) and find out what is known about the gene (OMIM, PubMed)
• Protein structure: begin with the link to the structure in PDB (see tutorial): http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/pdb.shtml
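When the starting material is an accession number, the GenBank flat file can also be requested directly through NCBI's E-utilities EFetch service instead of the website search box. The sketch below is one possible way to do that, not the method used in the lab. The function names are hypothetical, the `nuccore` database and `gb`/`text` return settings are choices made for illustration, and `fetch_text` is defined but never called, so nothing contacts NCBI unless you call it yourself.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities web service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_genbank_url(accession: str) -> str:
    """Build an EFetch URL that asks for one nucleotide record as a GenBank flat file."""
    query = urlencode({
        "db": "nuccore",    # Entrez nucleotide database
        "id": accession,    # accession number taken from a publication
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    })
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and return its body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only the URL is printed here; call fetch_text(url) to download the record.
    print(efetch_genbank_url("NM_214163"))
```

For classroom use, keep requests occasional and spaced out, since NCBI limits how frequently the service may be called.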
Exercise on use of Genome Browsers
Tutorial on OMIM: http://www.ornl.gov/sci/techresources/Human_Genome/posters/chromosome/omim.shtml
Exercise on use of Genome Browsers (5 pts)
Your assignment: Due Wednesday, August 31, before class starts
1. Select a disease and gene by the end of today
2. Give the disease/gene information to Dr. Tuggle
3. Research this gene using the resources described above
4. Answer the questions in the Exercise; send your answers by email to Dr. Tuggle
- You are allowed to cut and paste text from the websites, but clearly indicate the question and the answer.
- A minimal effort will receive minimal points. Make sure I know you visited these sites and that you learned something.
- You don’t have to use all three browsers and you don’t have to compare them (delete question f).