Bioinformatics Toolbox

Presentation Transcript


  1. Bioinformatics Toolbox Yaohang Li Department of Computer Science North Carolina A&T State University

  2. Bioinformatics Toolbox • Extends MATLAB to provide an integrated software environment • Genome Analysis • Proteome Analysis • Applications • Drug Discovery • Genetic Engineering • Biological Research

  3. Functionalities of Bioinformatics Toolbox • Data Analysis Functions • Connecting to Web-accessible databases • Reading and converting between multiple data formats • Determining statistical characteristics of data • Manipulating and aligning sequences • Modeling patterns in biological sequences using Hidden Markov Model (HMM) profiles • Reading, normalizing, and visualizing microarray data • Creating and manipulating phylogenetic tree data • Interfacing with other bioinformatics software

  4. Functionalities of Bioinformatics Toolbox • Prototype and Develop Algorithms • Visualize Data • Sequence alignments • Gene expression data • Phylogenetic trees • Protein structure analysis • Share and Deploy Applications • Create stand-alone applications • GUI interface

  5. Installation • Required Software • MATLAB • Statistics Toolbox • Additional Software • Signal Processing Toolbox • Image Processing Toolbox • Optimization Toolbox • Neural Network Toolbox • Database Toolbox • MATLAB Compiler

  6. Data Formats and Databases • Web-based databases • GenBank (getgenbank) • GenPept (getgenpept) • European Molecular Biology Laboratory EMBL (getembl) • Protein Sequence Database PIR-PSD (getpir) • Protein Data Bank PDB (getpdb) • Raw Data • Read data generated from gene sequencing instruments • Reading/Writing Data Formats • Sequence data • Multiply Aligned Sequences • Gene Expression Data from Microarrays
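A minimal sketch of the read/write path for sequence data, assuming the Bioinformatics Toolbox is installed. The header and sequence are made up for illustration; `fastawrite` and `fastaread` are toolbox functions, and the temporary file name comes from MATLAB's `tempname`.

```matlab
% Illustrative header and sequence (not real data).
header = 'demo_seq example sequence';
seq    = 'ATGAAATTTGGGCCCTAA';

% Write to a new FASTA file in the temp directory, then read it back.
fname = [tempname '.fasta'];
fastawrite(fname, header, seq);
rec = fastaread(fname);      % struct with Header and Sequence fields

disp(rec.Header)
disp(rec.Sequence)
delete(fname);               % clean up the temporary file
```

The Web-database functions listed on this slide (for example `getgenbank`) also return their results as MATLAB variables, but they need network access.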

  7. Sequence Analysis • Sequence Analysis • Find information about a nucleotide or amino acid sequence • Using computational methods • Tasks • Identify genes • Determine the similarity of two genes • Determine the protein coded by a gene • Determine the function of a gene by finding a similar gene in another organism with a known function • Example • Sequence Statistics • Sequence Alignment

  8. Sequence Statistics • Task • Starting with a DNA sequence, calculate statistics for the nucleotide content • Example: Determining Nucleotide Content • Task • Studying the human mitochondrial genome • While many genes that code for mitochondrial proteins are found in the cell nucleus, the mitochondrion has its own genes that code for proteins used to produce energy • Procedure • Find the nucleotide sequence for the genome • Look at the nucleotide content for the entire sequence • Determine open reading frames and extract specific gene sequences

  9. Determining Nucleotide Content • Step 1: • Use the MATLAB Help browser to explore the NCBI website • Step 2: • Search the NCBI website for information • Step 3: • Select a result page

  10. Getting Sequence Information into MATLAB • MATLAB provides an integrated environment for importing sequence information • Get sequence information from a Web database • You can also load the sequence from a MAT-file • Get information about the sequence
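The steps on this slide might look like the sketch below. It assumes network access and that NC_012920 is the accession number you want for the human mitochondrial genome; substitute whatever accession you selected at NCBI. The MAT-file name is hypothetical.

```matlab
% Assumption: NC_012920 is the accession for the human mitochondrial
% genome; replace it with the accession found in the NCBI search.
accession = 'NC_012920';

% Fetch only the nucleotide sequence (requires network access).
mitochondria = getgenbank(accession, 'SequenceOnly', true);

% Save to a MAT-file (hypothetical name) so later sessions can work offline.
save('mitochondria.mat', 'mitochondria');
load('mitochondria.mat');    % alternative to querying the database again

% Basic information about the variable now in the workspace.
whos mitochondria
```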

  11. Determining Nucleotide Composition • Knowledge • Sections of a DNA sequence with a high percentage of A+T nucleotides usually indicate intergenic parts of the sequence • Low A+T and higher G+C nucleotide percentages indicate possible genes • High CG dinucleotide content is often found upstream of a gene • Statistics functions of Bioinformatics Toolbox • Determine if the sequence has the characteristics of a protein-coding region
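As a rough illustration of these composition statistics, the sketch below uses a synthetic sequence (an A+T-rich half followed by a G+C-rich half) rather than real genomic data. `ntdensity` and `basecount` are toolbox functions; the window size of 40 is an arbitrary choice for this example.

```matlab
% Synthetic 400-nt sequence: A+T-rich half, then G+C-rich half.
seq = [repmat('AT', 1, 100), repmat('GC', 1, 100)];

% Plot nucleotide density along the sequence with a 40-nt sliding window.
ntdensity(seq, 'Window', 40);

% Overall G+C fraction computed from the base counts.
counts = basecount(seq);
gcFraction = (counts.G + counts.C) / length(seq);
fprintf('G+C fraction: %.2f\n', gcFraction);
```

On a real genome, regions where the G+C curve rises above the A+T curve are candidates for closer inspection, in line with the rule of thumb on the slide.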

  12. Determining Nucleotide Composition (II) • Count the nucleotides • basecount(mitochondria) • In the reverse complement of a sequence • basecount(seqrcomplement(mitochondria)) • Show the pie chart
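The calls on this slide can be tried on a short literal sequence. The sequence below is illustrative only; with the real data you would pass the `mitochondria` variable instead.

```matlab
seq = 'AAGCTTGGCA';                 % illustrative sequence, not real data

fwd = basecount(seq);               % struct with fields A, C, G, T
rev = basecount(seqrcomplement(seq));

% In the reverse complement, the A and T counts swap, as do C and G.
fprintf('forward: A=%d C=%d G=%d T=%d\n', fwd.A, fwd.C, fwd.G, fwd.T);
fprintf('reverse: A=%d C=%d G=%d T=%d\n', rev.A, rev.C, rev.G, rev.T);

basecount(seq, 'chart', 'pie');     % pie chart of the nucleotide distribution
```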

  13. Determining Codon Composition • Background • A trinucleotide (codon) codes for an amino acid • 64 possible codons • Knowing the percentage of codons in a sequence can be helpful when comparing with tables for expected codon usage • Bioinformatics Toolbox • Count codons in a nucleotide sequence • codoncount(mitochondria)
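A small sketch of codon counting on a made-up 12-nt sequence. `codoncount` returns a structure with one field per codon; the `'Frame'` option selects the reading frame.

```matlab
seq = 'ATGAAATTTATG';   % codons in frame 1: ATG AAA TTT ATG

codons = codoncount(seq);           % struct with one field per codon
fprintf('ATG=%d AAA=%d TTT=%d\n', codons.ATG, codons.AAA, codons.TTT);

% Second reading frame: TGA AAT TTA, with a leftover TG that is not counted.
frame2 = codoncount(seq, 'Frame', 2);
fprintf('TGA=%d AAT=%d TTA=%d\n', frame2.TGA, frame2.AAT, frame2.TTA);
```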

  14. Amino Acid Conversion and Composition • Determining the relative amino acid composition • Characteristic profile for the protein • Amino acid composition • Atomic composition • Molecular weight • Convert a nucleotide sequence to an amino acid sequence
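For mitochondrial genes the choice of genetic code matters when converting nucleotides to amino acids. The sketch below uses a made-up 9-nt sequence to show the difference; `nt2aa` is the toolbox function, and `'Vertebrate Mitochondrial'` is one of its genetic code names.

```matlab
seq = 'ATGTGAAAA';   % codons: ATG TGA AAA (illustrative)

aaStandard = nt2aa(seq);   % standard genetic code
aaMito     = nt2aa(seq, 'GeneticCode', 'Vertebrate Mitochondrial');

disp(aaStandard)   % TGA is a stop codon (*) in the standard code
disp(aaMito)       % TGA codes for tryptophan (W) in vertebrate mitochondria
```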

  15. Amino Acid Conversion and Composition (cont.) • Count the amino acids in the protein sequence • aacount(ND2AASeq, 'chart', 'bar') • Determine the atomic composition and molecular weight of the protein
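A sketch of the composition calls on a short made-up peptide (not the ND2 protein from the slide). `aacount`, `atomiccomp`, and `molweight` are toolbox functions.

```matlab
protein = 'MWKMAK';   % illustrative peptide, not the ND2 protein

counts = aacount(protein);            % struct keyed by one-letter codes
aacount(protein, 'chart', 'bar');     % bar chart of the composition

atoms = atomiccomp(protein);          % struct with fields C, H, N, O, S
mw    = molweight(protein);           % molecular weight of the peptide

fprintf('M=%d K=%d W=%d A=%d\n', counts.M, counts.K, counts.W, counts.A);
```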

  16. Sequence Alignment • Task • Determine the similarity between two sequences • Example • Starting with a DNA sequence for a human gene, locate and verify a corresponding gene in a model organism

  17. Comparing Amino Acid Sequences • Convert the DNA sequences to amino acid sequences • Draw a dot plot comparing the human and mouse amino acid sequences
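The two steps might be sketched as follows. The DNA fragments are invented for illustration and are not the real human or mouse genes; the window size (3) and match count (2) passed to `seqdotplot` are arbitrary values suited to such short sequences.

```matlab
% Illustrative DNA fragments read in frame 1; not real gene sequences.
humanNT = 'ATGGCTAAAGGTGAACTGTTTACC';
mouseNT = 'ATGGCTAAAGGCGAACTGTATACC';

humanAA = nt2aa(humanNT);
mouseAA = nt2aa(mouseNT);

% Dot plot: a dot is drawn where at least 2 of 3 residues in a window match.
seqdotplot(humanAA, mouseAA, 3, 2);
ylabel('human'); xlabel('mouse');
```

A diagonal line in the plot suggests the two sequences are similar along their length, which is the cue to proceed to an alignment.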

  18. Global Alignment • Align two amino acid sequences • Using the Needleman-Wunsch algorithm (nwalign)
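A minimal global-alignment sketch using `nwalign` with its default scoring settings. The two peptides are invented for illustration; the second is missing one residue so that the alignment needs a gap.

```matlab
humanAA = 'MAKGELFT';   % illustrative peptides, not real proteins
mouseAA = 'MAKELYT';

% Needleman-Wunsch global alignment with default scoring matrix and gap penalty.
[score, alignment] = nwalign(humanAA, mouseAA);

% The alignment is a 3-row character array: sequence 1, match symbols, sequence 2.
disp(alignment)
fprintf('alignment score: %g\n', score);
```

Because the alignment is global, removing the gap characters from the first and third rows gives back the two input sequences in full.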

  19. DNA Microarray Data Analysis • DNA Microarray • A parallel snapshot of gene activities • Simultaneously measure the activity and interactions of genes • Insights into mechanisms of living systems • Scientific Tasks • Identification of coexpressed genes • Discovery of sample or gene groups with similar expression patterns • Identification of genes whose expression patterns are highly differentiating with respect to a set of discerned biological entities • Study of gene activity patterns under various stress conditions

  20. Microarray Analysis • Microarray Data • Research the function of cells • Compare the differences between healthy and diseased tissue • Observe changes with the application of drugs • Example • Visualizing Microarray Data • Analyzing Gene Expression Profiles

  21. Statistics of Microarray • Look at the distribution of data in each of the blocks
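One way to look at the per-block distribution is a box plot. The sketch below assumes the sample GenePix file `mouse_a1pd.gpr` that MathWorks uses in its examples is available on the MATLAB path; if it is not, substitute your own GPR file. The field name `'F635 Median'` is a column in that file and may differ for other data.

```matlab
% Assumption: mouse_a1pd.gpr is a sample file on the MATLAB path.
pd = gprread('mouse_a1pd.gpr');

% Spatial image of the red-channel foreground intensities.
figure;
maimage(pd, 'F635 Median');

% Box plot of the same field, one box per block on the array.
figure;
maboxplot(pd, 'F635 Median');
```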

  22. Other Functions • Phylogenetic Tree Tool • Protein Structure Analysis • Data Visualization
