Bioinformatics


jerry-dale

Presentation Transcript


  1. Bioinformatics The application of computational techniques to understand and organise the information associated with biological macromolecules

  2. Aims of Bioinformatics • to organise data in a way that allows researchers to access existing information and to submit new entries as they are produced • to develop tools and resources that aid in the analysis of data • to conduct global analyses of all the available data with the aim of uncovering common principles that apply across many systems and highlight novel features

  3. Aims of Bioinformatics • to organise data in a way that allows researchers to access existing information and to submit new entries as they are produced • to develop tools and resources that aid in the analysis of data • to conduct global analyses of all the available data with the aim of uncovering common principles that apply across many systems and highlight novel features

  4. Source of data • 1. DNA or Protein sequences • 2. Macromolecular structures • 3. Results of functional genomics and proteomics experiments (gene expression data)

  5. DNA or Protein sequences DNA sequences are strings of the 4 base letters that make up genes, each typically 1,000 bases long. The largest db contains at least 27 million entries. Protein sequences are strings of the 20 amino-acid letters. At present, more than 400,000 protein sequences are known.
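
Since both kinds of sequence are plain strings over a small alphabet, a first sanity check on a record can be a simple set comparison. The sketch below is only an illustration: the function name `classify_sequence` is hypothetical, and it assumes the standard one-letter codes (A, C, G, T for DNA and the 20 usual amino-acid letters), ignoring ambiguity codes such as N or X.

```python
# Minimal sketch: classify a sequence by the letters it uses.
# Assumption: standard one-letter codes only, no ambiguity symbols.
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def classify_sequence(seq: str) -> str:
    """Return 'dna', 'protein' or 'unknown' for a sequence string."""
    letters = set(seq.upper())
    if not letters:
        return "unknown"
    # DNA is tested first: a string made only of A, C, G and T is also a
    # valid protein string, so this check cannot tell the two apart.
    if letters <= DNA_ALPHABET:
        return "dna"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"


print(classify_sequence("ATGCGT"))
print(classify_sequence("MKVLAT"))
```

Because the DNA letters are a subset of the protein letters, a check like this can only be a heuristic; real databases store the molecule type explicitly.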

  6. Size of data Biological data are being produced at a phenomenal rate. As of April 2001: - the GenBank db of nucleic acid sequences contained 11,546,000 entries - the SwissProt db of protein sequences contained 95,320 entries. These databases doubled in size in 15 months.
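
To get a feel for what "doubling in 15 months" means, the figures on this slide can be fed into a simple exponential model. This is a sketch under a strong assumption, namely that the doubling time stays constant, which the slide does not claim; the function name `projected_entries` is hypothetical.

```python
# Minimal sketch: extrapolate database size assuming a constant doubling time.
# Assumption: growth stays exponential, which is an extrapolation only.
def projected_entries(initial: float, months: float,
                      doubling_months: float = 15.0) -> float:
    """Size after `months`, given a size `initial` that doubles every `doubling_months`."""
    return initial * 2 ** (months / doubling_months)


genbank_april_2001 = 11_546_000  # figure quoted on the slide
swissprot_april_2001 = 95_320    # figure quoted on the slide

for months in (15, 30):
    print(months, round(projected_entries(genbank_april_2001, months)))
```

Under this assumption each 15-month step multiplies the entry count by two, so 30 months gives a fourfold increase.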

  7. Size of data Anthony Kerlavage of Celera recently stated that an experimental laboratory can easily produce over 100 gigabytes of data per day.

  8. Biological processing power This incredible processing power has been matched by developments in computer technology

  9. Areas of improvement • CPU (faster computations) • disk storage (better data storage) • Internet (revolutionised the methods for accessing and exchanging data)

  10. Source of data • 1. DNA or Protein sequences • 2. Macromolecular structures • 3. Results of functional genomics and proteomics experiments (gene expression data)

  11. Macromolecular structure There are currently 15,000 entries in the Protein Data Bank (PDB). The PDB db contains atomic structures (xyz-coordinates) of proteins, DNA and RNA solved by X-ray crystallography and NMR. A typical PDB file contains the xyz-coordinates of ca. 2,000 atoms.
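
The xyz-coordinates mentioned above are stored in fixed-width text records. The sketch below reads them from `ATOM` and `HETATM` lines, assuming the legacy PDB text format in which x, y and z occupy columns 31-38, 39-46 and 47-54. The function name `parse_atom_coordinates` and the sample record are made up for illustration.

```python
# Minimal sketch: pull xyz-coordinates out of PDB-format text.
# Assumption: legacy fixed-column PDB records (x, y, z in columns 31-54).
from typing import List, Tuple


def parse_atom_coordinates(pdb_text: str) -> List[Tuple[float, float, float]]:
    """Return one (x, y, z) tuple per ATOM or HETATM record."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            x = float(line[30:38])  # columns 31-38
            y = float(line[38:46])  # columns 39-46
            z = float(line[46:54])  # columns 47-54
            coords.append((x, y, z))
    return coords


# Build a made-up record with format specifiers so the columns line up.
sample = (f"{'ATOM':<6}{1:>5} {'N':<4} {'MET':>3} {'A'}{1:>4}    "
          f"{27.340:8.3f}{24.430:8.3f}{2.614:8.3f}")
print(parse_atom_coordinates(sample))
```

For real work a maintained parser is a safer choice, since actual files also contain alternate locations, multiple models and other record types that this sketch ignores.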

  12. Source of data • 1. DNA or Protein sequences • 2. Macromolecular structures • 3. Results of functional genomics and proteomics experiments (gene expression data)

  13. Gene expression data These experiments measure the amount of mRNA (functional genomics) or protein (proteomics) produced by the cell under different conditions, at different stages of the cell cycle and in different cell types in multi-cellular organisms. One of the largest datasets available contains approximately 20 time-point measurements for 6,000 genes (yeast).
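
A dataset of this kind is naturally a matrix with one row per gene and one column per time point. One common first question is whether two genes rise and fall together, which can be measured with the Pearson correlation of their rows. The sketch below uses invented gene names and values purely for illustration; nothing in it comes from the yeast dataset.

```python
# Minimal sketch: compare expression profiles with Pearson correlation.
# Assumption: the gene names and measurements below are invented toy data.
from math import sqrt
from typing import Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equally long expression profiles."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / sqrt(var_a * var_b)


# Rows are genes, columns are time points.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(pearson(expression["geneA"], expression["geneB"]))
print(pearson(expression["geneA"], expression["geneC"]))
```

In this toy data geneA and geneB move together while geneC moves in the opposite direction, so the two correlations have opposite signs.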

  14. Gene expression data From an experimental point of view, it is possible to determine the expression levels of almost every gene in a given cell on a whole-genome level. However, there is currently no central repository for these data and public availability is limited.

  15. Biological data Datasets differ widely in size and complexity. Although macromolecular structures and gene expression experiments give much more biological information than raw sequence data, there are invariably more sequence-based data than data of other kinds. Why?

  16. Why? • Because of the relative ease with which they can be produced

  17. Why? • Because they can be managed easily both by biologists and by computer scientists, even those with very little biological background

  18. Gene expression data • On the other hand, gene expression data are far more complex to manage, and: • 1. biologists rarely achieve mathematical competence beyond elementary calculus and maybe a few statistical formulae. • 2. although everybody uses a computer, biologists rarely use anything but standard commercial software

  19. Gene expression data • Gene expression data are far more complex to manage, and: • 3. people with a non-biological background can find it surprisingly difficult to master the complex and apparently unconnected information that is the working knowledge of every biologist

  20. Source of data • 1. DNA or Protein sequences • 2. Macromolecular structures • 3. Results of functional genomics and proteomics experiments (gene expression data)

  21. Source of data 4. Genomic-scale data include biochemical information on metabolic pathways, regulatory networks, protein-protein interactions and data from two-hybrid experiments and systematic knockouts of individual genes

  22. Integration Integration of multiple sources of data. At a basic level, this problem is frequently addressed by providing external links to other databases. At a more advanced level, integrated access across several data sources is provided.

  23. Data organisation The first biological databases were simple flat files. At present, most of them are relational db with Web-page interfaces.
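
The step from flat file to relational database can be sketched with Python's built-in `sqlite3` module. The example below assumes a FASTA-style flat file, which is one common flat-file layout but not necessarily the one the slide has in mind. The record identifiers, the table name `sequences` and its columns are all invented for illustration.

```python
# Minimal sketch: load a FASTA-style flat file into a relational table.
# Assumption: the records, table name and column names are all hypothetical.
import sqlite3

FLAT_FILE = """>seq1 hypothetical protein A
MKVLAT
>seq2 hypothetical protein B
MKTLLA
"""


def parse_fasta(text):
    """Return (identifier, description, residues) tuples from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((*header, "".join(chunks)))
            ident, _, desc = line[1:].partition(" ")
            header, chunks = (ident, desc), []
        else:
            chunks.append(line)
    if header is not None:
        records.append((*header, "".join(chunks)))
    return records


def load_into_db(records):
    """Create an in-memory SQLite database holding the parsed records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sequences (id TEXT PRIMARY KEY, description TEXT, residues TEXT)"
    )
    conn.executemany("INSERT INTO sequences VALUES (?, ?, ?)", records)
    conn.commit()
    return conn


conn = load_into_db(parse_fasta(FLAT_FILE))
print(conn.execute("SELECT id FROM sequences WHERE residues LIKE 'MKT%'").fetchall())
```

Once the records are in a table, queries such as the prefix search above no longer require scanning and re-parsing the whole file by hand.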

  24. Aims of Bioinformatics • to organise data in a way that allows researchers to access existing information and to submit new entries as they are produced • to develop tools and resources that aid in the analysis of data • to conduct global analyses of all the available data with the aim of uncovering common principles that apply across many systems and highlight novel features

  25. Data and Software Tools

  26. Data and Software Tools • For example: • software for gene finding (identification of coding regions) • software for similarity searches • multiple sequence alignments and searching for functional domains • homology modeling • calculations of surface and volume shapes and analysis of protein interactions with DNA, RNA, other proteins or drugs (chemoinformatics)
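
As a rough idea of what the first item, gene finding, involves, the sketch below scans the three forward reading frames of a DNA string for open reading frames, i.e. an `ATG` start codon followed in frame by a stop codon. It is a deliberately naive illustration: real gene finders also handle the reverse strand, introns and statistical signals. The name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
# Minimal sketch: naive open-reading-frame scan on the forward strand.
# Assumption: standard stop codons; no reverse strand, introns or scoring.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end) index pairs of ORFs, with `end` just past the stop codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count the coding codons between start and stop.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


sequence = "CCATGAAATAGCC"
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])
```

The returned indices are Python slice positions, so `sequence[begin:end]` gives the ORF including its stop codon.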

  27. Similarity searching Having sequenced a particular protein, it is of interest to compare it with previously characterised sequences. This needs more than a simple text-based search: these programs must consider what constitutes a biologically significant match. Biologically significant match: - two sequences share a common function - two sequences share a common evolutionary history (homologs)
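
To illustrate why similarity searching goes beyond plain text matching, the sketch below scores the best local alignment between two sequences with the Smith-Waterman dynamic programming recurrence. The slide does not name an algorithm, so this is a swapped-in textbook technique. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions, whereas real search tools use substitution matrices and statistical significance estimates.

```python
# Minimal sketch: Smith-Waterman local alignment score.
# Assumption: toy match/mismatch/gap scores, not a real substitution matrix.
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between strings a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("TTACGTTT", "GGACGGG"))
```

Unlike an exact substring search, the score rewards a shared region even when the surrounding residues differ, which is closer to what a biologically significant match requires.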

  28. Homology modeling At a structural level, there is predicted to be a finite number of different tertiary structures: estimates range between 1,000 and 10,000 folds. A structure can be predicted in a homology-based manner, by comparison with known structures (3-D structural alignments). Although the number of structures in the PDB db has increased exponentially, the rate of discovery of novel folds has actually decreased.

  29. Ab initio structure prediction Prediction of the 3-D structure is based on the protein sequence only: e.g. the propensity of certain amino-acid combinations to produce secondary structural elements.

  30. Aims of Bioinformatics • to organise data in a way that allows researchers to access existing information and to submit new entries as they are produced • to develop tools and resources that aid in the analysis of data • to conduct global analyses of all the available data with the aim of uncovering common principles that apply across many systems and highlight novel features

  31. Data exploration Finding relationships between different proteins: - Analysis of one type of data to infer and understand the observations for another type of data - Comparative analysis for classification. Expansion of biological analysis in two dimensions, depth and breadth

  32. Expansion of biological analysis in two dimensions: depth Example: Rational drug design. This approach takes a single gene and follows it through an analysis that maximises our understanding of the protein it encodes. Prediction algorithms can then be used to calculate the structure and to form hypotheses about its function. Geometry calculations can define the shape of the protein’s surface and identify or design ligands that can become drugs specifically altering the protein’s function.

  33. Expansion of biological analysis in two dimensions: breadth Example: comparison of a gene or a gene product with others. This approach can extract sequence patterns or structural templates that define a family of proteins sharing a common property. It can also be used to construct phylogenetic trees that trace evolution, e.g. of the SARS virus.

  34. Data organisation The first biological databases were simple flat files. At present, most of them are relational db with Web-page interfaces.

  35. Sequence analysis Techniques mainly include string comparison methods
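
One of the simplest string comparison methods is the edit (Levenshtein) distance, the minimum number of single-letter insertions, deletions and substitutions needed to turn one string into another. The slide does not single out a particular method, so the sketch below is only a representative example.

```python
# Minimal sketch: Levenshtein edit distance between two sequences.
# Assumption: every insertion, deletion and substitution costs 1.
def edit_distance(a: str, b: str) -> int:
    """Return the minimum number of single-letter edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        curr = [i]
        for j, char_b in enumerate(b, 1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


print(edit_distance("ACGT", "AGT"))
```

Biological applications usually replace the uniform cost of 1 with scores that depend on which residues are exchanged.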

  36. Motif and pattern identification and classification • depend on: • Machine learning • Clustering and data mining techniques

  37. 3-D structural analysis • include: • Euclidean geometry calculations • Basic application of physical chemistry • Graphical representation of surface and volumes • Structural comparison (3-D matching)
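
The Euclidean geometry calculations and structural comparison listed above can be illustrated with the root-mean-square deviation (RMSD) between two sets of atomic coordinates. The sketch assumes the two structures have the same atoms in the same order and are already superimposed; finding the optimal superposition is a separate step that is not shown.

```python
# Minimal sketch: RMSD between two equally sized coordinate sets.
# Assumption: atoms are paired by index and already superimposed.
from math import sqrt
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root-mean-square deviation over paired atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate sets of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return sqrt(total / len(coords_a))


print(rmsd([(0.0, 0.0, 0.0)], [(3.0, 4.0, 0.0)]))
```

An RMSD of zero means the paired atoms coincide exactly; larger values indicate greater structural difference.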

  38. Bio Informatics This unexpected union between the two subjects is attributed to the fact that life itself is an information technology. An organism’s physiology is largely determined by its genes, which at their most basic can be viewed as digital information
