
Bioinformatics - Scope & Applications


Presentation Transcript


  1. Bioinformatics - Scope & Applications. Speaker: Joy Scaria, Biological Sciences Group

  2. What is Bioinformatics? Complicating biology by introducing algorithms, scripts, statistics and confusing software so that no one understands it any more...

  3. Definition • The application of computer technology to the management of biological information. Specifically, it is the science of developing computer databases, algorithms and software to facilitate and expedite biological research. Note that the definition sets no boundaries or specific areas within biology. There are still unexplored areas in biology where informatics could be used.

  4. Biological Information (figure labels): The Cell • Genome sequence • mRNA expression • Protein 2-D gel • Mass spec. • Protein 3-D structure

  5. Nature of Biological data • Heterogeneous • Unorganized • Voluminous • Dynamic

  6. “Towards a paradigm shift in biology”: bioinformatics impacts all aspects of biological research. “...We must hook our individual computers into the worldwide network that gives us access to daily changes in the databases and also makes immediate our communications with each other. The programs that display and analyze the material for us must be improved - and we must learn to use them more effectively. Like the purchased kits, they will make our life easier, but also like the kits, we must understand enough of how they work to use them effectively...” Walter Gilbert (1991) “Towards a paradigm shift in biology” Nature News and Views 349:99

  7. What can a computer do? • Speed, accuracy, diligence • Organize heterogeneous data • Store voluminous data (biological databases) • Provide data analysis tools

  8. Biological databases and organization • Storage of gene and protein sequences • Storage of Protein (and other) structures • Ontology • Others (custom or specialised databases)

  9. How can biological data be organized? • Public databases available on the internet • Data analysis tools • Web interface

  10. Genome sequencing projects, including the Human Genome Project, are producing vast amounts of information. The challenge is to put this information to good use.
      COMPLETE/PUBLIC: Aquifex aeolicus, Pyrococcus horikoshii, Bacillus subtilis, Treponema pallidum, Borrelia burgdorferi, Helicobacter pylori, Archaeoglobus fulgidus, Methanobacterium thermo., Escherichia coli, Mycoplasma pneumoniae, Synechocystis sp. PCC6803, Methanococcus jannaschii, Saccharomyces cerevisiae, Mycoplasma genitalium, Haemophilus influenzae
      COMPLETE/PENDING PUBLICATION: Rickettsia prowazekii, Pseudomonas aeruginosa, Pyrococcus abyssi, Bacillus sp. C-125, Ureaplasma urealyticum, Pyrobaculum aerophilum
      ALMOST/PUBLIC: Pyrococcus furiosus, Mycobacterium tuberculosis H37Rv, Mycobacterium tuberculosis CSU93, Neisseria gonorrhoeae, Neisseria meningitidis, Streptococcus pyogenes

  11. What is bioinformatics? • Sequence analysis: geneticists and molecular biologists analyse genome sequence information to understand disease processes • Molecular modeling: crystallographers and biochemists design drugs using computer-aided tools • Phylogeny/evolution: geneticists obtain information about the evolution of organisms by looking for similarities in gene sequences • Ecology and population studies: bioinformatics is used to handle large amounts of data obtained in population studies • Medical informatics: personalised medicine

  12. Promises of genomics and bioinformatics • In medicine: knowledge of protein structure facilitates drug design; understanding of genomic variation allows medical treatment to be tailored to the individual’s genetic make-up; genome analysis allows the targeting of genetic diseases; the effect of a disease or of a therapeutic on RNA and protein levels can be elucidated • The same techniques can be applied to biotechnology, crop and livestock improvement, etc.

  13. Sequence analysis: overview
      Sequence entry: sequencing project management; sequence database browsing; manual sequence entry
      Nucleotide sequence analysis (nucleotide sequence file): search for protein coding regions; search databases for similar sequences; design further experiments (restriction mapping, PCR planning)
        Coding: translate into protein
        Non-coding: search for known motifs; RNA structure prediction; sequence comparison
      Protein sequence analysis (protein sequence file): search databases for similar sequences; search for known motifs; predict secondary structure; sequence comparison; predict tertiary structure
      Multiple sequence analysis: create a multiple sequence alignment; edit the alignment; format the alignment for publication; molecular phylogeny; protein family analysis

  14. Example nucleotide sequence:
        1 atgcgttata ttcgcctgtg tattatctcc ctgttagcca ccctgccgct ggcggtacac
       61 gccagcccgc agccgcttga gcaaattaaa ctaagcgaaa gccagctgtc gggccgcgta
      121 ggcatgatag aaatggatct ggccagcggc cgcacgctga ccgcctggcg cgccgatgaa
      181 cgctttccca tgatgagcac ctttaaagta gtgctctgcg gcgcagtgct ggcgcgggtg
      241 gatgccggtg acgaacagct ggagcgaaag atccactatc gccagcagga tctggtggac
      301 tactcgccgg tcagcgaaaa acatcttgcc gacggcatga cggtcggcga actctgcgcc
      361 gccgccatta ccatgagcga taacagcgcc gccaatctgc tgctggccac cgtcggcggc
      421 cccgcaggat tgactgcctt tttgcgccag atcggcgaca acgtcacccg ccttgaccgc
      481 tgggaaacgg aactgaatga ggcgcttccc ggcgacgccc gcgacaccac taccccggcc
      541 agcatggccg cgaccctgcg caagctgctg accagccagc gtctgagcgc ccgttcgcaa
      601 cggcagctgc tgcagtggat ggtggacgat cgggtcgccg gaccgttgat ccgctccgtg
      661 ctgccggcgg gctggtttat cgccgataag accggagctg gcgaacgggg tgcgcgcggg
      721 attgtcgccc tgcttggccc gaataacaaa gcagagcgca ttgtggtgat ttatctgcgg
      781 gatacgccgg cgagcatggc cgagcgaaat cagcaaatcg ccgggatcgg cgcggcgctg
      841 atcgagcact ggcaacgcta a
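The sequence on slide 14 starts with atg and ends with the stop codon taa, so it can be read as a coding sequence. The sketch below shows the "translate into protein" step from the overview slide, using the standard genetic code. The function name `translate` and the choice to translate only frame 1 are assumptions made for illustration; they are not taken from the presentation.

```python
# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = ambiguous codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# First 30 bases of the slide 14 sequence.
print(translate("atgcgttata ttcgcctgtg tattatctcc"))  # MRYIRLCIIS
```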

  15. Gene Sequencing: Automated chemical sequencing methods allow rapid generation of large data banks of gene sequences

  16. Database similarity searching: The BLAST program was written to allow rapid comparison of a new gene sequence with the hundreds of thousands of gene sequences in databases
      Sequences producing significant alignments:                          Score (bits)  E value
      gnl|PID|e252316 (Z74911) ORF YOR003w [Saccharomyces cerevisiae]          112       7e-26
      gi|603258 (U18795) Prb1p: vacuolar protease B [Saccharomyces ce...       106       5e-24
      gnl|PID|e264388 (X59720) YCR045c, len:491 [Saccharomyces cerevi...        69       7e-13
      gnl|PID|e239708 (Z71514) ORF YNL238w [Saccharomyces cerevisiae]           30       0.66
      gnl|PID|e239572 (Z71603) ORF YNL327w [Saccharomyces cerevisiae]           29       1.1
      gnl|PID|e239737 (Z71554) ORF YNL278w [Saccharomyces cerevisiae]           29       1.5

      gnl|PID|e252316 (Z74911) ORF YOR003w [Saccharomyces cerevisiae] Length = 478
      Score = 112 bits (278), Expect = 7e-26
      Identities = 85/259 (32%), Positives = 117/259 (44%), Gaps = 32/259 (12%)

      Query: 2   QSVPWGISRVQAPAAHNRG---------LTGSGVKVAVLDTGIST-HPDLNIRGG-ASFV 50
                 +  PWG+ RV        G           G GV   VLDTGI T H D   R    + +
      Sbjct: 174 EEAPWGLHRVSHREKPKYGQDLEYLYEDAAGKGVTSYVLDTGIDTEHEDFEGRAEWGAVI 233

      Query: 51  PGEPSTQDGNGHGTHVAGTIAALNNSIGVLGVAPSAELYXXXXXXXXXXXXXXXXXQGLE 110
                 P      D NGHGTH AG I + +      GVA + ++                  +G+E
      Sbjct: 234 PANDEASDLNGHGTHCAGIIGSKH-----FGVAKNTKIVAVKVLRSNGEGTVSDVIKGIE 288

  17. Sequence comparison: Gene sequences can be aligned to see similarities between genes from different sources
      768 TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG 813
          ||     ||   ||  |  | ||| |  |||| |||||    |||  |||
       87 TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG 135

      814 AGTGTTTGAGCCTCTGTTTGTGTGTAATTGAGTGTGCATGTGTGGGAGTG 863
          |  | |              |  |||||| |   ||||  | || |   |
      136 AAGGATC.............TCAGTAATTAATCATGCACCTATGTGGCGG 172

      864 AAATTGTGGAATGTGTATGCTCATAGCACTGAGTGAAAATAAAAGATTGT 913
          ||| | ||| || || |||   |   |||||||||  ||   |||||| |
      173 AAA.TATGGGATATGCATGTCGA...CACTGAGTG..AAGGCAAGATTAT 216
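The bars between the two rows of the alignment on slide 17 mark columns where both sequences carry the same base. A minimal sketch of how such a match line and a percent-identity figure can be derived from two already-aligned rows follows. The function names are illustrative, and this only annotates an existing alignment; it does not compute the alignment itself, which the program used for the slide did by a method the transcript does not name.

```python
def match_line(top: str, bottom: str, gap: str = ".") -> str:
    """Return a string with '|' under every column where both rows agree."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    return "".join(
        "|" if a == b and a != gap else " " for a, b in zip(top, bottom)
    )


def percent_identity(top: str, bottom: str, gap: str = ".") -> float:
    """Identical columns as a percentage of all alignment columns."""
    return 100.0 * match_line(top, bottom, gap).count("|") / len(top)


# First block of the alignment on slide 17 (positions 768-813 vs 87-135).
top = "TT....TGTGTGCATTTAAGGGTGATAGTGTATTTGCTCTTTAAGAGCTG"
bottom = "TTGACAGGTACCCAACTGTGTGTGCTGATGTA.TTGCTGGCCAAGGACTG"

print(top)
print(match_line(top, bottom))
print(bottom)
print(percent_identity(top, bottom))  # 54.0
```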

  18. Restriction mapping: Genes can be analysed to detect sequences that can be cleaved by restriction enzymes
      Map scale: 50 100 150 200 250
      AceIII    1  CAGCTCnnnnnnn’nnn...
      AluI      2  AG’CT
      AlwI      1  GGATCnnnn’n_
      ApoI      2  r’AATT_y
      BanII     1  G_rGCy’C
      BfaI      2  C’TA_G
      BfiI      1  ACTGGG
      BsaXI     1  ACnnnnnCTCC
      BsgI      1  GTGCAGnnnnnnnnnnn...
      BsiHKAI   1  G_wGCw’C
      Bsp1286I  1  G_dGCh’C
      BsrI      2  ACTG_Gn’
      BsrFI     1  r’CCGG_y
      CjeI      2  CCAnnnnnnGTnnnnnn...
      CviJI     4  rG’Cy
      CviRI     1  TG’CA
      DdeI      2  C’TnA_G
      DpnI      2  GA’TC
      EcoRI     1  G’AATT_C
      HinfI     2  G’AnT_C
      MaeIII    1  ’GTnAC_
      MnlI      1  CCTCnnnnnn_n’
      MseI      2  T’TA_A
      MspI      1  C’CG_G
      NdeI      1  CA’TA_TG
      Sau3AI    2  ’GATC_
      SstI      1  G_AGCT’C
      TfiI      2  G’AwT_C
      Tsp45I    1  ’GTsAC_
      Tsp509I   3  ’AATT_
      TspRI     1  CAGTGnn’
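Finding restriction sites amounts to searching a sequence for each enzyme's recognition pattern, where lower-case letters such as r, y, w and n on slide 18 are IUPAC ambiguity codes. The sketch below converts a recognition site to a regular expression and reports every 0-based match position. The four recognition sites are taken from the slide with the cut markers removed; the test sequence is made up, and the sketch scans one strand only, which is sufficient for the palindromic sites shown here but is otherwise a simplifying assumption.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "M": "[AC]", "K": "[GT]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Recognition sites from slide 18, cut-position markers removed.
ENZYMES = {
    "EcoRI": "GAATTC",
    "ApoI": "RAATTY",
    "Tsp509I": "AATT",
    "DdeI": "CTNAG",
}


def find_sites(sequence: str, site: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[base] for base in site.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Hypothetical test sequence, not taken from the presentation.
test_sequence = "CCGAATTCGGAAATTTCTTAGG"
for enzyme, site in ENZYMES.items():
    print(enzyme, find_sites(test_sequence, site))
```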

  19. PCR Primer Design: Oligonucleotides for use in the polymerase chain reaction can be designed using computer-based programs
      OPTIMAL primer length --> 20
      MINIMUM primer length --> 18
      MAXIMUM primer length --> 22
      OPTIMAL primer melting temperature --> 60.000
      MINIMUM acceptable melting temp --> 57.000
      MAXIMUM acceptable melting temp --> 63.000
      MINIMUM acceptable primer GC% --> 20.000
      MAXIMUM acceptable primer GC% --> 80.000
      Salt concentration (mM) --> 50.000
      DNA concentration (nM) --> 50.000
      MAX no. unknown bases (Ns) allowed --> 0
      MAX acceptable self-complementarity --> 12
      MAXIMUM 3' end self-complementarity --> 8
      GC clamp how many 3' bases --> 0
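The length, melting-temperature, GC% and unknown-base limits on slide 19 can be applied as simple filters on a candidate primer. The sketch below does this with one important simplification: it estimates the melting temperature with the Wallace rule, Tm = 2(A+T) + 4(G+C), which ignores the salt and DNA concentrations listed on the slide and is not claimed to be the formula the original program used. The function names and the candidate primer (the first 20 bases of the slide 14 sequence) are chosen for illustration only.

```python
def gc_percent(primer: str) -> float:
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * sum(primer.count(base) for base in "GC") / len(primer)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C (Wallace rule, an assumption here)."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def is_acceptable(primer: str) -> bool:
    """Apply the length, Tm, GC% and N limits listed on slide 19."""
    primer = primer.upper()
    return (
        18 <= len(primer) <= 22
        and 57 <= wallace_tm(primer) <= 63
        and 20 <= gc_percent(primer) <= 80
        and primer.count("N") == 0
    )


candidate = "ATGCGTTATATTCGCCTGTG"
print(len(candidate), gc_percent(candidate), wallace_tm(candidate))  # 20 45.0 58
print(is_acceptable(candidate))  # True
```

The self-complementarity and GC-clamp settings from the slide are left out because they need a pairwise comparison of the primer against itself, which is beyond this sketch.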

  20. Gene discovery: Computer programs can be used to recognise the protein-coding regions in DNA
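One of the simplest ways to look for candidate protein-coding regions is to scan for open reading frames (ORFs): stretches that begin with ATG and run in the same frame to a stop codon. The sketch below does this for the three forward frames only. It is a deliberately naive illustration, assuming no introns and ignoring the reverse strand; real gene finders use far richer models, and the function name, the `min_codons` parameter and the test sequence are all hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) slices of forward-strand ORFs, stop codon included.

    min_codons counts the codons before the stop codon, ATG included.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for pos in range(frame, len(dna) - 2, 3):
            codon = dna[pos:pos + 3]
            if start is None and codon == "ATG":
                start = pos  # open a new reading frame
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    orfs.append((start, pos + 3))
                start = None  # close the frame and keep scanning
    return sorted(orfs)


# Hypothetical sequence containing two short ORFs in different frames.
sequence = "CCATGAAACCCTAGGATGTTTTGA"
for start, end in find_orfs(sequence):
    print(start, end, sequence[start:end])
```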

  21. RNA structure prediction: Structural features of RNA can be predicted (Figure: predicted RNA secondary structure; the base labels of the drawing are not recoverable as a linear sequence)
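A classic textbook way to predict RNA secondary structure is to find the fold with the largest number of nested base pairs by dynamic programming (the Nussinov approach). The sketch below is that simplified technique, not the method behind the figure on slide 21, which the transcript does not identify; practical tools minimise free energy instead. The `min_loop` parameter, which forces at least three unpaired bases in every hairpin loop, and the example sequence are assumptions.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(rna: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs (Nussinov-style dynamic programming)."""
    rna = rna.upper()
    n = len(rna)
    if n == 0:
        return 0
    # best[i][j] = maximum number of pairs within rna[i..j], inclusive.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with base k
                if (rna[k], rna[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]


# Hypothetical hairpin: three G-C pairs closing an AAA loop.
print(max_base_pairs("GGGAAACCC"))  # 3
```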

  22. Multiple sequence alignment: Sequences of proteins from different organisms can be aligned to see similarities and differences (Figure: alignment formatted using MacBoxshade)

  23. Phylogeny inference: Analysis of sequences allows evolutionary relationships to be determined (Figure: phylogenetic tree of E.coli, C.botulinum, C.cadavers, C.butyricum, B.subtilis and B.cereus, constructed using the Phylip package)
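Distance-based tree building usually starts from a matrix of pairwise distances between aligned sequences. The sketch below computes the proportion of differing sites (p-distance) and applies the Jukes-Cantor correction, d = -3/4 ln(1 - 4p/3). This is a generic illustration of the first step only; the transcript does not say which Phylip program or distance model produced the tree on slide 23, and the three aligned sequences and their names are invented.

```python
import math
from itertools import combinations


def p_distance(a: str, b: str, gap: str = "-") -> float:
    """Proportion of differing sites, skipping columns where either row has a gap."""
    compared = [(x, y) for x, y in zip(a.upper(), b.upper()) if gap not in (x, y)]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance; only defined for p < 0.75."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical aligned sequences.
alignment = {
    "taxon_A": "ACGTACGT",
    "taxon_B": "ACGTACGA",
    "taxon_C": "ACCTAGGA",
}

for (name1, seq1), (name2, seq2) in combinations(alignment.items(), 2):
    p = p_distance(seq1, seq2)
    print(name1, name2, round(p, 3), round(jukes_cantor(p), 3))
```

A tree-building method such as neighbour joining would then take this matrix as its input.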

  24. Large scale bioinformatics: genome projects • Mapping: identifying the location of clones and markers on the chromosome by genetic linkage analysis and physical mapping • Sequencing: assembling clone sequence reads into large (eventually complete) genome sequences • Gene discovery: identifying coding regions in genomic DNA by database searching and other methods • Function assignment: using database searches, pattern searches, protein family analysis and structure prediction to assign a function to each predicted gene • Data mining: searching for relationships and correlations in the information • Genome comparison: comparing different complete genomes to infer evolutionary history and genome rearrangements

  25. The job of the biologist is changing. As more biological information becomes available and laboratory equipment becomes more automated: • The biologist will spend more time using computers and on experimental design and data analysis (and less time doing tedious lab biochemistry) • Biology will become a more quantitative science (think how the periodic table affected chemistry)

  26. Finding genes in genome sequence is not easy • About 1% of human DNA encodes functional genes. • Genes are interspersed among long stretches of non-coding DNA. • Repeats, pseudo-genes, and introns confound matters

  27. DNA chip microarrays • Put a large number (~100K) of cDNA sequences or synthetic DNA oligomers onto a glass slide (or other substrate) in known locations on a grid • Label an RNA sample and hybridize • Measure the amount of RNA bound to each square in the grid • Make comparisons: cancerous vs. normal tissue; treated vs. untreated; time course • Many applications in both basic and clinical research
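The "make comparisons" step on slide 27 is often expressed as a log2 ratio of the signal measured for each grid position in two samples, so that a doubling gives +1 and a halving gives -1. The sketch below shows that calculation under stated assumptions: the gene names and intensity values are invented, the pseudocount and the two-fold threshold are common conventions rather than anything specified in the presentation, and no normalisation or statistics are attempted.

```python
import math


def log2_ratios(sample: dict, reference: dict, pseudocount: float = 1.0) -> dict:
    """log2(sample / reference) for every gene measured in both samples."""
    return {
        gene: math.log2((sample[gene] + pseudocount) / (reference[gene] + pseudocount))
        for gene in sample
        if gene in reference
    }


def changed_genes(ratios: dict, threshold: float = 1.0) -> list:
    """Genes whose absolute log2 ratio reaches the threshold (1.0 = two-fold)."""
    return sorted(gene for gene, ratio in ratios.items() if abs(ratio) >= threshold)


# Hypothetical spot intensities for treated and untreated samples.
treated = {"geneA": 400.0, "geneB": 100.0, "geneC": 50.0}
untreated = {"geneA": 100.0, "geneB": 100.0, "geneC": 200.0}

ratios = log2_ratios(treated, untreated)
for gene in sorted(ratios):
    print(gene, round(ratios[gene], 2))
print(changed_genes(ratios))
```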

  28. Spot your own chip (figure labels): robot spotter; ordinary glass microscope slide

  29. cDNA spotted microarrays

  30. Goal of Microarray experiments • Microarrays are a very good way of identifying a set of genes involved in a disease process: differences between cancer and normal tissue; tuberculosis-infected vs. resistant lung cells • Mapping out a pathway: co-regulated genes • Finding functions for unknown genes involved in these processes

  31. Direct Medical Applications • Diagnosis: type of cancer; aggressive or benign? • Monitoring treatment outcome: is a treatment having the desired effect on the target tissue?

  32. Human Genetic Variation • Every human has essentially the same set of genes • But there are different forms of each gene, known as alleles • Genetic diseases such as cystic fibrosis or Huntington’s disease are caused by dysfunctional alleles

  33. • On chromosome four the sequence CAG is repeated six or more times. • The other strand of DNA in this gene carries the complementary repeat, GTC (CTG when read 5'→3'). • This sequence happens to lie within a gene called huntingtin. • The longer the CAG repeat, the greater the probability of mispairing. • An individual born with extra CAGs in the huntingtin gene is likely to develop Huntington's chorea. The more extra CAGs there are in the gene, the earlier in life the disease will show up.
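Counting the CAG repeats described on slide 33 is a small pattern-matching task: find every uninterrupted run of CAG and report the longest. The sketch below does this with a regular expression. The function name and the example sequence are made up, and the code deliberately applies no clinical threshold, since the slide gives none beyond "six or more".

```python
import re


def longest_cag_run(dna: str) -> int:
    """Number of repeats in the longest uninterrupted run of CAG."""
    runs = re.findall(r"(?:CAG)+", dna.upper())
    return max((len(run) // 3 for run in runs), default=0)


# Hypothetical fragment with a run of four CAGs and an isolated CAG.
fragment = "ATGGCGCAGCAGCAGCAGCCGCAGTAA"
print(longest_cag_run(fragment))  # 4
```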

  34. Clinical Manifestations of Genetic Variation (all disease has a genetic component) • Susceptibility vs. resistance • Variations in disease severity or symptoms • Reaction to drugs (pharmacogenetics) • All of these traits can be traced back to particular genes (or sets of genes)

  35. Pharmacogenomics • People react differently to drugs: side effects, variable effectiveness • There are genes that control these reactions • SNP markers can be used to identify these genes (profiles)

  36. Use the Profiles • Genetic profiles of new patients can then be used to prescribe drugs more effectively & avoid adverse reactions. • Sell a drug with a gene test • Can also speed clinical trials by testing on those who are likely to respond well.

  37. Toxicogenomics • There are a number of common pathways for drug toxicity (or environmental toxicity) • It is possible to compile genomic signatures (gene expression data) for these pathways • Candidate drug molecules can be screened in cell culture or in animals for induction of these toxicity pathways

  38. Planning for a Genomics Revolution • Bioinformatics support must be integral in the planning process for the development of new genomics research facilities. • Genome Project sequencing centers have more staff and more $$$ spent on data analysis than on the sequencing itself. • Microarray facilities will be even more skewed toward data analysis • It is an information-intensive business!

  39. Implications for Biomedicine • Physicians will use genetic information to diagnose and treat disease. • Virtually all medical conditions have a genetic component. • Faster drug development research • Individualized drugs • Gene therapy • All Biologists will use gene sequence information in their daily work

  40. Training "computer savvy" scientists • Know the right tool for the job • Get the job done with tools available • Network connection is the lifeline of the scientist • Jobs change, computers change, projects change, scientists need to be adaptable

  41. Long Term Implications • A "periodic table for biology" will lead to an explosion of research and discoveries - we will finally have the tools to start making systematic analyses of biological processes (quantitative biology). • Understanding the genome will lead to the ability to change it - to modify the characteristics of organisms and people in a wide variety of ways

  42. Genomics Education • Genomics scientists need basic training in both Molecular Biology and Computing • Specific training in the use of automated laboratory equipment, the analysis of large datasets, and bioinformatics algorithms • Particularly important for the training of medical doctors - at least a familiarity with the technology

  43. Genomics in Medical Education “The explosion of information about the new genetics will create a huge problem in health education. Most physicians in practice have had not a single hour of education in genetics and are going to be severely challenged to pick up this new technology and run with it.” Francis Collins
