
Computational Problems in Molecular Biology

Computational Problems in Molecular Biology. Dong Xu Computer Science Department 109 Engineering Building West E-mail: xudong@missouri.edu 573-882-7064 http://digbio.missouri.edu. Lecture Outline. From DNA to gene Protein sequence and structure Gene expression


Presentation Transcript


  1. Computational Problems in Molecular Biology Dong Xu Computer Science Department 109 Engineering Building West E-mail: xudong@missouri.edu 573-882-7064 http://digbio.missouri.edu

  2. Lecture Outline • From DNA to gene • Protein sequence and structure • Gene expression • Protein interaction and pathway • Provide a roadmap for the entire course • Biology at the system level (computational perspective)

  3. About Life • Life is wonderful: amazing mechanisms • Life is not perfect: errors and diseases • Life is a result of evolution

  4. Cells • Basic unit of life • Prokaryotes/eukaryotes • Different types of cell: • Skin, brain, red/white blood • Different biological function • Cells produced by cells • Cell division (mitosis) • 2 daughter cells

  5. DNA • Double Helix (Watson & Crick) • Nitrogenous Base Pairs • Adenine ↔ Thymine [A,T] • Cytosine ↔ Guanine [C,G] • Weak bonds (can be broken) • Form long chains
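
The pairing rule on this slide (A with T, C with G) is enough to derive one strand from the other. The following is a minimal sketch in Python; the function names `complement` and `reverse_complement` are illustrative choices, not part of the lecture.

```python
# Watson-Crick pairing as listed on the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the complementary strand read in the opposite direction,
    which is how the partner strand of a double helix is usually written."""
    return complement(strand)[::-1]


if __name__ == "__main__":
    print(complement("ACGT"))
    print(reverse_complement("AGCCAC"))
```

The sketch assumes uppercase input containing only A, C, G and T; any other character raises a `KeyError`.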

  6. Genome • Each cell contains a full genome (DNA) • The size varies: • Small for viruses and prokaryotes (10 kbp-20Mbp) • Medium for lower eukaryotes • Yeast, unicellular eukaryote 13 Mbp • Worm (Caenorhabditis elegans) 100 Mbp • Fly, invertebrate (Drosophila melanogaster) 170 Mbp • Larger for higher eukaryotes • Mouse and man 3000 Mbp • Very variable for plants (many are polyploid) • Mouse ear cress (Arabidopsis thaliana) 120 Mbp • Lilies 60,000 Mbp

  7. Differences in DNA ~2% ~4% ~0.2%

  8. Genes • Chunks of DNA sequence that can be expressed as functional biomolecules (protein, RNA) • About 2% of human DNA sequence codes for genes • 32,000 human genes, 100,000 genes in tulips

  9. Gene Structure • General structure of a eukaryotic gene • Unlike eukaryotic genes, a prokaryotic gene typically consists of only one contiguous coding region

  10. Informational Classes in Genomic DNA • Transcribed sequences (exons and introns) • Messenger sequences (mRNA, exons only) • Coding sequences (CDS, part of the exons only) • Heads and tails: untranslated parts (UTR) • Regulatory sequences • ... and all the rest → Identify them: gene-finding

  11. Genetic Code A=Ala=Alanine C=Cys=Cysteine D=Asp=Aspartic acid E=Glu=Glutamic acid F=Phe=Phenylalanine G=Gly=Glycine H=His=Histidine I=Ile=Isoleucine K=Lys=Lysine L=Leu=Leucine M=Met=Methionine N=Asn=Asparagine P=Pro=Proline Q=Gln=Glutamine R=Arg=Arginine S=Ser=Serine T=Thr=Threonine V=Val=Valine W=Trp=Tryptophan Y=Tyr=Tyrosine

  12. Protein Synthesis • AGCCACTTAGACAAACTA (DNA) • Transcribed to: • AGCCACUUAGACAAACUA (mRNA) • Translated to: • SHLDKL (Protein)
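
The two steps on this slide can be reproduced in a few lines. The sketch below assumes, as the slide does, that the DNA shown is the coding strand, so transcription amounts to replacing T with U; it also assumes the standard genetic code. The names `transcribe`, `translate` and `CODON_TABLE` are illustrative.

```python
from itertools import product

# Standard genetic code, written in the conventional U, C, A, G codon order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(coding_dna: str) -> str:
    """Coding-strand DNA to mRNA: same sequence with T replaced by U."""
    return coding_dna.replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("AGCCACTTAGACAAACTA")
    print(mrna)             # AGCCACUUAGACAAACUA
    print(translate(mrna))  # SHLDKL
```

Real translation begins at a start codon (AUG); this sketch reads from the first base only because the slide's example does.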

  13. About Protein 10s – 1000s amino acids (average 300) Lysozyme sequence (129 amino acids): KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL Protein backbones: Side chain
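
As a quick check on the lysozyme sequence above, its length and amino acid composition can be tallied directly. This is a small sketch using only the Python standard library; the variable names are illustrative.

```python
from collections import Counter

# Lysozyme sequence copied from the slide, with the spaces between
# blocks of ten residues removed.
LYSOZYME = (
    "KVFGRCELAA" "AMKRHGLDNY" "RGYSLGNWVC" "AAKFESNFNT" "QATNRNTDGS"
    "TDYGILQINS" "RWWCNDGRTP" "GSRNLCNIPC" "SALLSSDITA" "SVNCAKKIVS"
    "DGNGMNAWVA" "WRNRCKGTDV" "QAWIRGCRL"
)

# Count how often each one-letter amino acid code occurs.
composition = Counter(LYSOZYME)

if __name__ == "__main__":
    print(len(LYSOZYME))  # 129, matching the length stated on the slide
    for residue, count in composition.most_common():
        print(residue, count)
```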

  14. Evolution of Genes: Mutation • Genes alter (slightly) during reproduction • Caused by copying errors, radiation, or toxic chemicals • 3 possibilities: deletion, insertion, substitution • Deletion: ACGTTGACTC → ACGTGACTC • Insertion: ACGTTGACTC → AGCGTTGACTC • Substitution: ACGTTGACTC → ACGATGACTC • Mutations are mostly deleterious
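
The three mutation types on this slide map onto simple string operations. The sketch below reproduces the slide's three examples; the helper names and the 0-based positions are assumptions chosen to match those examples.

```python
def delete(seq: str, pos: int) -> str:
    """Remove the base at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


def insert(seq: str, pos: int, base: str) -> str:
    """Insert base so that it ends up at 0-based position pos."""
    return seq[:pos] + base + seq[pos:]


def substitute(seq: str, pos: int, base: str) -> str:
    """Replace the base at 0-based position pos with base."""
    return seq[:pos] + base + seq[pos + 1:]


if __name__ == "__main__":
    original = "ACGTTGACTC"
    print(delete(original, 3))           # ACGTGACTC
    print(insert(original, 1, "G"))      # AGCGTTGACTC
    print(substitute(original, 3, "A"))  # ACGATGACTC
```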

  15. Evolution and Homology [Figure: an ancestor gene gives rise to orthologs (similar function) and, through gene duplication, to paralogs (related functions); recombination of genes X and Y yields mixed homology (75% X, 25% Y)] • Twilight zone: undetectable homology (<20% sequence identity)

  16. Sequence Comparison • Pairwise sequence comparison • Multiple alignment
      SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV O15045
      NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA P34562
      KNEKIAYIKNVLLGFLEHKE----QRNQLLPVISMLLQLDSTDEKRLVMS Q06704
      REINFEYLKHVVLKFMSCRES---EAFHLIKAVSVLLNFSQEEENMLKET Q92805
      MLIDKEYTRNILFQFLEQRD----RRPEIVNLLSILLDLSEEQKQKLLSV O42657
      EPTEFEYLRKVMFEYMMGR-----ETKTMAKVITTVLKFPDDQAQKILER O70365
      DPAEAEYLRNVLYRYMTNRESLGKESVTLARVIGTVARFDESQMKNVISS Q21071
      STSEIDYLRNIFTQFLHSMGSPNAASKAILKAMGSVLKVPMAEMKIIDKK Q18013
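
Once two sequences are aligned, a common first measure for pairwise comparison is percent sequence identity. The sketch below applies it to the first two rows of the alignment above (O15045 and P34562). It uses one of several conventions in circulation: identical residues divided by the number of columns in which neither row has a gap. The function name `percent_identity` is illustrative.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an alignment.

    Columns where either row has a gap are skipped; the result is
    matches / compared columns * 100. Other conventions (for example,
    dividing by the full alignment length) give different numbers.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    o15045 = "SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV"
    p34562 = "NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA"
    print(f"{percent_identity(o15045, p34562):.1f}% identity")
```

This only scores an existing alignment; producing the alignment itself requires an algorithm such as dynamic programming, which is outside this sketch.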

  17. Phylogenetic Trees Understand evolution

  18. Protein Structure [Figure: lysozyme structure shown in three representations: ball & stick, strand, surface]

  19. Structure Features of Folded Proteins • Compact • Secondary structures: loop, α-helix, β-sheet • Protein cores mostly consist of α-helices and β-sheets

  20. Protein Structure Comparison • Structure is better conserved than sequence • A structure can accommodate a wide range of mutations • Physical forces favor certain structures • The number of folds is limited: ~700 currently known, estimated total 1,000 to 10,000 • Example fold: TIM barrel

  21. Protein Folding Problem A protein folds into a unique 3D structure under physiological conditions. Lysozyme sequence: KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS DGNGMNAWVA WRNRCKGTDV QAWIRGCRL

  22. Structure-Function Relationship A certain level of function can be determined without a structure, but a structure is key to understanding the detailed mechanism. A predicted structure is a powerful tool for function inference. Example: Trp repressor as a function switch

  23. Structure-Based Drug Design Structure-based rational drug design is still a major method for drug discovery. HIV protease inhibitor

  24. Gene Expression All cells carry the same DNA, but only a few percent of genes are expressed in common across cells (house-keeping genes). A few examples: (1) Specialized cells: over-represented hemoglobin in blood cells. (2) Different stages of the life cycle: hemoglobins before and after birth, caterpillar and butterfly. (3) Different environments: microbes in nutrient-poor or nutrient-rich environments. (4) Special treatment: response to a wound.

  25. Eukaryote Gene Expression Control • Control points, from nucleus to cytosol: transcriptional control (DNA → primary RNA transcript), RNA processing control (primary RNA transcript → mRNA), RNA transport control (mRNA across the nuclear membrane), translation control (mRNA → protein), mRNA degradation control (mRNA → inactive mRNA), protein activity control (protein → inactive protein) • Methods: Mass-spec, Microarray

  26. Gene Regulation [Figure: DNA sequence with the promoter, the operator, and the start of transcription labeled]

  27. Microarray Experiments • Regulation/function/pathway/cellular state/phenotype • Disease: diagnosis/gene identification/sub-typing Microarray chip Microarray data

  28. Genetic vs. Physical Interaction [Figure labels: gene/protein interaction, complex system, physical interaction, regulatory network, genetic interaction, transcription factor, expressed gene]

  29. Biological Pathway

  30. Studying Pathways through Systems Biology Approach [Figure labels: sequence (RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC), structure, function, gene regulation, protein interaction, pathway (cross-talk)]

  31. Discussion • Possible impacts of biotechnology on our lives

  32. Assignments • Required reading: * Chapter 13 in Pavel Pevzner, "Computational Molecular Biology: An Algorithmic Approach", MIT Press, 2000 * Larry Hunter, "Molecular Biology for Computer Scientists" • Optional reading: http://www.ncbi.nih.gov/About/primer/bioinformatics.html http://www.bentham.org/cpps1-1/Dong%20Xu/xu_cpps.htm
