
CIS 4930/6930 – Recent Advances in Bioinformatics Spring 2014


  1. CIS 4930/6930 – Recent Advances in Bioinformatics, Spring 2014 • Tamer Kahveci • CISE Department • University of Florida

  2. Vital Information • Instructor: Tamer Kahveci • Office: E566 • Time: Mon/Wed/Fri 9:35 - 10:25 AM • Office hours: Mon/Thu 2:00-2:50 PM • Course page: • http://www.cise.ufl.edu/~tamer/teaching/spring2014

  3. Goals • This course discusses cutting-edge developments in bioinformatics and computational biology. We will examine recent publications in depth, with emphasis on computer science challenges and contributions, particularly those concerning biological networks.

  4. Bioinformatics & Systems Biology • Bioinformatics is the science in which computational and information sciences are used to understand biological data. • Systems biology studies the interactions between the components of a biological system, and how these interactions give rise to the function and behavior of that system.

  5. This Course will • Give you exposure to research topics in bioinformatics. • Strongly encourage you to explore research problems and make contributions.

  6. This Course will not • Teach you biology or fundamentals of bioinformatics. • Teach you programming • Teach you how to be an expert user of off-the-shelf molecular biology computer packages.

  7. Course Outline • Introduction to terminology • Biological networks • Comparison of biological networks • Network motifs • Essentiality in networks • Network reconstruction

  8. How can I get an A? • Grading • Paper presentations • Project • HW & Quizzes • Bonus • 2.5% attendance • 2.5% project contribution • 90+ = A- & above • 80+ = B & above • 70+ = C & above

  9. Expectations • Require • Data structures and algorithms. • Coding (C, Java) • Encourage • actively participate in discussions in the classroom • read bioinformatics literature in general • attend colloquiums on campus • Academic honesty

  10. Text Book • Not required, but recommended. • Class notes + papers.

  11. Where to Look? • Journals • Bioinformatics • Genome Research • PLOS Computational Biology • Journal of Computational Biology • IEEE/ACM Transactions on Computational Biology and Bioinformatics • Conferences • RECOMB • ISMB • ECCB • PSB • BCB

  12. A Gentle Introduction to Molecular Biology

  13. Goals • Understand major components of biological data • DNA, protein sequences, expression arrays, protein structures • Get familiar with basic terminology • Learn commonly used data formats

  14. Genetic Material: DNA • Deoxyribonucleic Acid, 1950s • Basis of inheritance • Eye color, hair color, … • 4 nucleotides • A, C, G, T

  15. Chemical Structure of Nucleotides • Pyrimidines • Purines

  16. Making of Long Chains 5’ -> 3’

  17. DNA structure • Double stranded, helix (Watson & Crick) • Complementary • A-T • G-C • Antiparallel • 5’ -> 3’ (downstream) • 3’ -> 5’ (upstream) • Animation (ch3.1)

  18. Base Pairs

  19. Question • 5’ - GTTACA – 3’ • 5’ – XXXXXX – 3’ ? • 5’ – TGTAAC – 3’ • Reverse complements.
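
The reverse-complement question above can be checked with a few lines of code. This is a minimal sketch, assuming an uppercase or lowercase DNA string over A, C, G, T; the names `COMPLEMENT` and `reverse_complement` are illustrative and do not come from the slides.

```python
# Watson-Crick pairing from the DNA structure slide: A-T and G-C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' -> 3'.

    The strands are antiparallel, so the partner strand is read in the
    opposite direction: complement each base, then reverse the order.
    """
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


print(reverse_complement("GTTACA"))  # TGTAAC
```

Applying the function twice returns the original sequence, which is a quick sanity check on any implementation.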

  20. Repetitive DNA • Tandem repeats: highly repetitive • Satellites (100 k – 1 Gbp) / (a few hundred bp) • Mini satellites (1 k – 20 kbp) / (9 – 80 bp) • Micro satellites (< 150 bp) / (1 – 6 bp) • DNA fingerprinting • Interspersed repeats: moderately repetitive • LINE • SINE • Proteins contain repetitive patterns too

  21. Genetic Material: an Analogy • Nucleotide => letter • Gene => sentence • Contig => chapter • Chromosome => book • Traits: gender, hair/eye color, … • Disorders: Down syndrome, Turner syndrome, … • Chromosome number varies between species • We have 46 (23 + 23) chromosomes • Complete genome => volumes of an encyclopedia • The Hershey & Chase experiment showed that DNA is the genetic material. (ch14)

  22. Functions of Genes 1/2 • Signal transduction: sensing a physical signal and turning into a chemical signal • Enzymatic catalysis: accelerating chemical transformations otherwise too slow. • Transport: getting things into and out of separated compartments • Animation (ch 5.2)

  23. Functions of Genes 2/2 • Movement: contracting in order to pull things together or push things apart. • Transcription control: deciding when other genes should be turned ON/OFF • Animation (ch7) • Structural support: creating the shape and pliability of a cell or set of cells

  24. Central Dogma

  25. Introns and Exons 1/2

  26. Introns and Exons 2/2 • Humans have about 25,000 genes, spanning roughly 40,000,000 DNA bases, which is less than 3% of the total DNA in the genome. • The remaining 2,960,000,000 bases carry control information (e.g., when, where, how long).

  27. Protein DNA (Genotype) Phenotype Gene expression

  28. Gene Expression • Building proteins from DNA • Promoter sequence: start of a gene • 13 nucleotides. • Positive regulation: proteins that bind to DNA near promoter sequences increase transcription. • Negative regulation

  29. Microarray Animation on creating microarrays

  30. Amino Acids • 20 different amino acids • ACDEFGHIKLMNPQRSTVWY but not BJOUXZ • ~300 amino acids in an average protein, hundreds of thousands of known protein sequences • How many nucleotides can encode one amino acid? • 4^2 < 20 < 4^3 • E.g., Q (glutamine) = CAG • Degeneracy • Triplet code (codon)
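
The counting argument behind the triplet code can be made explicit. The sketch below assumes nothing beyond the numbers on the slide (4 nucleotides, 20 amino acids); the function name `min_codon_length` is illustrative.

```python
def min_codon_length(alphabet_size: int = 4, n_amino_acids: int = 20) -> int:
    """Smallest word length whose number of possible words covers all amino acids."""
    length = 1
    while alphabet_size ** length < n_amino_acids:
        length += 1
    return length


# Two nucleotides give 4**2 = 16 words (too few); three give 4**3 = 64 (enough).
print(4 ** 2, 4 ** 3)
print(min_codon_length())
```

Because 64 codons are available for 20 amino acids, several codons can map to the same amino acid, which is the degeneracy mentioned above.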

  31. Triplet Code

  32. Side Chain Molecular Structure of Amino Acid C • Non-polar, Hydrophobic (G, A, V, L, I, M, F, W, P) • Polar, Hydrophilic (S, T, C, Y, N, Q) • Electrically charged (D, E, K, R, H)

  33. Peptide Bonds

  34. Direction of Protein Sequence Animation on protein synthesis (ch15)

  35. Data Format • GenBank • EMBL (European Mol. Biol. Lab.) • SwissProt • FASTA • NBRF (Nat. Biomedical Res. Foundation) • Others • IG, GCG, Codata, ASN, GDE, Plain ASCII

  36. Primary Structure of Proteins
      >2IC8:A|PDBID|CHAIN|SEQUENCE
      ERAGPVTWVMMIACVVVFIAMQILGDQEVMLWLAWPFDPTLKFEFWRYFTHALMHFSLMHILFNLLWWWYLGGAVEKRLGSGKLIVITLISALLSGYVQQKFSGPWFGGLSGVVYALMGYVWLRGERDPQSGIYLQRGLIIFALIWIVAGWFDLFGMSMANGAHIAGLAVGLAMAFVDSLNA
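
The entry above is in FASTA format: a header line starting with `>` followed by one or more sequence lines. A minimal parser might look like the sketch below. The function name `parse_fasta` is illustrative, and the example input uses only the first 30 residues of the 2IC8 chain A entry, wrapped over two lines to show that sequence lines are concatenated.

```python
def parse_fasta(text: str) -> dict:
    """Parse FASTA-formatted text into a {header: sequence} dictionary."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


example = """>2IC8:A|PDBID|CHAIN|SEQUENCE
ERAGPVTWVMMIACV
VVFIAMQILGDQEVM
"""
records = parse_fasta(example)
for name, seq in records.items():
    print(name, len(seq))
```

This sketch ignores format variants such as comment lines; established libraries handle those cases more robustly.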

  37. Secondary Structure: Alpha Helix • 1.5 Å translation per residue • 100 degree rotation per residue • Phi = -60 • Psi = -60

  38. Secondary Structure: Beta sheet • Anti-parallel • Parallel • Phi = -135 • Psi = 135

  39. Tertiary Structure phi2 phi1 2N angles psi1

  40. Tertiary Structure • 3-d structure of a polypeptide sequence • Interactions between non-local atoms • Figure: tertiary structure of myoglobin

  41. Ramachandran Plot Sample pdb entry ( http://www.rcsb.org/pdb/ )

  42. Quaternary Structure • Arrangement of protein subunits • Figures: quaternary structure of Cro; human hemoglobin tetramer

  43. Structure Summary • 3-d structure determined by protein sequence • Prediction remains a challenge • Diseases caused by misfolded proteins • Mad cow disease • Classification of protein structure

  44. Systems biology • A biological system is made up of components (e.g., proteins, genes, compounds) that interact with and affect one another. Together they carry out the functions of that system. • Internal factors can alter the networks. • E.g., gene expression and regulation. • External factors can alter the network. • E.g., drugs, radiation, food, temperature, bacteria and viruses. • We develop quantitative mathematical models that can explain how the interactions take place. • E.g., Boolean, stochastic, ordinary differential equations, probabilistic, etc. • We develop algorithmic methods to analyze the networks under these models.
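
As an illustration of the simplest of the model types listed above, the sketch below simulates a Boolean network with synchronous updates. The three genes A, B, C and their update rules are entirely hypothetical and chosen only to show the mechanics; they are not taken from the course material.

```python
def step(state: dict) -> dict:
    """One synchronous update of a hypothetical three-gene Boolean network."""
    a, b, c = state["A"], state["B"], state["C"]
    return {
        "A": not c,    # assumed rule: C represses A
        "B": a,        # assumed rule: A activates B
        "C": a and b,  # assumed rule: C requires both A and B
    }


state = {"A": True, "B": False, "C": False}
trajectory = [state]
for _ in range(5):
    state = step(state)
    trajectory.append(state)

for t, s in enumerate(trajectory):
    print(t, s)
```

With these rules the system returns to its initial state after five updates, so the trajectory is a cycle. Stochastic or differential-equation models replace the Boolean rules with probabilities or rates but keep the same network structure.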

  45. Signal Transduction Networks • Vertices are proteins. • A directed edge from vertex X to vertex Y if X changes the activity level of Y under certain conditions

  46. Transcription regulation networks • Two types of vertices: proteins (transcription factors, or TFs) and genes • Edges are directed from TFs to genes. • An edge from TF X to gene Y if X regulates the transcription of Y

  47. Post-transcription regulation • Two types of vertices • RNA binding proteins • RNA • Directed edge from proteins to RNA

  48. Metabolic networks 1/2 • Various representations • Vertices are compounds and directed edges are biochemical reactions • Two types of vertices, one for compounds one for reactions. Directed edges from one type to the other.
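
The two representations above can be built from the same reaction list. The sketch below is one possible encoding, assuming each reaction is given as a pair of substrate and product lists; the reaction identifiers `R1` and `R2`, the compound names, and both function names are illustrative.

```python
from collections import defaultdict

# Illustrative reactions: name -> (substrates, products).
reactions = {
    "R1": (["glucose", "ATP"], ["G6P", "ADP"]),
    "R2": (["G6P"], ["F6P"]),
}


def compound_graph(reactions: dict) -> dict:
    """Vertices are compounds; an edge substrate -> product for each reaction."""
    edges = defaultdict(set)
    for substrates, products in reactions.values():
        for s in substrates:
            for p in products:
                edges[s].add(p)
    return dict(edges)


def bipartite_graph(reactions: dict) -> dict:
    """Two vertex types: edges go substrate -> reaction and reaction -> product."""
    edges = defaultdict(set)
    for name, (substrates, products) in reactions.items():
        for s in substrates:
            edges[s].add(name)
        for p in products:
            edges[name].add(p)
    return dict(edges)


print(compound_graph(reactions))
print(bipartite_graph(reactions))
```

The compound-only graph is smaller but loses the information about which substrates are consumed together; the bipartite form keeps each reaction as an explicit vertex.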

  49. Metabolic networks 2/2 • Reactions • Catabolism: breaking down large molecules, for example to harvest energy in cellular respiration • Anabolism: using energy to construct components of cells, such as proteins and nucleic acids

  50. Protein-protein interaction (PPI) network • Vertices are proteins. • An edge between two vertices if the two proteins interact (i.e., form a protein complex). • Undirected edges.
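
Because PPI edges are undirected, an adjacency structure has to record each interaction in both directions. The sketch below assumes interactions arrive as pairs of protein identifiers; the names `P1` to `P4` and the function `build_ppi` are placeholders, not real proteins or an existing API.

```python
from collections import defaultdict


def build_ppi(interactions) -> dict:
    """Build an undirected adjacency map from (protein, protein) pairs."""
    adjacency = defaultdict(set)
    for u, v in interactions:
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: store the edge in both directions
    return dict(adjacency)


# Hypothetical interaction list.
ppi = build_ppi([("P1", "P2"), ("P2", "P3"), ("P2", "P4")])

# Degree = number of interaction partners of each protein.
degree = {protein: len(partners) for protein, partners in ppi.items()}
print(degree)
```

The signal transduction and regulation networks described earlier would use the same structure with the second `add` call removed, since their edges are directed.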
