
Outline

Outline. General course overview. Aims of the course. How is the course taught? What are the formative and summative assessments for the course? What are the essential requirements to pass the course? Which textbook does the course use? What are biology and bioinformatics?


Presentation Transcript


  1. Outline • General course overview. • Aims of the course. How is the course taught? What are the formative and summative assessments for the course? What are the essential requirements to pass the course? Which textbook does the course use? • What are biology and bioinformatics? • Types and characteristics of biological data. • Types and numbers of databases and/or bioinformatics tools. • Biological preliminaries. • Properties and organization of life. • Structures and fundamental roles of the DNA macromolecules. • Flow of genetic information. • The genetic code. • How do genetic variants arise? • What is the genome, and what are the three genomic paradoxes? Slide 1/1

  2. Course overview, content and methodology Welcome to the lecture/practical course “Informatics-Bioinformatics” • This is an application-oriented course divided into two alternating modules: A) informatics and B) bioinformatics, presented by Dr. Gabor Pauler and Dr. Csaba Fekete respectively. • In these modules you will learn how to use standard web-based bioinformatics tools and databases. The main emphasis is placed on making it as easy as possible for the user. • The course was designed on the assumption that you have a rudimentary knowledge of biochemistry, cell biology, and molecular biology, so in the current context we can give only an extremely brief summary (reminder) of these topics. Slide 1/2

  3. Course overview, content and methodology • If you wish to learn more about molecular biology or related disciplines, we suggest you read some of the standard textbooks mentioned in our bibliography. • The weekly laboratory coursework will be closely coupled to the lecture topics. If you are confused about anything, don’t hesitate to ask your instructor. • To introduce practical use of the tools, in-class demonstrations will be performed to give you a general overview of the experimental workflow; however, detailed cookbook-style instructions will not be provided. • To complete the course successfully, you need to be present in at least 85% of the classes. Slide 1/3

  4. Course overview, content and methodology • There will be about 6 exercises (homework) throughout the course that combine theoretical questions and hands-on activities. • Lab cycles run Monday to Saturday (Saturday is the homework deadline). All exercises are mandatory; you should submit exercises through the course site (https://elearning.ttk.pte.hu/moodle/). • Assessments/Examinations: class participation and homework assignments account for 30% of the final grade. The mid-term exam and the final test account for 35% each. • If you have any issues, contact me: Dr. Csaba Fekete, associate professor (senior lecturer); University of Pecs, Department of General and Environmental Microbiology, 7624 Pecs, Ifjusag str. 6. Office: E330; Tel.: ++ (36)72 503-600; Extension: 4815 or 4810; e-mail: fekete@gamma.ttk.pte.hu Slide 1/4

  5. Let’s try to avoid the scholastic equivalent of this! Slide 1/5

  6. What is biology? • 1 Natural sciences • 1.1 Physical sciences • 1.1.1 Chemistry • 1.1.2 Physics • 1.1.3 Astronomy • 1.1.4 Earth science • 1.1.5 Environmental science • 1.2 Life sciences (Biology) • Anatomy, Biophysics, Cell biology, Genomics • Astrobiology, Conservation biology, Microbiology • Bioinformatics, Developmental biology, Mycology • Molecular biology, Biotechnology, Physiology • Ecology, Epidemiology, Morphology • Proteomics, Evolution, Systematics • Botany, Genetics, Virology • 2 Cognitive sciences • 3 Formal sciences • 3.1 Computer sciences • 3.2 Mathematics • 3.3 Statistics • 3.4 Systems science • 4 Social sciences • 5 Applied sciences Biology is a natural science concerned with the study of life and living organisms, including their structure, function, growth, origin, evolution, distribution, and taxonomy. Computer science is the study of the theoretical foundations of information and computation, and of practical techniques for their implementation and application in computer systems. Slide 2/6

  7. What is Bioinformatics? A marriage between Biology and Computers! Computer (Information) + Mouse (Biology) = Bioinformatics • Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. • Bioinformatics is the science of managing and analyzing biological data using advanced computing techniques. • The ultimate goal of bioinformatics, as described by an expert, is to enable the discovery of new biological insights and to create a global perspective from which unifying principles in biology can be discerned. • Bioinformatics is the computer-assisted data management discipline that helps us acquire, store, organize, archive, analyze, integrate, and visualize biological data. Slide 2/7

  8. History of Bioinformatics • Bioinformatics as a defined scientific discipline emerged in the mid-1990s, when the amount of sequence, structural, and biochemical data began to grow. • However, the roots of bioinformatics can be traced back to the 1960s, when Margaret Dayhoff established the first database of protein sequences. • Bioinformatics was born when the first complete protein sequence (bovine insulin) was determined by Frederick Sanger. • The first DNA sequences were obtained in the early 1970s. • In 1976, Walter Fiers and his team determined the first complete genome sequence, that of bacteriophage MS2. [Image labels: IBM 7090; Margaret Dayhoff (1925-1983), 100 proteins; Frederick Sanger (1918-2013); Walter Fiers] Slide 2/8

  9. Characteristics of biological data Biology and the life sciences have become increasingly “data rich” over the past decade. • Biological data has three important characteristics: (i) complexity, (ii) heterogeneity, and (iii) highly dynamic data and schema. • Biological data is complex in the sense that it is very rich in metadata (data that provides information about one or more other pieces of data) and it has hierarchical structures. • Biological data is heterogeneous in the sense that it involves a wide array of data types, including text, image, and sequence data, as well as streaming data (a data stream is a sequence of digitally encoded coherent signals used to transmit or receive information, e.g., medical sensor data), temporal data, and incomplete and missing data. • Biological data is highly dynamic, not only in content, but also in schema (i.e., structure). Slide 3/9

  10. What units of information do we deal with in bioinformatics? • Biological data can be very diverse and can touch many life science domains. Each domain (subdiscipline) has its own terminology, nomenclature, rules and data needs. For instance, biological data may consist of the following: • DNA, RNA, protein sequences: The determined order of nucleotides or amino acids. • Graphs: Relationships can be captured as graphs, as in the cases of metabolic pathways, signaling pathways, gene regulatory networks, genetic maps, and structured taxonomies. • High-dimensional data: Used in systems biology, for example, to describe how expression profiles vary as a function of different experimental conditions. • Geometric information: Because biological function frequently depends on the relative shape of molecules and their three-dimensional configuration, molecular structure data are very important. • Scalar and vector fields: In biology, scalar and vector field properties are associated with chemical concentration, electric charge, hydrophobicity, fluxes across cell membranes, and transport processes. Slide 3/10

  11. What units of information do we deal with in bioinformatics? • Patterns: Within the genome are patterns that characterize biologically interesting entities such as genes and regulatory sequences. Patterns are also of interest in the exploration of protein structure data, microarray data, pathway data, proteomics data, and metabolomics data. • Constraints: Consistency within a database is critical if the data are to be trustworthy, and biological databases are no exception. • Images: Imagery is an important part of biological research, for example electron and optical microscopy, radiographic images, and fluorescence images. • Spatial information: Real biological entities, from cells to ecosystems, are not spatially homogeneous, and a great deal of interesting science can be found in understanding how one spatial region is different from another. • Models: Computational models must be compared and evaluated. • Prose: The biological literature itself can be regarded as data. Biological prose (text) is the basis for annotations, which can be regarded as a form of metadata. • Declarative knowledge: As the complexity of various biological systems is unraveled, machine-readable representations, such as hypotheses and evidence, will be necessary. Slide 3/11
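
As a rough illustration of two of the data units listed above, the following Python sketch stores a sequence together with its metadata and represents a pathway as a graph. The class name `SequenceRecord`, the field names, the example values, and the toy pathway are all hypothetical and are not taken from any real database or from the course material.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceRecord:
    """A sequence plus metadata (data that describes the sequence)."""
    identifier: str
    sequence: str
    metadata: dict = field(default_factory=dict)


# Hypothetical record: the sequence is the data, the dictionary is the metadata.
record = SequenceRecord(
    identifier="example_1",
    sequence="ATGGCTTAA",
    metadata={"molecule": "DNA", "organism": "hypothetical organism"},
)

# Toy pathway stored as a directed graph (adjacency lists): A -> B -> C and D.
pathway = {"A": ["B"], "B": ["C", "D"], "C": [], "D": []}


def downstream(graph, node):
    """Return every node reachable from `node`, sorted alphabetically."""
    seen = []
    stack = list(graph[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(graph[current])
    return sorted(seen)


print(record.metadata["molecule"])  # DNA
print(downstream(pathway, "A"))     # ['B', 'C', 'D']
```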

  12. Biological databases • The instruments of bioinformatics are computers, databases, and the statistical tools and algorithms that are used for data analysis. • Biological databases are archives of consistent data that are stored in a uniform and efficient manner. These databases contain data from a broad spectrum of molecular biology areas. Primary databases contain information on, and annotation of, DNA and protein sequences, DNA and protein structures, and DNA and protein expression profiles. • Secondary or derived databases are so called because they contain the results of analyses of the primary resources, including information on sequence patterns or motifs, variants and mutations, and evolutionary relationships. • Information from the literature is contained in bibliographic databases, such as Medline, a database of citations, abstracts and some full-text articles on life sciences and biomedical topics. Slide 4/12

  13. Biological databases • Many different databases and bioinformatics tools are available on the Internet free of charge. • The latest Molecular Biology Database Collection includes 1230 databases. • The full content of the Database Issue is available online at the Nucleic Acids Research web site. Slide 4/13

  14. Organization of the online database collection http://www.oxfordjournals.org/nar/database/a/ More than 1200 key databases in 14 categories http://www.oxfordjournals.org/nar/database/c/ http://www.oxfordjournals.org/nar/database/cap/ Slide 4/14

  15. Properties and organization of life [Figure labels: Biosphere; Ecosystem; Community; Population; Smallest unit of life] • Living organisms: • Are composed of one or more cells. • Are complex and ordered. • Respond to their environment. • Can grow and reproduce. • Obtain and use energy. • Maintain internal balance. • Allow for evolutionary adaptation. Slide 5/15

  16. Taxonomy • Humans appear to have an innate need to name things. In many primitive societies, a person who knows the true name of an object or of another person is believed to have power over that object or person. • Three separate but interrelated disciplines are involved in taxonomy: • Identification: characterizing organisms • Classification: arranging organisms into similar groups • Nomenclature: naming organisms • Biologists often use a taxonomic key to identify organisms according to their characteristics. Slide 6/16

  17. Building blocks from which all organisms are assembled Slide 7/17

  18. Two major kinds of cells • Prokaryotic and eukaryotic cells can be distinguished by their structural organization. Cells in different organisms or within the same organism vary significantly in shape, size, and behavior. However, they all share common characteristics that are essential for life. • In prokaryotic cells (Bacteria and Archaea) the DNA is not separated from the cytoplasm in a nucleus. • There are no membrane-enclosed organelles in the cytoplasm. • Almost all prokaryotic cells have tough external cell walls. • Eukaryotic cells are subdivided by internal membranes into organelles. • DNA is found mainly in the nucleus. • Surrounding the nucleus is the cytoplasm, which contains a viscous cytosol and various organelles. Slide 8/18

  19. The nucleus • The nucleus contains most of the genetic material (nucleic acids) in a eukaryotic cell. • The nucleus is separated from the cytoplasm by a double membrane. • Pores in the membrane allow large macromolecules and particles to pass into the cytoplasm. • In the nucleus, the DNA and associated proteins are organized into a fibrous material called chromatin. • When the cell prepares to divide, the chromatin fibers coil up and become visible as separate structures called chromosomes. • Each eukaryotic species has a characteristic number of chromosomes. Slide 9/19

  20. Major differences between pro- and eukaryotic transcription and translation Slide 10/20

  21. Chromosome: an organelle for packaging DNA • Chromosomes are composed of chromatin, a complex of DNA and protein; most are about 40% DNA and 60% protein. • The proteins of chromatin fall into two classes: histones and nonhistone chromosomal proteins. Five distinct histones are known: H1, H2A, H2B, H3, and H4. • The DNA of a chromosome is one very long, double-stranded fiber that extends unbroken through the entire length of the chromosome. • At the first level of compaction, the DNA wraps around histone cores to form nucleosomes. • A higher order of chromatin structure is created when the nucleosomes are wound in the fashion of a solenoid with six nucleosomes per turn. • Coiling continues until the DNA is in a compact mass. Slide 11/21

  22. Variations in chromosome structure • Chromosomes can be broken by X-rays and by certain chemicals. The broken ends spontaneously rejoin, but if there are multiple breaks, the ends join at random. This leads to alterations in chromosome structure. • Problems with structural changes: breaking the chromosome often means breaking a gene. Since most genes are necessary for life, many chromosome breaks are lethal or cause serious defects. • Also, chromosomes with structural variations often have trouble going through meiosis, giving embryos with missing or extra large regions of the chromosomes. This condition is aneuploidy, just like the chromosome number variations, and it is often lethal. • The major categories: duplication (an extra copy of a region of chromosome), deletion (missing a region of chromosome), inversion (part of the chromosome is inserted backwards), and translocation (two different chromosomes switch pieces). [Figure labels: Cri du chat syndrome; Down syndrome; Edwards syndrome] Slide 11/22

  23. Refresher: flow of genetic information • Nucleic acids (DNA, RNA) and proteins are biological macromolecules built as long linear chains of chemical components. • DNA plays a fundamental role in the processes of life in two respects. • First, it contains the templates (it has the coding capacity) for the synthesis of proteins and other products for all cellular functions. • Second, DNA is essential to life as a medium for transmitting information from generation to generation. [Diagram: the central dogma of biology: replication, transcription, translation] Slide 12/23

  24. The DNA structure • The deoxyribonucleic acid (DNA) molecule is double-stranded, composed of two strands in an antiparallel and complementary arrangement. • The basic unit, the nucleotide, consists of a molecule of deoxyribose sugar, a phosphate group, and one of four nitrogenous bases, each denoted by one of the letters A, C, G and T. • Double-stranded DNA from any cell of any organism should have a 1:1 ratio of pyrimidine and purine bases (Chargaff’s rule). Each type of base on one strand forms a bond with just one type of base on the other strand. This is called complementary base pairing (A-T, G-C). [Figure labels: 5’ and 3’ strand ends] Slide 13/24
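
Complementary base pairing and Chargaff’s rule can be checked with a few lines of Python. This is only a minimal sketch: the function names and the example sequences are made up for illustration, and the input is assumed to contain only the letters A, C, G and T.

```python
# Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(strand):
    """Return the complementary strand, written 5' to 3'.

    The two strands are antiparallel, so the complement is read in reverse.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))


def purine_pyrimidine_counts(strand):
    """Count purines (A, G) and pyrimidines (C, T) over both strands of the duplex."""
    duplex = strand.upper() + reverse_complement(strand)
    purines = sum(duplex.count(base) for base in "AG")
    pyrimidines = sum(duplex.count(base) for base in "CT")
    return purines, pyrimidines


print(reverse_complement("ATGC"))        # GCAT
print(purine_pyrimidine_counts("AAGT"))  # (4, 4)
```

Because every purine on one strand pairs with a pyrimidine on the other, the two counts are always equal for a complete duplex, whatever the sequence of the single strand.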

  25. Structural comparison of A, B and Z DNA • There are three natural forms of DNA (A, B and Z). The origin of these different forms is related to the conformation of the sugar and the orientation of the base relative to the sugar. The C-form and D-form are unusual subclasses of the B-type. [Table: helix types A, B, and Z] Slide 13/25

  26. Major features of the genetic code • Genetic information (the code) is stored in DNA. • On theoretical grounds, Sidney Brenner (early 1960s) postulated that the genetic code used by cells is a triplet code, consisting of three-nucleotide sequences (4^1 = 4, 4^2 = 16, 4^3 = 64). • 20 amino acids are encoded by 61 triplets. • The sequence complementary to the code in DNA is the mRNA codon. • The tRNA sequence complementary to the codon is the anticodon. • Codons are non-overlapping and degenerate, and the code has no internal punctuation. • The genetic code is universal (with some exceptions). • The first two nucleotides of a codon have a higher informational value than the third one. • The code can evolve. • An open reading frame (ORF) is the nucleotide sequence between a start codon and a stop codon. Slide 14/26
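
The counting argument and the ORF definition above can be sketched in Python. The codon table below is the standard genetic code written on the DNA coding strand; the function names, the choice of ATG as the only start codon, and the example sequence are assumptions made for illustration. In this sketch the returned ORF includes both the start and the stop codon.

```python
from itertools import product

# Number of possible "words" for word lengths 1, 2 and 3 over a 4-letter alphabet.
WORD_COUNTS = {n: 4 ** n for n in (1, 2, 3)}  # {1: 4, 2: 16, 3: 64}

# Standard genetic code; codons enumerated in T, C, A, G order, "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

SENSE_CODONS = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
ENCODED_AMINO_ACIDS = set(CODON_TABLE.values()) - {"*"}


def find_first_orf(dna):
    """Return the first ATG...stop stretch read in frame, or None if there is none."""
    dna = dna.upper()
    start = dna.find("ATG")
    while start != -1:
        # Read non-overlapping triplets from the start codon onwards.
        for i in range(start, len(dna) - 2, 3):
            if CODON_TABLE.get(dna[i:i + 3]) == "*":
                return dna[start:i + 3]
        start = dna.find("ATG", start + 1)
    return None


def translate(orf):
    """Translate an ORF codon by codon and drop the trailing stop symbol."""
    protein = "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))
    return protein.rstrip("*")


orf = find_first_orf("CCATGGCTTAAGG")
print(len(SENSE_CODONS), len(ENCODED_AMINO_ACIDS))  # 61 20
print(orf, translate(orf))                          # ATGGCTTAA MA
```

Degeneracy is visible in the table itself: 61 sense codons map onto only 20 amino acids, so most amino acids are specified by more than one codon.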

  27. What is the genome? • The genome is all the DNA in a cell. • All the DNA on all the chromosomes. • Includes genes, intergenic sequences, repeats. • Specifically, it is all the DNA in an organelle. • Eukaryotes can have 2-3 genomes. • Nuclear genome • Mitochondrial genome • Plastid genome • If not specified, “genome” usually refers to the nuclear genome. • In eukaryotes, this term is commonly used to refer to one complete haploid set of chromosomes, such as that found in a sperm or egg. • The units of length of nucleic acids in which genome sizes are expressed: • Kilobase (kb): 10^3 base pairs • Megabase (Mb): 10^6 base pairs Slide 15/27

  28. What are genes? • A gene is a unit of heredity in a living organism. • A gene is a segment of DNA that is involved in producing a polypeptide chain; it can include regions preceding and following the coding DNA as well as introns between the exons; it is considered a unit of heredity; “genes were formerly called factors”. • Complex genomes have roughly 10 to 30 times more DNA than is required to encode all the RNAs or proteins in the organism. • Contributors to the non-coding DNA include: • Introns in genes • Regulatory elements of genes • Multiple copies of genes, including pseudogenes • Intergenic sequences • Interspersed repeats Genes are the basic unit of heredity. An intergenic region (IGR) is a stretch of DNA sequence located between clusters of genes that contains few or no genes. Slide 16/28

  29. Distinct components in complex genomes Slide 17/29

  30. Genome size • The genetic complement of a cell or virus constitutes its genome. • In eukaryotes, this term is commonly used to refer to one complete haploid set of chromosomes, such as that found in a sperm or egg. • The C-value = the DNA content of the haploid genome. • The units of length of nucleic acids in which genome sizes are expressed: • Kilobase (kb): 10^3 base pairs • Megabase (Mb): 10^6 base pairs Slide 17/30

  31. Genome Size • Viral genomes are typically in the range 100–1000 kb. • Bacteriophage MS2, one of the smallest viruses, has only four genes in a single-stranded RNA molecule of about 4000 nucleotides (4 kb). • Bacterial genomes are larger, typically in the range 1–10 Mb. • The chromosome of Escherichia coli is a circular DNA molecule of 4600 kb. • Eukaryotic genomes are typically in the range 100–1000 Mb. • Among eukaryotes, genome size often differs tremendously, even among closely related species. Slide 17/31
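
The kb and Mb units used on these slides convert as follows. This is a small sketch; the helper name `convert` and the dictionary are illustrative only, while the genome sizes are the figures quoted on the slide.

```python
# Base pairs per unit: 1 kb = 10^3 bp, 1 Mb = 10^6 bp.
BP_PER_UNIT = {"bp": 1, "kb": 10 ** 3, "Mb": 10 ** 6}


def convert(value, from_unit, to_unit):
    """Convert a genome size between bp, kb and Mb."""
    return value * BP_PER_UNIT[from_unit] / BP_PER_UNIT[to_unit]


# Escherichia coli chromosome: 4600 kb expressed in Mb.
print(convert(4600, "kb", "Mb"))  # 4.6

# Bacteriophage MS2: about 4 kb expressed in nucleotides.
print(convert(4, "kb", "bp"))     # 4000.0
```

Expressed in Mb, the 4600 kb chromosome of E. coli falls inside the 1–10 Mb range given for bacterial genomes.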

  32. The 3 genomic paradoxes • The C-value = the DNA content of the haploid genome. • C-value paradox: Complexity does not correlate with genome size. [Figure labels: Homo sapiens, 3.4 × 10^9 bp; Amoeba dubia, 6.8 × 10^11 bp; Allium cepa, 1.5 × 10^10 bp] Slide 17/32
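
A short calculation makes the C-value paradox concrete. The sketch below uses only the three genome sizes quoted on the slide; the dictionary and function names are illustrative.

```python
# Haploid genome sizes in base pairs, as quoted on the slide.
GENOME_SIZE_BP = {
    "Homo sapiens": 3_400_000_000,    # 3.4 x 10^9 bp
    "Allium cepa": 15_000_000_000,    # 1.5 x 10^10 bp
    "Amoeba dubia": 680_000_000_000,  # 6.8 x 10^11 bp
}


def fold_difference(species, reference="Homo sapiens"):
    """How many times larger one genome is than the reference genome."""
    return GENOME_SIZE_BP[species] / GENOME_SIZE_BP[reference]


for name in ("Allium cepa", "Amoeba dubia"):
    print(f"{name}: {fold_difference(name):.1f} times the human genome")
```

On these figures the onion genome is about 4.4 times, and the amoeba genome 200 times, the size of the human genome, which is the mismatch between genome size and complexity that the paradox describes.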

  33. The 3 genomic paradoxes • K-value paradox: Complexity does not correlate with chromosome number. [Figure labels: Homo sapiens, 46; Lysandra atlantica, 250; Ophioglossum reticulatum, ~1260] Slide 17/33

  34. The 3 genomic paradoxes • N-value paradox: Complexity does not correlate with gene number. [Figure labels: ~21,000 genes; ~25,000 genes; ~60,000 genes] Slide 17/34

  35. Thank you! Introductory lectures: http://www.youtube.com/watch?v=40Sum5KfG1Q Textbooks
