
DNA Learning Center July 15, 2003



Presentation Transcript


  1. DNA Learning Center, July 15, 2003 • W. Richard McCombie, Professor, Cold Spring Harbor Laboratory and The Watson School of Biological Sciences

  2. Basic points • Genome research is advancing very rapidly • Technologies are driving the progress • These technologies and the data that result from them will have a revolutionary effect on the way biological research is done and on our understanding of biology and medicine

  3. Major Topics • What is genomics, and in particular the human genome program • Introduction and historical perspective on sequencing • Some information about genomes being sequenced • Strategies to analyse genomes • Comparative genomics • How genomics has changed and will change biology and medicine

  4. What is an organism • At ONE LEVEL, it is the result of the execution of the code that is its genome • We do not know the degree to which environment alters this execution • We do know that in addition to physical attributes, many complex processes such as behavior are influenced by the code • We now know that in mammals, this code comprises only about 30,000-40,000 genes and their control units

  5. The Genome of an organism is: • The complete set of inherited instructions for that organism: its complete DNA code • When operating, it creates a set of proteins in an organized fashion • These proteins act to cause growth, development and reproduction of the organism

  6. What is genomics • Genomics is the analysis of the complete set of genetic instructions of an organism • These genetic instructions consist of genes, which direct the production of proteins and their control elements • These genes consist of a series of DNA bases • Previously we could only look at one or at most a few of these objects or parts at a time • Technology now enables us to see them all

  7. Why will genomics have such an impact • Important biological problems such as cancer and learning and memory are extraordinarily complex • Genomics lets us integrate this complex information in a meaningful way • Ultimately, much of biological research will be driven by computational analysis

  8. Sizes of some important genomes (in base pairs) • Virus 0.003-0.300 million • Bacteria 0.8-6 million • Yeast 15 million • C. elegans 100 million • Rice 435 million • Arabidopsis 130 million • Fugu 800 million • Mouse 2.5 billion • Corn 2.5 billion • Human 3 billion • Wheat 16-20 billion • Loblolly pine 20 billion

  9. Genome sequencing efficiencies per person • 1980: 0.1-1 kb per year • 1985: 1-5 kb per year • 1990: 25-50 kb per year • 1996: 100-200 kb per year • 2000: 500-1000 kb per year • 2002: 10,000 - 25,000 kb per year
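To put the per-person rates on the slide above in perspective, the following rough sketch converts them into the person-years needed for a single pass over a 3-billion-base genome. It assumes the upper end of each range and ignores redundant coverage and finishing; the function name `person_years` is illustrative only.

```python
# Per-person throughput in kb per year (upper end of each range on the slide).
KB_PER_PERSON_YEAR = {1980: 1, 1985: 5, 1990: 50, 1996: 200, 2000: 1000, 2002: 25000}

# Human genome size from the genome-size slide: 3 billion bases = 3,000,000 kb.
HUMAN_GENOME_KB = 3_000_000


def person_years(genome_kb, kb_per_year):
    """Person-years for one pass (1x) over a genome at a given throughput."""
    return genome_kb / kb_per_year


for year, rate in KB_PER_PERSON_YEAR.items():
    print(f"{year}: {person_years(HUMAN_GENOME_KB, rate):,.0f} person-years")
```

Real projects need several-fold coverage, so the actual effort would be correspondingly higher than these single-pass figures.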

  10. Methods to analyse a complex genome • Mapping (genetic, physical) • Expressed gene analysis • Genome sequence analysis (complete sequence, skimming, “rough draft”)

  11. Salient features of genome organization • Higher organisms have large genomes with a considerable amount of repeat sequence • Genes from higher organisms are interrupted by non-coding regions • Only a small portion of a genome codes for genes • Related organisms have related genomes

  12. Expressed Sequence Tags (sequencing parts of the processed genes) • Advantages: Inexpensive; “Know” sequence is coding; Information about tissue or developmental stage expression • Disadvantages: Coverage is incomplete; Position of sequence in the genome is unknown; Only partial information about each gene; No information about structural elements

  13. Steps in genome sequencing • Construction of a large-insert library • Construction of a small insert subclone library • Isolation of DNA • Sequencing of the DNA fragments (8-10x) • Assembly of the data into contiguous regions • Filling the gaps in the sequence and resolving discrepancies • Confirmation of the sequence • Analysis
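The assembly step in the list above ("Assembly of the data into contiguous regions") can be illustrated with a toy greedy overlap merge. This is a minimal sketch, not the method used by any real assembler: it assumes error-free reads from a single strand, no repeats, and a hypothetical minimum overlap of 3 bases. The function names and the example reads are made up for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge; remaining gaps need finishing
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


# Three overlapping reads taken from the made-up sequence ATGGCGTACGTTAGC.
reads = ["ATGGCGTA", "GCGTACGT", "ACGTTAGC"]
print(greedy_assemble(reads))
```

When reads do not overlap, the function returns more than one contig, which corresponds to the gaps that the subsequent "filling the gaps" step has to close.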

  14. High Accuracy Genomic Sequencing (6-10x plus resolution of problems) • Advantages: Normalized coverage of all genes; Information about gene structure; Information about regulatory elements; Genome organization • Disadvantages: Cost; Time; Difficult to determine if a sequence codes for a gene

  15. “Rough draft” • Can be thought of as: • High coverage skimming • Low coverage complete sequencing • Advantages and disadvantages are intermediate between skimming and complete sequencing - dependent on the coverage

  16. Cost of various types of sequencing (per base) • “Base perfect” (uncomplicated) $0.3 • 8x shotgun - no finishing $0.1 • 4x shotgun - no finishing $0.05 • 3x shotgun - no finishing $0.04 • 1x shotgun - no finishing $0.01
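The trade-off between coverage and cost in the slides above can be sketched numerically. The example below pairs the per-base shotgun costs from the slide with the standard idealized Lander-Waterman estimate, which is not in the slides: with randomly placed reads at coverage c, the expected fraction of the genome covered is 1 - e^(-c). It also assumes the quoted costs are per base of the target genome and uses the 3-billion-base human genome as the example size.

```python
import math

GENOME_BASES = 3_000_000_000  # human genome size from the slides

# Shotgun cost per base (USD) at each coverage depth, no finishing, from the slide.
COST_PER_BASE = {8: 0.10, 4: 0.05, 3: 0.04, 1: 0.01}


def expected_fraction_covered(coverage):
    """Idealized Lander-Waterman estimate: 1 - exp(-coverage)."""
    return 1.0 - math.exp(-coverage)


def total_cost(genome_bases, cost_per_base):
    """Total cost in USD, assuming cost is quoted per base of the target genome."""
    return genome_bases * cost_per_base


for cov, cost in sorted(COST_PER_BASE.items()):
    frac = expected_fraction_covered(cov)
    usd = total_cost(GENOME_BASES, cost)
    print(f"{cov}x: ~{frac:.2%} covered, ~${usd:,.0f}")
```

Under this model 1x coverage touches only about 63% of the genome, which is why skimming is cheap but incomplete, while higher coverage approaches completeness at proportionally higher cost.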

  17. The Human Genome Project • Human genome consists of three billion base pairs – Adenine, Cytosine, Guanine, Thymine • Printing out the A,C,G,T would fill over 150,000 telephone book pages • Disease is often caused by a single variation in the three billion bases - one different letter in 150,000 pages
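The telephone-book comparison above implies a simple piece of arithmetic, sketched here. It assumes exactly 150,000 pages, whereas the slide says "over 150,000", so the per-page figure is an upper bound.

```python
GENOME_BASES = 3_000_000_000  # three billion base pairs
PAGES = 150_000               # lower bound on the page count given on the slide

# Letters (A, C, G, T) per printed page under these assumptions.
bases_per_page = GENOME_BASES // PAGES

# A single-base variant as a fraction of the whole genome.
single_variant_fraction = 1 / GENOME_BASES

print(f"{bases_per_page:,} bases per page")
print(f"one variant is {single_variant_fraction:.1e} of the genome")
```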

  18. The human genome project • A concerted effort to build resources to unravel the human control code • To develop map resources to link genetic elements (such as disease genes) to a physical representation of the genome • To determine the sequence of all of the DNA that combines to make the human control code

  19. 2-15-01

  20. Genome sequencing assignments [Figure: chromosomes I-V with the sequencing groups assigned to them: CSHSC, Kazusa, TIGR, SPP, ESSA, Genoscope]

  21. Gene Families

  22. Cytogenetic map of chromosome 4S (Paul Fransz) [Figure labels: NOR, knob, cen; segment sizes 3 Mb, 2 Mb, 0.5 Mb, 0.5 Mb, 2 Mb]

  23. Complete genomic sequencing reduces the genetics of an organism to a closed, finite system

  24. FRUITFULL Gene Function The AGL8 gene was renamed FRUITFULL (ful1)

  25. Genetic Redundancy • ap1 cal ful triple mutants have flowers replaced by shoots • apetala1 cauliflower double mutants have proliferating floral meristems resembling cauliflowers

  26. The state of Arabidopsis research 200?? • Complete annotated sequence available • Time to clone a gene has decreased from months or years to weeks in some cases • People are beginning to look at global features of Arabidopsis • Gene trap insertion in “every” gene • Insertion site sequences known, linked to physical and genetic map

  27. Analysis of not the first, or the second, but subsequent genomes • The information from the first few genomes will enable huge cost and time savings • A major emphasis will be to determine the function of genes

  28. What are the genes and what do they do??? • Computational analysis • Functional analysis • Microarrays • Transposons • Various other methods • Comparative analysis

  29. Comparative Genomics

  30. What can we learn from comparative analysis • Evolutionary relationships • Better annotation of genes, particularly of beginning and ends of genes • Detection of conserved regulatory regions • Functional evidence

  31. Benefits of having a model genome reference sequence with conserved local gene order to your plant of interest • Requirements for sequence accuracy decrease for most of the genome (you can fill in with high accuracy where needed) • The reference genome can be used as a scaffold allowing the anchoring of clones (allowing partial sequence coverage to infer complete clone coverage)

  32. Co-linearity among cereal genomes

  33. What types of comparisons are useful? • Arabidopsis to very closely related species: annotate the Arabidopsis sequence • Arabidopsis to related crop plants (soybean, tomato, Medicago truncatula): determine the degree of locally conserved gene order between these crops and Arabidopsis; determine how the Arabidopsis sequence can be used in the analysis of these species • Arabidopsis to distant plants (rice, for instance): gene discovery; systems analysis; gene order conservation??? • Arabidopsis to animals: how plants and animals differ in carrying out basic biological processes; how plants and animals organize and manage gene expression

  34. Mammalian Comparative Genomics • Canine vs. Human Genome • Sequence canine ESTs • In collaboration with Elaine Ostrander (FHCRC) map to the dog genome • Map computationally to the human genome • Use to better annotate the human sequence • Starting material for microarrays • Use in gene discovery (behavior and cancer)

  35. myosin, light polypeptide 4, alkali

  36. How will genomics affect the way we do biological research

  37. Rate at which genes can be identified • Cloning - weeks to years • Database searches - seconds to minutes

  38. What are the areas where genome technology will impact us • Diagnostics • Forensics • Understanding of diseases such as cancer at the molecular level • Treatments for diseases customized to the individual

  39. Genomic Information allows us to look at the entire gene content of an organism simultaneously

  40. > 9 of the 10 Leading Causes of Mortality Have Genetic Components • 1. Heart disease (29.5% of deaths in ‘00) • 2. Cancer (22.9%) • 3. Cerebrovascular diseases (6.9%) • 4. Chronic lower respiratory dis. (5.1%) • 5. Injury (3.9%) • 6. Diabetes (2.9%) • 7. Pneumonia/Influenza (2.8%) • 8. Alzheimer disease (2.0%) • 9. Kidney disease (1.6%) • 10. Septicemia (1.3%)

  41. Genomic Health Care • About conditions partly: • Caused by mutation(s) in gene(s) • e.g., breast cancer, colon cancer, autism, atherosclerosis, inflammatory bowel disease, diabetes, Alzheimer disease, mood disorders, etc. • Prevented by mutation(s) in gene(s) • e.g., HIV (CCR5), ?atherosclerosis, ?cancers, ?diabetes, etc.

  42. Genomic Health Care • Will change health care by... • Creating a fundamental understanding of the biology of many diseases (and disabilities), even many “non-genetic” ones • Helping to redefine illnesses by etiology rather than by symptomatology

  43. Genomic Health Care • Knowledge of individual genetic predispositions will allow: • Individualized screening • Individualized behavior changes • Presymptomatic medical therapies, e.g., antihypertensive agents before hypertension develops, anti-mood disorder agents before mood disorder occurs

  44. Crystal Ball - 2010 • Predictive genetic tests for 10 - 25 conditions • Intervention to reduce risk for many of them • Gene therapy for a few conditions • Primary care providers begin to practice genetic medicine • Preimplantation diagnosis widely available, limits fiercely debated • Effective legislative solutions to genetic discrimination & privacy in place in US • Access remains inequitable, especially in developing world
