
Goals of the Human Genome Project • determine the entire sequence of human DNA • identify all the genes in human DNA • store this information in databases • improve tools for data analysis • transfer related technologies to the private sector • address the ethical, legal, and social issues (ELSI) that may arise from the project.
Sequencing a genome 1. Obtain genomic DNA sample 2. Sequence genomic DNA 3. Assemble sequences in order 4. Annotate sequence
Sanger Sequencing Chemical reaction that includes: • DNA polymerase • DNA primer • Nucleotide bases (A, T, G, C) • Nucleotide bases that are ‘labeled’ Addition of a labeled base stops the reaction. The reaction is repeated many times, so fragments ending at every position of the sequence are produced.
DNA separated by size using a gel and an electric current • Sequenced sample put in well at the negative (-) end • DNA moves towards the positive (+) charge • Short DNA moves faster
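The chain-termination reaction and the gel readout described above can be sketched in Python. This is a toy model under loud assumptions: the function names `sanger_fragments` and `read_gel`, the stop probability, and the example sequence are all made up for illustration, and the string stands for the strand being synthesized (the primer and base pairing are ignored).

```python
import random

def sanger_fragments(sequence, n_copies=2000, stop_prob=0.1, seed=0):
    """Simulate many copies of the reaction (hypothetical toy model).

    In each copy, synthesis stops at the first position where a labeled
    base happens to be added. Returns a set of (fragment_length, label).
    """
    rng = random.Random(seed)
    fragments = set()
    for _ in range(n_copies):
        for i, base in enumerate(sequence):
            if rng.random() < stop_prob:
                fragments.add((i + 1, base))  # label = terminating base
                break
    return fragments

def read_gel(fragments):
    """Short fragments move fastest, so read labels from shortest to longest."""
    return "".join(label for _, label in sorted(fragments))

sequence = "GATTACAGGC"  # assumed example sequence
print(read_gel(sanger_fragments(sequence)))
```

Because the reaction is repeated many times, a fragment ending at every position is almost certain to appear, so reading the labels in size order should recover the original sequence.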
How do we sequence a genome? For the HGP, two approaches were used: 1. Hierarchical sequencing 2. Shotgun sequencing
How do we put the sequences together in the right order? Genome assembly - based on finding regions of overlap between individual sequencing fragments. Example: the end of CCCATTAGATGCGATGGGTTAAAA overlaps the start of GGTTAAAAATCGATCCCATTTTACG (shared region: GGTTAAAA). Very, very difficult problem for complex genomes!!
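A minimal Python sketch of the overlap idea, using the two fragments from the slide. The helper names `longest_overlap` and `merge` and the `min_len` threshold are assumptions made for illustration; real assemblers handle millions of reads, sequencing errors, and both strands.

```python
def longest_overlap(left, right, min_len=3):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0

def merge(left, right, min_len=3):
    """Join two fragments on their overlap, or return None if there is none."""
    k = longest_overlap(left, right, min_len)
    return left + right[k:] if k else None

a = "CCCATTAGATGCGATGGGTTAAAA"
b = "GGTTAAAAATCGATCCCATTTTACG"
print(longest_overlap(a, b))  # 8, the shared GGTTAAAA
print(merge(a, b))
```

With only two fragments the overlap is unambiguous. In a complex genome the same short sequence can occur in many places, so many different fragments can appear to overlap, which is one reason the problem is so difficult.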
Genome Annotation Annotation – identifying which parts of the DNA correspond to genes, etc. Compare to known genes: • Genes already described and sequenced • Expressed Sequence Tags (ESTs), essentially randomly sequenced mRNA Predict genes: • Computer predictions
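One very simple kind of computer prediction is to scan for open reading frames: a start codon (ATG) followed, in the same reading frame, by a stop codon. The Python sketch below is only an illustration under assumptions: `find_orfs`, `min_codons`, and the example DNA string are made up here, only the forward strand is scanned, and introns are ignored, so it is much cruder than real gene-prediction software.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop stretch, forward strand only."""
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i + 3, len(dna) - 2, 3):
                    if dna[j:j + 3] in STOP_CODONS:
                        if (j - i) // 3 >= min_codons:
                            orfs.append((i, j + 3, dna[i:j + 3]))
                        i = j  # resume scanning after this stop codon
                        break
            i += 3
    return orfs

example = "CCATGAAATTTGGGTAACC"  # assumed toy sequence
print(find_orfs(example))
```

For the toy sequence this reports a single candidate, ATGAAATTTGGGTAA, starting at position 2. Candidates like this would then be checked against known genes and ESTs.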
Genome made of two types of DNA • Euchromatic DNA: comprises 93% of your DNA; contains most of the genes in your genome; 99% has been sequenced • Heterochromatic DNA: comprises ~7% of your DNA; highly repetitive; some parts are structural (contains centromeres, telomeres); gene sparse; very difficult to sequence, largely unexplored.
Euchromatic DNA • 2.8 billion base pairs • ~30,000 genes: many fewer than expected (initial guesses were ~100,000 genes); 50% have unknown function; genes make up less than 2% of the total genome • 98% “junk” DNA: does not code for proteins; function is unknown - but potentially very important!!!; much of it (~50% of the genome) is repeated sequences (e.g. AGAGAGAGAGAG) and transposable elements
What does the draft human genome sequence tell us? How the genome is arranged • Genes occur in gene-dense “jungles” and gene-poor “deserts”. • Genes appear to be concentrated in random areas along the genome, with vast expanses of noncoding DNA between them. • Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (~231).
HapMap An NIH program to map genetic variation within the human genome • Begun in 2002 • Construct a map of the patterns of variation that occur across human populations. • Facilitate the discovery of genes involved in complex human traits and diseases.
Evolutionary Genomics - comparing genomes of different species to learn about genome evolution and function Gene number does not directly scale with complexity of organism!
What do evolutionary comparisons tell us? How the human compares with other organisms • Humans have 3X as many kinds of proteins as the fly or worm, because of mRNA transcript "alternative splicing" and chemical modifications to the proteins. • This process can yield different protein products from the same gene. • Large portions of non-genic DNA are highly conserved, suggesting they serve some function.
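To see how alternative splicing can yield different products from the same gene, here is a small Python sketch. It assumes a deliberately simplified model in which the first and last exons are always kept and any middle exon may be skipped; the function name `splice_isoforms` and the exon sequences are hypothetical.

```python
from itertools import combinations

def splice_isoforms(exons):
    """List transcripts that keep the first and last exon and any subset
    of the middle exons, always in their original order (toy model)."""
    first, *middle, last = exons
    isoforms = []
    for r in range(len(middle) + 1):
        for kept in combinations(middle, r):
            isoforms.append("".join([first, *kept, last]))
    return isoforms

# Four made-up exons from one hypothetical gene
exons = ["AUGGCU", "GAAUUC", "CCGGAU", "UGGUAA"]
for transcript in splice_isoforms(exons):
    print(transcript)
```

With two optional middle exons, this one gene gives four different transcripts, which is how the number of kinds of proteins can be larger than the number of genes.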