Cloning DNA fragments

IB404 - 2. Cloning and Sequencing - Jan 23 Cloning DNA fragments Invented by Herbert Boyer (UCSF) and Stanley Cohen (Stanford) in the early 1970s. Boyer founded Genentech and Stanford eventually ran out of patent protection for the principles of cloning DNA. Herbert Boyer Stanley Cohen

Plasmids 1. Small circular double-stranded DNA molecules. 2. In bacteria, usually encoding antibiotic resistance proteins. 3. Some mediate conjugation in bacteria as a means of spreading. 4. pBR322 was original one used for cloning because it had: a. Ampicillin and tetracycline resistance. b. No ability to move by conjugation, hence contained. c. Origin of replication causing many copies per cell. 5. Many thousands of variants based on this.

Other cloning vectors for genome projects 1. Plasmids - clone pieces ranging from 100bp-10,000bp. 2. Cosmids - utilize the cos sites of lambda phage to package 10-50,000bp pieces for introduction into bacteria - then replicate as plasmids. 3. BACs - Bacterial Artificial Chromosomes - can obtain clones of 50-300,000bp in bacteria, but low copy number.

4. YACs - Yeast Artificial Chromosomes - linear molecules with yeast centromere and two telomeres, and insert inbetween. Can be 100-500,000bp, but subject to rearrangement in yeast cells, so can be suspect (mostly for nematode and human).

DNA sequencing - 1975 Chemical degradation method by Walter Gilbert at Harvard (left). Enzymatic extension or polymerase or termination or dideoxynucleotide method by Fred Sanger in Cambridge, England (right).

Sanger dideoxy sequencing

Old manual radioactive method 1. Label primer or a nucleotide with S35 so will only visualize the new strands. 2. Denature newly synthesized and template strand DNA to single-stranded DNA. 3. Run the four reactions in separate lanes on a thin long polyacrylamide gel. 4.Concentrated polyacrylamide allows resolution of DNA strands 1base different in length. 5. Transfer gel to a paper backing, dry, and expose X-ray film overnight to obtain an autoradiograph. 6. Read bases from the bottom (shortest fragments) to top. 7. In the early 1990s I generated ~400,000bp this way from transposons in insects. A C G T

Automated Sanger sequencing 1. Leroy Hood at Caltech invented method, and Applied Biosystems Inc (ABI) comm. 2. Use fluorescent labels on primers and run products off end of gel as machine scans. 3. ABI added the twist of four different fluorescent labels at different wave- lengths, one for each dideoxynucleotide, so could run four reactions in one lane. 4. Finally, ABI and others introduced capillary machines up to 99 capillaries per machine, capable of reading up to 1000 bases per run in 2 hours. 5. Output of such a machine is 1000X100X10 or 1 million bases per day. This is the basic technology used since mid-1990s to sequence most genomes we will consider.

Massively-parallel pyrosequencing (454/ROCHE). The first of the next-generation or second-generation sequencing methods. The most sophisticated commercial application is from 454 Life Sciences, bought out by Roche, and combines nanotechnology to isolate and amplify single DNA molecules and then pyrosequencing. There are many advantages, including no need to clone and hence bias samples, and it works well for entire bacterial genomes, and for fungi, and recently for some insects, e.g. ants, a moth, a fruit fly, etc. One or two human genomes as well. The nanotechnology part is responsible for the massively-parallel nature of this method, in which fragments of DNA are mixed at low concentration with 40 micrometer diameter beads that bind them, roughly 1 molecule per bead (beads without a DNA molecule yield nothing and those with 2 or 3 yield a mess, which is ignored). The beads are individually enveloped in a tiny droplet of PCR reagents in an emulsion and the single DNA fragments amplified a million times. Then beads are individually trapped in tiny cavities in a plate - the current density being 3 million cavities in a 75X75mm plate, of which only about 30% are occupied with DNA-bearing beads, allowing about 1 million sequencing reactions per plate at ~400 bp each yields ~400 Mb per run (5 hours). The beads are held in place by layers of tinier beads containing the sequencing polymerase and reagents. Nucleotides are washed over the plate sequentially, and as they are incorporated the released pyrophosphate is detected by using it to phosphorylate AMP to ATP, which is detected using firefly luciferase and the substrate luciferin. A sensitive CCD camera monitors the chemiluminescent light.

454 pyrosequencing (First version, so “only” 200,000 reads each time, and short ones. In third version, 1m reads, ~400 b per read) In fourth version today, longer reads to ~700b

The visual output is called a flowgram (because one “flows” the nucleotides sequentially across the sequencing plate). A single flowgram from a single bead in a single well representing a single fragment of DNA is shown below. Starting from the left, the first red peak indicates that when T nucleotides were flowed across the plate a burst of light was detected from this well, representing addition of a single T to each of the ~1 million identical DNA fragments on that bead in that well. Then A nucleotides were flowed across the plate, but there is no signal from this well, so the next nucleotide is NOT an A. Then C was flowed, and a single C was detected, then G but there is no G, then T again, etc. When there are two nucleotides in a row in the template, they will be added in quick succession, leading to a double-intensity flash of light, hence a 2-mer, and so on for 3- and 4-mers. Sadly, this technology gets a little messy when you have long homopolymer strings.

Massively parallel sequencing-by-synthesis (SOLEXA/ILLUMINA) A second type of next-generation nanotechnology-based DNA sequencing was introduced by SOLEXA (bought out by ILLUMINA), while ABI does something similar with their ABI-SOLID machine, trying to stay in the DNA sequencing game. Like 454 this technology depends on starting with single DNA fragments, which are fixed to a glass substrate, and PCR amplified in place (forming clusters or polonies), which are then sequenced using chemical synthesis methods and fluorescently labelled nucleotides and high resolution CCD cameras. The details are a bit too much for us, but you can see some of them in schematics from the company here - http://www.illumina.com/pages.ilmn?ID=203. Some details are worthwhile. First, their machine can run 8 samples at a time, each in a different lane of a flow cell, which is basically an enclosed microscope slide. Second, ~30 million unique polonies are generated per lane. Third, in their original machine they generated ~35 base reads per fragment, their next machine did ~75 base reads, and their 3rd machine available now generates ~100 base reads. Fourth, each run of the current HiSeq2000 machine takes a week, but it generates 30X100X8 million bases or roughly 25 Gbp!

Cloning DNA fragments