
Genotyping-by-Sequencing: what is it and what is it good for?

Genotyping-by-Sequencing: what is it and what is it good for? Keith R. Merrill, NCSU – Crop Science. GBS vs. RAD-Seq: the ultimate throw down! (of acronyms). GBS: Genotyping-by-Sequencing. RAD-Seq: Restriction-site associated DNA sequencing. GBS vs. RAD-Seq: what's the difference?


Presentation Transcript


  1. Genotyping-by-Sequencing: what is it and what is it good for? Keith R. Merrill, NCSU – Crop Science

  2. GBS vs. RAD-Seq: The ultimate throw down! (of acronyms)
     • GBS: Genotyping-by-Sequencing
     • RAD-Seq: Restriction-site associated DNA sequencing

  3. GBS vs. RAD-Seq: What's the difference?

  4. The Concept [Workflow diagram, steps in order: Reduce the genome → Pool samples → Sequence the combined pool → Assign sequences to individuals → Call variants between individuals]

  5. The Concept: It's all about probability [Figure labels: Ind_1, Ind_2, Ind_3, Ind_4, Ind_5, …, Ind_n]

  6. The Concept: Reduce the genome and increase the probability of overlap [Figure labels: Ind_1, Ind_2, Ind_3, Ind_4, Ind_5, …, Ind_n]
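
The probability argument on this slide can be made concrete with a rough Poisson sketch. It assumes reads land uniformly at random on the available loci, which real libraries do not do, and the function names and numbers are illustrative assumptions rather than figures from the talk.

```python
import math

def p_locus_covered(reads_per_individual, n_loci):
    """Probability that one locus gets at least one read in one individual,
    assuming reads fall uniformly at random on n_loci (Poisson approximation)."""
    mean_depth = reads_per_individual / n_loci
    return 1.0 - math.exp(-mean_depth)

def p_locus_shared(reads_per_individual, n_loci, n_individuals):
    """Probability that the same locus is covered in every individual."""
    return p_locus_covered(reads_per_individual, n_loci) ** n_individuals

# Hypothetical numbers: 1 million reads per individual, 10 individuals.
# Fewer loci (a more strongly reduced genome) means a higher chance of overlap.
for n_loci in (100_000_000, 10_000_000, 1_000_000, 100_000):
    print(n_loci, p_locus_shared(1_000_000, n_loci, 10))
```

Under these assumptions, shrinking the number of sampled loci while holding the read count fixed pushes the shared-coverage probability toward 1, which is the point of reducing the genome.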

  7. How it works: Tags (AKA Barcodes, MID Barcodes, etc.)
     • Tag1, Ind1 = GGATA
     • Tag2, Ind2 = CACCA
     • Tag3, Ind3 = CAGATA
     • Tag4, Ind4 = GAAGTG
     • Tag5, Ind5 = TAGCGGAT
     • TagN, IndN = …
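
As a minimal sketch of how tags let reads be assigned back to individuals, the snippet below matches the start of each read against the five tags listed on the slide. The reads are made up for illustration, `assign_read` is a hypothetical helper, and only exact matches are handled (no error tolerance).

```python
# Tags copied from the slide; the example reads are invented for illustration.
TAGS = {
    "GGATA": "Ind1",
    "CACCA": "Ind2",
    "CAGATA": "Ind3",
    "GAAGTG": "Ind4",
    "TAGCGGAT": "Ind5",
}

def assign_read(read, tags=TAGS):
    """Return (individual, read_without_tag), or (None, read) if no tag matches exactly."""
    # Try longer tags first so a short tag can never shadow a longer one.
    for tag in sorted(tags, key=len, reverse=True):
        if read.startswith(tag):
            return tags[tag], read[len(tag):]
    return None, read

for read in ["GGATACAGCTTGACC", "TAGCGGATCAGCAAGT", "TTTTTCAGCAAGT"]:
    print(assign_read(read))
```

Reads whose prefix matches no tag come back with `None`, so they can be counted or inspected separately rather than silently dropped.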

  8. How it works (The One Enzyme Method) [Figure labels: Tag1 with Ind1, Tag2 with Ind2, Tag3 with Ind3, Tag4 with Ind4, Tag5 with Ind5, TagN with IndN]

  9. How it works (The Two Enzyme Method)

  10. How it works: Size Selection [Figure label: base-pair range selected]
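
The digest-then-size-select idea can be sketched in silico. The example below assumes a single enzyme, EcoRI (recognition site GAATTC, cutting after the first G), a made-up toy sequence, and a single strand; the helper names are hypothetical, and real size selection happens at the bench, not in code.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`.
    `cut_offset` is the cut position counted from the start of the site."""
    fragments, start = [], 0
    pos = sequence.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(sequence[start:cut])
        start = cut
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])  # whatever remains after the last cut
    return fragments

def size_select(fragments, min_bp, max_bp):
    """Keep only fragments whose length falls inside the selected base-pair range."""
    return [f for f in fragments if min_bp <= len(f) <= max_bp]

# Toy sequence with two EcoRI sites (illustrative only).
toy = "ACGT" * 3 + "GAATTC" + "A" * 20 + "GAATTC" + "CC"
fragments = digest(toy, site="GAATTC", cut_offset=1)
print([len(f) for f in fragments])
print(size_select(fragments, min_bp=20, max_bp=30))
```

Narrowing or widening the base-pair window changes how many fragments survive, which is one of the knobs for controlling how much of the genome gets sequenced.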

  11. How it works: Pooling [Figure labels: tagged fragments from Ind1, Ind2, Ind3, Ind4, Ind5, …, IndN combined into one pool] Size selection (optional if using two enzymes)

  12. Why Pool Samples? On the Illumina HiSeq 2000:
     • 8 lanes of sequencing, each capable of giving 374 million reads.
     • You can't partition a lane.
     • Sequencing is expensive ($1500 - $3000 per lane).
     • You don't need/want 374 million reads per individual.
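
The arithmetic behind pooling is easy to check. This sketch uses the lane yield and lane cost quoted on the slide; the numbers of samples per lane are hypothetical, and it assumes every sample receives an equal share of the reads, which real pools only approximate.

```python
READS_PER_LANE = 374_000_000   # from the slide
LANE_COST_USD = (1500, 3000)   # low and high lane cost, from the slide

def per_sample(n_samples):
    """Return (reads per sample, low cost per sample, high cost per sample)
    for a lane shared evenly among n_samples."""
    reads = READS_PER_LANE / n_samples
    cost_low = LANE_COST_USD[0] / n_samples
    cost_high = LANE_COST_USD[1] / n_samples
    return reads, cost_low, cost_high

# Hypothetical pooling levels.
for n in (1, 48, 96, 384):
    reads, low, high = per_sample(n)
    print(f"{n:>3} samples/lane: ~{reads:,.0f} reads each, ${low:,.2f}-${high:,.2f} per sample")
```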

  13. A Word About Tags
     • Hamming vs. Edit Distance
     • Sequence errors may result from things other than sequencing.
     • n-1 errors are the most common error encountered during oligo synthesis.
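
To illustrate the difference between the two distances, here is a small sketch. Hamming distance counts substitutions only, while edit (Levenshtein) distance also counts insertions and deletions, which is what an n-1 synthesis error is. The tags come from the earlier slide; the function names and the n-1 scenario are illustrative assumptions.

```python
def hamming(a, b):
    """Number of mismatching positions; only defined for equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))

def edit_distance(a, b):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x from a
                            curr[j - 1] + 1,          # insert y into a
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]

# Two equal-length tags from the slide deck.
print(hamming("GGATA", "CACCA"), edit_distance("GGATA", "CACCA"))

# An n-1 error: tag CAGATA synthesized without its first base, so the first
# six bases of the read become AGATA plus one base of the insert (here "C").
print(hamming("CAGATA", "AGATAC"), edit_distance("CAGATA", "AGATAC"))
```

In the second case every position mismatches under Hamming distance even though only one base was lost, which is why tag sets designed around edit distance are more robust to this kind of error.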

  14. Analysis: It's about time… and money… and time. Key Considerations:
     • Time
     • Computing power available
     • Amount of sequence data (back to time)
     • Availability of a reference genome

  15. Key Considerations
     • Study goals
     • Availability of a reference genome
     • Expected degree of polymorphism
     • Choice of restriction enzyme
     • DNA sample preparation
     • Adaptor design
     • PCR amplification
     • Sequencing
     • Pooling individuals
     • Analysis

  16. Analysis: It's about time… and money… and time. A Few Options:
     • Stacks
        • For use with bi-parental mapping populations
        • Takes a lot of time
        • Looks at entire reads
        • Reference genome optional
        • Designed to work nicely with MySQL
        • More memory intensive
     • UNEAK
        • For use with species without a reference genome
        • Uses only 64 bp of each read
        • MUCH faster than Stacks
        • Less memory intensive
     • TASSEL
        • For use with species with a reference genome
        • Uses only 64 bp of each read
        • MUCH faster than Stacks
        • Less memory intensive
     • Custom scripts
        • Completely flexible (hence the 'custom')
        • Requires significant knowledge about programming (or knowing someone who does and is willing to help)
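
The "uses only 64 bp of each read" idea can be sketched as truncating every read to a fixed length and counting identical sequences. This is not TASSEL or UNEAK code: `collapse_to_tags` and the toy reads are hypothetical, and the sketch assumes barcodes have already been removed.

```python
from collections import Counter

TAG_LENGTH = 64  # the slide states that UNEAK and TASSEL use 64 bp of each read

def collapse_to_tags(reads, tag_length=TAG_LENGTH):
    """Truncate each read to tag_length bases and count identical truncated reads.
    Reads shorter than tag_length are skipped in this sketch."""
    return Counter(r[:tag_length] for r in reads if len(r) >= tag_length)

# Toy reads (illustrative only): two share the same first 64 bases.
reads = ["A" * 100, "A" * 64 + "C" * 10, "G" * 64, "T" * 10]
for tag, count in collapse_to_tags(reads).items():
    print(tag[:10] + "...", count)
```

Working with short, fixed-length sequences and their counts instead of full reads is one plausible reason such tools need less time and memory than approaches that look at entire reads.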

  17. Does it work? Note: This is with hexaploid wheat and no reference genome

  18. The Good
     • No ascertainment bias
     • Random distribution throughout the genome
     • May be useful for species without a reference genome
     • Useful with genomic selection
     • May provide a large number of SNPs
     • Relatively low per sample cost

  19. The Good (cont.): GBS is extremely flexible
     • Number of individuals per lane/flowcell
     • Choice of enzymes
        • Cut sites
        • Methylation sensitivity
     • Size of fragments selected

  20. The Bad
     • Poor reproducibility between runs
     • Species without a reference genome *cannot* infer missing data
     • Often dealing with large amounts of missing data
     • Difficult to filter out false SNPs in non-mapping populations, unless you have a reference genome and even then…
     • In my opinion: this would be nigh impossible to use with association studies in species without a reference genome UNLESS you sequence to very high coverage to virtually eliminate missing data (alternatively, you could drastically reduce the genome by your choice of enzymes, but this may be bad if your expected degree of polymorphism is low)
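
Since missing data is a recurring problem, a common first step is to measure it per SNP and drop sites above a threshold. The sketch below uses a made-up genotype table, hypothetical helper names, and an arbitrary 20% cutoff; none of these come from the talk.

```python
# Hypothetical genotype table: one row per SNP, one column per individual.
# 0/1/2 = number of alternate alleles, None = no call (missing).
genotypes = {
    "snp_1": [0, 1, None, 2, 0],
    "snp_2": [None, None, 1, None, 0],
    "snp_3": [2, 2, 2, 1, None],
}

def missing_rate(calls):
    """Fraction of individuals with no genotype call at this SNP."""
    return sum(c is None for c in calls) / len(calls)

def filter_by_missingness(genos, max_missing):
    """Keep SNPs whose missing rate is at most max_missing."""
    return {snp: calls for snp, calls in genos.items()
            if missing_rate(calls) <= max_missing}

for snp, calls in genotypes.items():
    print(snp, missing_rate(calls))

kept = filter_by_missingness(genotypes, max_missing=0.2)
print(sorted(kept))
```

A stricter cutoff leaves fewer SNPs, which is the trade-off between marker number and missing data that the last bullet describes.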

  21. Questions?

  22. Resources
     • TASSEL-GBS: www.maizegenetics.net/index.php?option=com_content&task=view&id=89&Itemid=119
     • GBS_Document: www.maizegenetics.net/tassel/docs/TasselPipelineGBS.pdf
