High Throughput Sequencing

Presentation Transcript


  1. High Throughput Sequencing

  2. Agenda • Introduction to sequencing • Applications • Bioinformatics analysis pipelines • What should you ask yourself before planning the experiment?

  3. Introduction to sequencing

  4. What is sequencing? Finding the sequence of a DNA/RNA molecule. What can we sequence? http://cancergenome.nih.gov/newsevents/multimedialibrary/images/CancerBiology

  5. Sanger sequencing • Reads molecules of up to 1,000 bases • One molecule at a time • Widely used from the late 1970s until the 2000s • The first human genome draft was based on Sanger sequencing • Still in use for single molecules http://www.genomebc.ca/education/articles/sequencing/

  6. High Throughput Sequencing, also called Next Generation Sequencing (NGS) or massively parallel sequencing • Sequences millions of molecules in parallel • Does not require prior knowledge of what you are sequencing. We will discuss Illumina's platform only.

  7. Sequencing Workflow • Extract tissue cells • Extract DNA/RNA from cells • Sample preparation for sequencing • Sequencing • Bioinformatics analysis Why is it important to understand the “wet lab” part?

  8. Sample Prep • Random shearing of the DNA • Adding adaptors and barcodes • Size selection • Amplification • Sequencing

  9. Sequencing process

  10. Sequencing process Leave only sequences from one direction

  11. Sequencing process

  12. Sequencing process

  13. Sequencing process

  14. Applications

  15. DNA sequencing • Resequencing – sequencing the genome of an organism with a known genome • Exome sequencing / Targeted sequencing – sequencing only selected regions of the genome • De-novo sequencing – sequencing the genome of an organism with an unknown genome

  16. RNA-Seq Sequencing of mRNA extracted from the cell to estimate the expression levels of genes.

  17. Counting vs. Reading: RNA-Seq vs. DNA sequencing

  18. ChIP-Seq Sequencing the regions of the genome to which a protein binds.

  19. Basic concepts Insert – the DNA fragment that is used for sequencing. Read – the part of the insert that is sequenced. Single Read (SR) – a sequencing procedure by which the insert is sequenced from one end only. Paired End (PE) – a sequencing procedure by which the insert is sequenced from both ends.
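
The relation between an insert and its reads can be sketched in a few lines of Python. This is an illustration only: the function names and the example insert are hypothetical, and it assumes the second read of a paired-end run is reported as the reverse complement of the far end of the insert.

```python
# Hypothetical helpers illustrating insert vs. read (not from the slides).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def single_read(insert, read_length):
    """Single Read (SR): sequence the insert from one end only."""
    return insert[:read_length]


def paired_end(insert, read_length):
    """Paired End (PE): sequence the insert from both ends.

    Assumption: read 2 comes from the opposite strand, so it is the
    start of the reverse complement of the insert.
    """
    read1 = insert[:read_length]
    read2 = reverse_complement(insert)[:read_length]
    return read1, read2


insert = "ACTTCGTCGAAAGGTTCA"          # made-up 18-base insert
print(single_read(insert, 5))          # ACTTC
print(paired_end(insert, 5))           # ('ACTTC', 'TGAAC')
```

Note that when the read length is shorter than half the insert, the middle of the insert is never sequenced in either mode.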

  20. Bioinformatics Analysis Pipelines

  21. Demultiplexing • Unknown inserts Lane
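
Demultiplexing assigns the mixed reads from one lane back to their samples using the barcodes added during sample prep. A minimal sketch, assuming the barcode is the first few bases of each read (on real instruments the index is often read separately); the barcode table and sample names are invented for the example.

```python
from collections import defaultdict


def demultiplex(reads, barcode_to_sample, barcode_length):
    """Group reads by sample according to their leading barcode.

    Assumption: the barcode occupies the first `barcode_length` bases
    of every read. Reads with an unknown barcode go to "undetermined".
    """
    by_sample = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert_seq = read[barcode_length:]      # barcode is trimmed off
        sample = barcode_to_sample.get(barcode, "undetermined")
        by_sample[sample].append(insert_seq)
    return dict(by_sample)


# Hypothetical barcodes and lane content.
barcodes = {"ACGT": "sample_A", "TTAG": "sample_B"}
lane = ["ACGTGGGCCC", "TTAGAAATTT", "ACGTCCCGGG", "GGGGTTTTAA"]
print(demultiplex(lane, barcodes, 4))
```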

  22. Demultiplexing • Mapping. Examples of mapping parameters: • Number of mismatches per read • Scores for mismatches or gaps. Mapping parameters affect the rest of the analysis. [Figure labels: Sample, Reference Genome]
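
To see how a mapping parameter changes the outcome, here is a deliberately naive sketch that slides a read along a reference and accepts every position within a mismatch limit. Real aligners use indexed, gap-aware algorithms; the function names are hypothetical, and the reference is the short example sequence from the later slides.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, max_mismatches):
    """Return (position, mismatches) for every acceptable placement.

    Brute-force scan without gaps; for illustration only.
    """
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = count_mismatches(read, window)
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


reference = "ACTTCGTCGAAAGG"
read = "TCGATA"                       # one base differs from the reference
print(map_read(read, reference, 0))   # []
print(map_read(read, reference, 1))   # [(6, 1)]
```

With zero mismatches allowed the read is lost; with one mismatch allowed it maps, and the mismatch becomes a candidate variant downstream.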

  23. Demultiplexing • Mapping • Removing duplicates and non-unique mappings [Figure labels: "?", Reference Genome]
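
The filtering step can be sketched as follows. This sketch assumes a simplified definition: a read is non-unique if it has more than one mapping, and two reads are duplicates if they map to the same position and strand. Real tools use richer criteria (for example the mate position in paired-end data), and the data layout here is invented.

```python
def filter_alignments(alignments):
    """Keep uniquely mapped, non-duplicate reads.

    alignments: dict of read_id -> list of (position, strand) hits.
    Returns a dict of read_id -> the single retained hit.
    """
    seen_hits = set()
    kept = {}
    for read_id, hits in alignments.items():
        if len(hits) != 1:
            continue                  # unmapped or non-unique mapping
        hit = hits[0]
        if hit in seen_hits:
            continue                  # duplicate of an earlier read
        seen_hits.add(hit)
        kept[read_id] = hit
    return kept


alignments = {
    "read1": [(100, "+")],
    "read2": [(100, "+")],              # duplicate of read1
    "read3": [(250, "-"), (900, "-")],  # maps to two places
    "read4": [(400, "+")],
}
print(filter_alignments(alignments))
# {'read1': (100, '+'), 'read4': (400, '+')}
```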

  24. Resequencing/ Exome Pipeline

  25. Demultiplexing • Mapping • Removing duplicates and non-unique mappings • Coverage profile and variant calling [Figure labels: G, Reference Genome …ACTTCGTCGAAAGG…]

  26. Demultiplexing • Mapping • Removing duplicates and non-unique mappings • Coverage profile and variant calling • Variant filtering. Example filters: Frequency >= 20%, Coverage >= 5 [Figure labels: Reference Genome …ACTTCGTCGAAAGG…]
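
The two example thresholds on this slide (coverage of at least 5, variant frequency of at least 20%) can be written as a small filter. A minimal sketch, assuming coverage at a position is the number of reads supporting the reference base plus those supporting the variant; the function name and arguments are hypothetical.

```python
def passes_filter(ref_count, alt_count, min_coverage=5, min_frequency=0.20):
    """Apply the slide's example filters to one candidate variant.

    ref_count: reads supporting the reference base at this position.
    alt_count: reads supporting the variant base.
    """
    coverage = ref_count + alt_count
    if coverage < min_coverage:
        return False                  # too few reads to trust the call
    return alt_count / coverage >= min_frequency


print(passes_filter(ref_count=6, alt_count=4))   # True  (coverage 10, 40%)
print(passes_filter(ref_count=3, alt_count=1))   # False (coverage 4)
print(passes_filter(ref_count=9, alt_count=1))   # False (frequency 10%)
```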

  27. Demultiplexing • Mapping • Removing duplicates and non-unique mappings • Variant calling • Variant filtering • Genes and known variants [Figure labels: G, A, Reference Genome …ACTTCGTCGAAATG… …GTCCCGTGATACTCCGT…, Gene X, rs230985]

  28. Resequencing results

  29. Demultiplexing • Mapping • Removing duplicates and non-unique mappings • Coverage profile and variant calling • Variant filtering • Genes and known variants • Finding suspicious variants. Example for further analysis. Recessive disease: • Variant not in known databases • Homozygous variant shared by all affected individuals • Same variant appears in the healthy parents in a heterozygous state • Healthy siblings can be heterozygous for the same variant. Dominant disease: • Variant not in known databases • Heterozygous variant shared by all affected individuals • The variant doesn't appear in healthy individuals. [Figure labels: 1 table per project, 1 table per sample]
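
The recessive and dominant criteria above translate directly into checks on per-individual genotypes. This is a sketch under simplifying assumptions: genotypes are encoded as "hom" (homozygous variant), "het" (heterozygous) or "ref" (variant absent), and all names and the example family are invented.

```python
def fits_recessive(genotypes, affected, parents, healthy_siblings, in_known_db):
    """Check one variant against the recessive-disease criteria."""
    if in_known_db:
        return False
    if not all(genotypes[p] == "hom" for p in affected):
        return False
    if not all(genotypes[p] == "het" for p in parents):
        return False
    # Healthy siblings may carry the variant, but not in homozygous state.
    return all(genotypes[p] != "hom" for p in healthy_siblings)


def fits_dominant(genotypes, affected, healthy, in_known_db):
    """Check one variant against the dominant-disease criteria."""
    if in_known_db:
        return False
    if not all(genotypes[p] == "het" for p in affected):
        return False
    return all(genotypes[p] == "ref" for p in healthy)


# Hypothetical family: two affected children, healthy parents and sibling.
genotypes = {"child1": "hom", "child2": "hom",
             "mother": "het", "father": "het", "sibling": "het"}
print(fits_recessive(genotypes, ["child1", "child2"],
                     ["mother", "father"], ["sibling"], in_known_db=False))
# True
```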

  30. Quality control steps in the pipeline: Demultiplexing • Mapping • Removing duplicates and non-unique mappings • Coverage profile and variant calling • Variant filtering • Genes and known variants • Finding suspicious variants [Figure: "QC" is marked at five points along the pipeline]

  31. How is de-novo assembly different from resequencing analysis?

  32. RNA-Seq Pipeline

  33. Demultiplexing • Mapping • Removing duplicates and non-unique mappings • Gene expression levels. Normalization: FPKM [Figure label: Unannotated genes]
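
FPKM (fragments per kilobase of transcript per million mapped fragments) normalizes a raw count by both gene length and sequencing depth. A short sketch using the standard definition; the function name and the example numbers are made up.

```python
def fpkm(fragment_count, gene_length_bp, total_mapped_fragments):
    """Fragments Per Kilobase of transcript per Million mapped fragments.

    FPKM = count * 1e9 / (gene length in bp * total mapped fragments)
    The factor 1e9 combines "per kilobase" (1e3) and "per million" (1e6).
    """
    return fragment_count * 1e9 / (gene_length_bp * total_mapped_fragments)


# Two hypothetical genes with the same raw count but different lengths.
print(fpkm(500, 2_000, 10_000_000))   # 25.0
print(fpkm(500, 4_000, 10_000_000))   # 12.5
```

The second gene is twice as long, so the same number of fragments corresponds to half the expression level.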

  34. Demultiplexing • Mapping • Removing duplicates and non-unique mappings • Gene expression levels • Differential gene expression. Differential expression parameters: • Threshold - minimum number of reads for pair testing • Normalization • Replicates. Differential expression parameters affect the results.

  35. RNA-Seq results

  36. Coverage • Coverage in resequencing: the average number of reads covering each base in the genome. • “Coverage” in RNA-Seq: depends on the expression profile of each sample. Highly expressed genes can be detected at lower “coverage” (fewer total reads) than lowly expressed genes.
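
For resequencing, average coverage can be estimated before the experiment from the number of reads, the read length and the genome size. A minimal sketch, assuming all reads map and are spread evenly; the function names and numbers are illustrative.

```python
import math


def mean_coverage(num_reads, read_length, genome_size):
    """Average number of times each base in the genome is sequenced."""
    return num_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target average coverage."""
    return math.ceil(target_coverage * genome_size / read_length)


# Hypothetical 5 Mb genome sequenced with 100-base reads.
print(mean_coverage(1_000_000, 100, 5_000_000))   # 20.0
print(reads_needed(30, 100, 5_000_000))           # 1500000
```

In practice some reads are lost to duplicates, non-unique mappings and contamination, so the planned number of reads is usually set higher than this estimate.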

  37. What should you ask yourself before sequencing, when planning the experiment? • Reference genome: • What is my reference genome? • Does it have updated annotations? • What annotations are known? • Are my samples closely related to the reference genome? • Do I expect contamination in my sample? • Do I have validations from other technologies? (RT-PCR, SNP chip…) • Do I have controls and replicates? • RNA-Seq: am I interested in alternative splicing? • Resequencing: what kind of mutations do I expect to find?
