Comprehensive SNV Analysis Pipeline Using GATK and ANNOVAR for Variant Discovery

Illu_SNV_analysis_Pipeline Shiyi.Z

Diagram of analysis process http://www.broadinstitute.org/gatk/guide/best-practices

Data Pre-processing
NGSQCToolkit filters the raw data and generates a QC report.
Bowtie2 maps the filtered reads to the reference; samtools converts the alignments and marks duplicates.
GATK realigns the reads around INDELs.
GATK and samtools each do variant and INDEL calling: -o sample.gatk.raw1.vcf and -o sample.samtools.raw1.vcf
Consolidate and filter the variants: -o sample.concordance.raw1.vcf
Filter: QD < 20; ReadPosRankSum < -8; FS > 10; QUAL < $MEANQUAL: -o sample.concordance.flt1.vcf
Correct the aligned file based on the filtered variant report: -o sample.recal.bam
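The filter line above can be read as a set of hard-filter conditions. The following is a minimal Python sketch of that logic, not the pipeline's actual commands. It assumes that a variant matching any one condition is discarded (the usual convention for filter expressions), that $MEANQUAL is the mean QUAL over the call set, and that a missing annotation is skipped rather than counted as a failure. The record layout and variant ids are hypothetical.

```python
from statistics import mean


def fails_hard_filter(record, mean_qual):
    """Return True if the record matches any of the slide's filter conditions."""
    checks = [
        ("QD", lambda v: v < 20),
        ("ReadPosRankSum", lambda v: v < -8),
        ("FS", lambda v: v > 10),
    ]
    for key, is_bad in checks:
        value = record.get(key)
        # Assumption: an annotation that is absent does not fail the variant.
        if value is not None and is_bad(value):
            return True
    return record["QUAL"] < mean_qual


def hard_filter(records):
    """Keep only the records that match none of the filter conditions."""
    # Assumption: $MEANQUAL is the mean QUAL of all records in the call set.
    mean_qual = mean(r["QUAL"] for r in records)
    return [r for r in records if not fails_hard_filter(r, mean_qual)]


# Hypothetical records standing in for parsed VCF lines.
records = [
    {"id": "v1", "QUAL": 900.0, "QD": 25.0, "FS": 1.2, "ReadPosRankSum": 0.3},
    {"id": "v2", "QUAL": 950.0, "QD": 12.0, "FS": 0.5, "ReadPosRankSum": 0.1},
    {"id": "v3", "QUAL": 100.0, "QD": 30.0, "FS": 2.0, "ReadPosRankSum": -0.4},
    {"id": "v4", "QUAL": 800.0, "QD": 28.0, "FS": 14.0},
    {"id": "v5", "QUAL": 850.0, "QD": 22.0, "FS": 3.0, "ReadPosRankSum": -9.5},
    {"id": "v6", "QUAL": 780.0, "QD": 21.0, "FS": 9.0},
]

kept = hard_filter(records)
print([r["id"] for r in kept])
```

In this toy call set the mean QUAL is 730, so v3 fails on QUAL, while v2, v4 and v5 fail on QD, FS and ReadPosRankSum respectively.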

Variant Discovery
Patient: patient.concordance.filter1.vcf
Father: father.concordance.filter1.vcf
Mother: mother.concordance.filter1.vcf
Control: control.concordance.filter1.vcf
Based on the previous *.filter1.vcf variant file, correct the aligned file and generate the sample.recal.bam file.
Using GATK and samtools, call variants again and generate the sample.final.vcf files.

Preliminary Analysis
Submit the VCF file to the wANNOVAR website (http://wannovar.usc.edu/).
Do annotation using ANNOVAR.

Variation prioritization
Prioritization by ANNOVAR:
annotate_variation.pl -filter --dbtype generic --genericdbfile hg18_avsift.txt --score_threshold 0.05 ex1.human humandb/
Using Excel:
Open the file in Excel 2007 (select "tab-delimited" when opening the file). Click the "DATA" tab in the menu bar, then click the large "Filter" button. Then click any of the headings, such as 1000G_CEU or SIFT, and filter out variants by ticking the check boxes. For the SIFT score, make sure to use "less than 0.05 OR equal to (blank)" so that variants without a SIFT score are not filtered out. This should be straightforward, but it may take a little practice for users not familiar with Excel.
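The Excel rule "less than 0.05 OR equal to (blank)" can also be applied in a script. The sketch below is one way to do it in Python on a tab-delimited table. The column name SIFT comes from the slide; the gene names and the in-memory table are hypothetical stand-ins for the real annotated file.

```python
import csv
import io


def keep_by_sift(rows, column="SIFT", threshold=0.05):
    """Keep rows whose SIFT score is below the threshold or is blank."""
    kept = []
    for row in rows:
        raw = (row.get(column) or "").strip()
        # A blank score is kept on purpose, mirroring the Excel filter rule.
        if raw == "" or float(raw) < threshold:
            kept.append(row)
    return kept


# Hypothetical tab-delimited annotation table.
table = (
    "Gene\tSIFT\n"
    "GENE_A\t0.01\n"
    "GENE_B\t0.40\n"
    "GENE_C\t\n"
    "GENE_D\t0.05\n"
)

rows = list(csv.DictReader(io.StringIO(table), delimiter="\t"))
kept = keep_by_sift(rows)
print([r["Gene"] for r in kept])
```

GENE_D is dropped because 0.05 is not strictly less than 0.05, while GENE_C survives because its score is blank.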

ANNOVAR analysis pipeline
First: remove variations detected in the control.
Second: genetic mode analysis.
Third: filter by these parameters: SIFT less than 0.05; PolyPhen2_HDIV greater than 0.909; PolyPhen2_HVAR greater than 0.909.
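The three steps above can be sketched with set operations in Python. This is an illustration under stated assumptions, not the author's implementation. The slides do not say which genetic mode is tested, so the sketch assumes a de novo model (variant present in the patient, absent in both parents). It also assumes every candidate has all three scores. The variant keys and score values are hypothetical.

```python
def is_predicted_damaging(scores):
    """Apply the three score thresholds listed on the slide."""
    return (
        scores["SIFT"] < 0.05
        and scores["PolyPhen2_HDIV"] > 0.909
        and scores["PolyPhen2_HVAR"] > 0.909
    )


def prioritize(patient, father, mother, control, scores):
    """Return the patient's variants that survive all three steps."""
    # First: remove variants that are also detected in the control.
    candidates = patient - control
    # Second: genetic mode analysis. Assumption: de novo model,
    # so anything seen in either parent is removed.
    candidates = candidates - father - mother
    # Third: keep only variants that pass the score thresholds.
    return sorted(v for v in candidates if is_predicted_damaging(scores[v]))


# Hypothetical variants keyed by (chromosome, position, ref, alt).
a = ("chr1", 100, "A", "G")
b = ("chr2", 200, "C", "T")
c = ("chr3", 300, "G", "A")
d = ("chr4", 400, "T", "C")

patient = {a, b, c, d}
father = {b}
mother = set()
control = {a}

scores = {
    c: {"SIFT": 0.01, "PolyPhen2_HDIV": 0.99, "PolyPhen2_HVAR": 0.95},
    d: {"SIFT": 0.20, "PolyPhen2_HDIV": 0.99, "PolyPhen2_HVAR": 0.95},
}

result = prioritize(patient, father, mother, control, scores)
print(result)
```

Other inheritance models would change only the second step; for example, a recessive model would need genotype information, which plain sets of variant keys do not carry.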
