This document outlines a streamlined analysis pipeline for Single Nucleotide Variant (SNV) detection and annotation using GATK and ANNOVAR. It covers pre-processing steps such as quality control with NGS QC Toolkit, mapping with Bowtie2, and variant calling with GATK and Samtools. The pipeline emphasizes filtering variants based on specific criteria, consolidation of calls, and correction of aligned files. Finally, it elaborates on using ANNOVAR for annotation and prioritization of genetic variations, along with practical guidance on analyzing results.
Illu_SNV_analysis_Pipeline Shiyi.Z
Diagram of analysis process http://www.broadinstitute.org/gatk/guide/best-practices
Data Pre-processing
NGSQCToolkit filters the raw data and generates a QC report.
Bowtie2 maps the filtered data to the reference; samtools converts the alignments and marks duplicates.
GATK realigns reads around INDELs.
GATK and Samtools perform variant and INDEL calling: -o sample.gatk.raw1.vcf, -o sample.samtools.raw1.vcf
Consolidate and filter the variants: -o sample.concordance.raw1.vcf
Filter: QD < 20; ReadPosRankSum < -8; FS > 10; QUAL < $MEANQUAL: -o sample.concordance.flt1.vcf
Correct the aligned file based on the filtered variant report: -o sample.recal.bam
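The consolidate-and-filter step can be sketched in Python. This is only an illustration, not the pipeline's actual implementation. It assumes that "consolidate" means keeping GATK calls whose site and alleles also appear in the Samtools calls, that a record matching any one of the four filter expressions is discarded, and that $MEANQUAL is the mean QUAL of the consolidated records. The function names are hypothetical, and QUAL is assumed to be numeric in every record.

```python
from statistics import mean


def parse_vcf_line(line):
    """Split one VCF data line into the fields used in this sketch."""
    cols = line.rstrip("\n").split("\t")
    info = {}
    for item in cols[7].split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            info[key] = value
    return {
        "key": (cols[0], cols[1], cols[3], cols[4]),  # CHROM, POS, REF, ALT
        "qual": float(cols[5]),  # assumption: QUAL is never "."
        "info": info,
    }


def read_records(lines):
    """Parse all non-header, non-empty lines."""
    return [parse_vcf_line(l) for l in lines if l.strip() and not l.startswith("#")]


def concordant(gatk_records, samtools_records):
    """Keep GATK records that Samtools also called (assumed meaning of 'consolidate')."""
    shared = {r["key"] for r in samtools_records}
    return [r for r in gatk_records if r["key"] in shared]


def fails_filter(record, mean_qual):
    """True if any of the slide's four filter expressions matches the record."""
    info = record["info"]

    def num(tag):
        # A missing annotation is treated as "expression does not match".
        return float(info[tag]) if tag in info else None

    qd, rprs, fs = num("QD"), num("ReadPosRankSum"), num("FS")
    return (
        (qd is not None and qd < 20)
        or (rprs is not None and rprs < -8)
        or (fs is not None and fs > 10)
        or record["qual"] < mean_qual
    )


def hard_filter(records):
    """Drop records that match any filter expression; mean QUAL is computed here."""
    if not records:
        return []
    mean_qual = mean(r["qual"] for r in records)
    return [r for r in records if not fails_filter(r, mean_qual)]
```

Because the QUAL threshold is the mean of the consolidated set, this filter removes every record below that mean, so it is a strict filter by construction.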
Variant Discovery
Patient: patient.concordance.filter1.vcf
Father: father.concordance.filter1.vcf
Mother: mother.concordance.filter1.vcf
Control: control.concordance.filter1.vcf
Based on the previous *.filter1.vcf variant file, correct the aligned file and generate the sample.recal.bam file.
Use GATK and Samtools to call variants again and generate the sample.final.vcf files.
Preliminary Analysis
Submit the VCF file to the wANNOVAR website (http://wannovar.usc.edu/).
Do the annotation using ANNOVAR.
Variation prioritization
Prioritization by ANNOVAR:
annotate_variation.pl -filter --dbtype generic --genericdbfile hg18_avsift.txt --score_threshold 0.05 ex1.human humandb/
Using Excel:
Open the file in Excel 2007 (select "tab-delimited" when opening the file). Click the "DATA" tab in the menu bar, then click the large "Filter" button. Click any of the headings, such as 1000G_CEU or SIFT, and use the check boxes to filter out variants. For the SIFT score, make sure to use "less than 0.05 OR equal to (blank)" so that variants without a SIFT score are not filtered out. This should be straightforward, but it may take a little practice for users not familiar with Excel.
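The same SIFT rule ("less than 0.05 OR blank") can be applied outside Excel. The sketch below is a minimal illustration that assumes a tab-delimited annotation file with a header row and a column named SIFT, as in the headings mentioned above. The function names are hypothetical, and only truly empty cells are treated as missing.

```python
import csv


def keep_by_sift(row, column="SIFT", threshold=0.05):
    """Keep a row if its SIFT score is below the threshold or is blank."""
    value = (row.get(column) or "").strip()
    if value == "":
        return True  # no SIFT score: do not filter the variant out
    return float(value) < threshold


def filter_rows(handle, column="SIFT", threshold=0.05):
    """Read a tab-delimited table from an open file handle and return kept rows."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if keep_by_sift(row, column, threshold)]
```

A typical call would open the exported annotation file in text mode and pass the handle to `filter_rows`; each returned row is a dictionary keyed by the column headings.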
ANNOVAR analysis pipeline
First: remove variations detected in the Control.
Second: genetic mode analysis.
Third: filter by these parameters: SIFT less than 0.05; PolyPhen2_HDIV greater than 0.909; PolyPhen2_HVAR greater than 0.909.
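The three steps can be sketched as set operations followed by a score check. The slides do not say which genetic model is tested or how the three score criteria are combined, so this sketch makes several assumptions: it uses a de novo model (variant present in the patient and absent from both parents) purely as an example, requires all three criteria to hold, and lets a missing score pass, mirroring the Excel guidance on blank SIFT values. Variants are represented as hypothetical (chrom, pos, ref, alt) tuples.

```python
# Score criteria from the slide; each entry maps a score name to its pass test.
THRESHOLDS = {
    "SIFT": lambda s: s < 0.05,
    "PolyPhen2_HDIV": lambda s: s > 0.909,
    "PolyPhen2_HVAR": lambda s: s > 0.909,
}


def passes_scores(scores):
    """True if every available score meets its criterion (assumption: AND logic)."""
    for name, ok in THRESHOLDS.items():
        value = scores.get(name)
        # Assumption: a missing score does not exclude the variant.
        if value is not None and not ok(value):
            return False
    return True


def prioritize(patient, father, mother, control, scores):
    """Apply the three steps to sets of (chrom, pos, ref, alt) variant keys."""
    # First: remove variations detected in the control.
    candidates = set(patient) - set(control)
    # Second: genetic model, here de novo as an illustrative assumption.
    candidates -= set(father) | set(mother)
    # Third: filter by the functional prediction scores.
    return sorted(k for k in candidates if passes_scores(scores.get(k, {})))
```

A recessive or dominant model would change only the second step, for example by requiring the variant in one or both parents instead of excluding it.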