
The 1000 Genomes Project


Presentation Transcript


  1. The 1000 Genomes Project. Wellcome Trust Advanced Course: Exome Sequencing. Exercise Answers

  2. Exercises : Finding Data
     1. Find the highly differentiated sites Excel spreadsheet from the Phase 1 analysis on the FTP site.
     2. How many highly differentiated sites are in the section of chromosome 22 you are considering (22:25900000-29600000)?

  3. Answers : Finding Data
     1. Find the highly differentiated sites Excel spreadsheet from the Phase 1 analysis on the FTP site.

  4. Answers : Finding Data
     1. Find the highly differentiated sites Excel spreadsheet from the Phase 1 analysis on the FTP site.
     Download the current.tree file and grep it:
         wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/current.tree
         grep highly current.tree
     Output:
         ftp/phase1/analysis_results/supporting/highly_differentiated_sites directory 53 Wed Jul 4 16:02:54 2012
         ftp/phase1/analysis_results/supporting/highly_differentiated_sites/20120703_high_diff_sites_table.xlsx file 334559 Wed Jul 4 15:58:12 2012 d99037f1e36eb137733b60a5c5cb32ed
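A plain grep returns directories as well as files. As a minimal sketch, assuming current.tree is tab-separated with the columns path, type, size, date and MD5 (as the listing above suggests), awk can keep only the file entries and print each path with its checksum. The function name `find_in_tree` and the two-line sample are illustrative, not part of the course material.

```shell
#!/bin/sh
# Sketch: list only "file" entries of a current.tree-style index whose path
# matches a pattern, printing path and MD5.
# Assumption: columns are tab-separated as path, type, size, date, md5.

find_in_tree() {
    # $1 = pattern (awk regular expression), $2 = path to the tree file
    awk -F'\t' -v pat="$1" '$2 == "file" && $1 ~ pat { print $1 "\t" $5 }' "$2"
}

# Two-line sample rebuilt from the listing on the slide (not the full index).
tree_sample=$(mktemp)
{
    printf 'ftp/phase1/analysis_results/supporting/highly_differentiated_sites\tdirectory\t53\tWed Jul 4 16:02:54 2012\t\n'
    printf 'ftp/phase1/analysis_results/supporting/highly_differentiated_sites/20120703_high_diff_sites_table.xlsx\tfile\t334559\tWed Jul 4 15:58:12 2012\td99037f1e36eb137733b60a5c5cb32ed\n'
} > "$tree_sample"

find_in_tree highly "$tree_sample"
```

The printed MD5 can then be compared with the output of `md5sum` on the downloaded spreadsheet.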

  5. Answers : Finding Data
     2. How many highly differentiated sites are in the section of chromosome 22 you are considering (22:25900000-29600000)?
     7
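The count can also be taken on the command line once the spreadsheet has been exported to text. The sketch below assumes a tab-separated export with one header line, the chromosome in column 1 and the position in column 2. That layout, the function name `count_sites_in_region` and the sample positions are all hypothetical, so adjust the column numbers to the real export.

```shell
#!/bin/sh
# Sketch: count rows of a tab-separated table that fall inside a region.
# Assumption (hypothetical layout): header line, chromosome in column 1,
# position in column 2.

count_sites_in_region() {
    # $1 = TSV file, $2 = chromosome, $3 = start, $4 = end (both inclusive)
    awk -F'\t' -v chr="$2" -v start="$3" -v end="$4" '
        NR > 1 && $1 == chr && $2 + 0 >= start + 0 && $2 + 0 <= end + 0 { n++ }
        END { print n + 0 }
    ' "$1"
}

# Made-up positions chosen to exercise both region boundaries.
sites_sample=$(mktemp)
printf '%s\t%s\n' \
    chrom pos \
    22 25899999 \
    22 25900000 \
    22 27000000 \
    22 29600000 \
    22 29600001 \
    21 27000000 > "$sites_sample"

count_sites_in_region "$sites_sample" 22 25900000 29600000
```

Both ends of the interval are treated as inclusive here, which matches how the region is written on the slide.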

  6. Exercises : Meta Data and File Formats
     What population is the individual HG00611 from?
     How many runs of low coverage sequence data are associated with the individual HG00611 for the final phase3 analysis?
     Which center performed the sequencing?
     How many base pairs were sequenced?

  7. Answers : Meta Data and File Formats
     What population is the individual HG00611 from?
         wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/analysis.sequence.index
         grep HG00611 analysis.sequence.index | cut -f11 | sort -u
     CHS

  8. Answers : Meta Data and File Formats
     How many runs of low coverage sequence data are associated with the individual HG00611 for the final phase3 analysis? Please note that analysis.sequence.index is the file which describes the phase3 input.
         grep HG00611 analysis.sequence.index | grep "low coverage" | cut -f3,6,10,11 | sort -u
         ERR012626 SC HG00611 CHS
         ERR018551 SC HG00611 CHS
         ERR019500 SC HG00611 CHS
     3

  9. Answers : Meta Data and File Formats
     Which center performed the sequencing?
     SC
     How many base pairs were sequenced?
         grep HG00611 analysis.sequence.index | grep "low coverage" | cut -f25 | awk '{ sum+=$1 } END { print sum }'
     12592744056
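The run count and the base total can be taken in one pass instead of two separate pipelines. This is a sketch only: it uses the column positions from the commands above (3 = run, 10 = sample, 25 = base count) and additionally assumes that the analysis group ("low coverage") sits in column 26. The helper `make_row`, the function `summarise_sample`, the base counts and the run ID ERR000000 are made up for the example.

```shell
#!/bin/sh
# Sketch: count distinct runs and sum base counts for one sample and one
# analysis group in a sequence.index-style file.
# Assumptions: tab-separated; column 3 = run, 10 = sample, 25 = base count,
# 26 = analysis group (the last one is an assumption, check the file header).

summarise_sample() {
    # $1 = sample name, $2 = analysis group, $3 = index file
    awk -F'\t' -v sample="$1" -v grp="$2" '
        $10 == sample && $26 == grp {
            if (!($3 in seen)) { seen[$3] = 1; runs++ }  # count each run once
            bases += $25                                 # sum every matching line
        }
        END { printf "%d\t%d\n", runs, bases }
    ' "$3"
}

make_row() {
    # Illustrative helper: emit one 26-column row, unused columns set to "-".
    awk -v run="$1" -v center="$2" -v sample="$3" -v pop="$4" -v bases="$5" -v grp="$6" '
        BEGIN {
            for (i = 1; i <= 26; i++) f[i] = "-"
            f[3] = run; f[6] = center; f[10] = sample
            f[11] = pop; f[25] = bases; f[26] = grp
            line = f[1]
            for (i = 2; i <= 26; i++) line = line "\t" f[i]
            print line
        }'
}

# Toy index: run IDs from the slide, made-up base counts, one made-up exome row.
index_sample=$(mktemp)
{
    make_row ERR012626 SC HG00611 CHS 1000 "low coverage"
    make_row ERR012626 SC HG00611 CHS 1000 "low coverage"
    make_row ERR018551 SC HG00611 CHS 2000 "low coverage"
    make_row ERR019500 SC HG00611 CHS 3000 "low coverage"
    make_row ERR000000 SC HG00611 CHS 5000 "exome"
} > "$index_sample"

summarise_sample HG00611 "low coverage" "$index_sample"
```

Matching on exact column values rather than grepping whole lines avoids accidental hits where the sample name or the phrase "low coverage" appears in another column. For totals as large as the real one, some awk implementations format `%d` poorly, so printing with `print` under gawk may be the safer choice.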

  10. Exercises : Browser
      Find the gene MN1 in the 1000genomes browser: http://browser.1000genomes.org
      What phenotype is associated with the gene MN1?
      Are there any other locations in the genome with associations to Familial Meningioma?

  11. Answers : Browser Find the gene MN1 in the 1000genomes browser: http://browser.1000genomes.org

  12. Answers : Browser
      What phenotype is associated with the gene MN1?
      Familial Meningioma

  13. Answers : Browser
      Are there any other locations in the genome with associations to Familial Meningioma?
      If you click on the "view all locations" link in the phenotype table, you can see a karyotype diagram that shows all locations on the genome this phenotype is associated with. In this instance there are no other known associations.

  14. Exercises : Tools
      Use tabix on the command line to get a section of the phase1 integrated variants with genotypes for the gene MN1.
      Use the vcf to ped converter script to generate map and info files for exon 1 of MN1 for CHS.
      Look at these files in Haploview and find the longest haplotype block with the standard model.

  15. Answers : Tools
      Use tabix on the command line to get a section of the phase1 integrated variants with genotypes for the gene MN1.
          tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.chr22.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz 22:28144265-28197486 | bgzip -c > MN1.vcf.gz
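To make clear what the tabix query returns, here is a sketch of the same selection done with awk on a small uncompressed VCF: header lines are passed through (the effect of `-h`) and records are kept when they lie on the requested chromosome and inside the interval. It illustrates the logic only and is not a replacement for tabix, which uses the index to avoid reading the whole file. The function name `slice_vcf` and the alleles in the toy records are invented; only the position 28192982 and its ID come from the slides.

```shell
#!/bin/sh
# Sketch: keep VCF header lines plus records inside chrom:start-end.
# This is a full linear scan, unlike tabix, and compares the POS column only.

slice_vcf() {
    # $1 = uncompressed VCF, $2 = chromosome, $3 = start, $4 = end (inclusive)
    awk -F'\t' -v chr="$2" -v start="$3" -v end="$4" '
        /^#/ { print; next }                                   # keep all headers
        $1 == chr && $2 + 0 >= start + 0 && $2 + 0 <= end + 0  # keep in-region records
    ' "$1"
}

# Toy VCF: one record just before the gene, one inside, one just after.
vcf_sample=$(mktemp)
{
    printf '##fileformat=VCFv4.1\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        22 28144264 . A G . . . \
        22 28192982 rs45589739 C T . . . \
        22 28197487 . G A . . .
} > "$vcf_sample"

slice_vcf "$vcf_sample" 22 28144265 28197486
```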

  16. Answers : Tools
      Use the vcf to ped converter script to generate map and info files for exon 1 of MN1 for CHS.
          perl vcf_to_ped_convert.pl -vcf ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/ALL.chr22.integrated_phase1_v3.20101123.snps_indels_svs.genotypes.vcf.gz -region 22:28144265-28197486 -sample_panel ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/integrated_call_sets/integrated_call_samples.20101123.ALL.panel -population CHS

  17. Answers : Tools
      Look at these files in Haploview and find the longest haplotype block with the standard model.
          java -jar Haploview.jar

  18. Answers : Tools
      Look at these files in Haploview and find the longest haplotype block with the standard model.
      Block 2, 5 kb

  19. Exercises : Ensembl Variation
      Find the variant rs6005451.
      In which sub populations is this variant's non reference allele most common?
      What phenotype association does this variant have?
      Other variants also have this phenotype association. Which variant has the most confident association?
      Take the vcf file you generated for the MN1 gene earlier in exercise 10 and discover the functional consequences with the variant effect predictor.
      How many deleterious missense mutations are found according to SIFT?

  20. Answers : Ensembl Variation
      Find the variant rs6005451.

  21. Answers : Ensembl Variation
      In which sub populations is this variant's non reference allele most common?
      The snpview page has a population genetics item in the left-hand menu. This gives you both pie charts and a table of allele frequencies. In this instance TSI (Toscani in Italia) has the highest allele frequency.

  22. Answers : Ensembl Variation
      What phenotype association does this variant have?
      Again, the left-hand menu has a phenotype entry. This gives a table listing any phenotype associations. In this instance rs6005451 is associated with a prostate cancer associated gene/gene interaction.

  23. Answers : Ensembl Variation
      Other variants also have this phenotype association. Which variant has the most confident association?
      If you follow the "view on karyotype" link, you are taken to a view where all the links are displayed on the karyotype and a table of all the associated variants is given. In this instance the variation with the strongest association is rs784411, with a negative log P value of 7.

  24. Answers : Ensembl Variation
      Take the vcf file you generated for the MN1 gene earlier in exercise 10 and discover the functional consequences with the variant effect predictor.
          perl variant_effect_predictor.pl -input MN1.vcf.gz -output MN1_vep_out -offline -sift=p -dir /wtac/home/wtacXX/wt_advanced_exome/g1k_code/variant_effect_predictor/
      How many deleterious missense mutations are found according to SIFT?
      There are 7 deleterious missense mutations according to SIFT.
          grep deleterious MN1_vep_out | cut -f1,2,14
          rs45589739 22:28192982 SIFT=deleterious
          rs201955277 22:28193458 SIFT=deleterious
          rs181493013 22:28193945 SIFT=deleterious
          rs184710088 22:28194628 SIFT=deleterious
          rs45473596 22:28194749 SIFT=deleterious
          rs200805240 22:28196248 SIFT=deleterious
          rs200030766 22:28196476 SIFT=deleterious
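The grep above counts output lines. If a variant is reported against more than one transcript it can appear on several lines, so counting distinct variant IDs is a safer way to reach the final number. The sketch below assumes the 14-column tab-separated layout implied by `cut -f1,2,14`, with the variant ID in column 1 and the SIFT prediction in column 14. The function name `count_deleterious`, the helper `vep_row`, the duplicated row and the ID rs0000000 are illustrative only.

```shell
#!/bin/sh
# Sketch: count distinct variants with a SIFT "deleterious" prediction in a
# VEP-style tab-separated output file.
# Assumptions: comment lines start with "#", column 1 = variant ID,
# column 14 = key=value annotations containing SIFT=...

count_deleterious() {
    # $1 = VEP output file
    awk -F'\t' '
        !/^#/ && $14 ~ /SIFT=deleterious/ { seen[$1] = 1 }
        END { n = 0; for (v in seen) n++; print n }
    ' "$1"
}

vep_row() {
    # Illustrative helper: emit one 14-column row, columns 3-13 set to "-".
    awk -v id="$1" -v loc="$2" -v extra="$3" 'BEGIN {
        line = id "\t" loc
        for (i = 3; i <= 13; i++) line = line "\t-"
        print line "\t" extra
    }'
}

# Toy output: the first variant is repeated as if hit in two transcripts,
# and rs0000000 is a made-up tolerated variant.
vep_sample=$(mktemp)
{
    printf '## toy VEP output\n'
    vep_row rs45589739 22:28192982 SIFT=deleterious
    vep_row rs45589739 22:28192982 SIFT=deleterious
    vep_row rs201955277 22:28193458 SIFT=deleterious
    vep_row rs0000000 22:28190000 SIFT=tolerated
} > "$vep_sample"

count_deleterious "$vep_sample"
```

On the toy file a plain `grep -c deleterious` would report three lines, while the distinct-ID count is two.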
