Bioinformatics Tools and Techniques: Exploring Genomic Data Analysis

Bioinformatics tools and techniques: Into the heart of darkness • Elaine Kenny, Colm O’Dushlaine • 15/11/07

Summary • Simple overviews of some of the tools and methods used by EK and CO’D • TK notebook • get_hapmap_snps.pl: retrieve HM genotype information for a list of SNPs • GeneViewer.pl & cross_ref.pl: visualise e.g. SNPs in the context of other genomic landmarks. Score SNPs depending on how many of these landmarks they overlap with • ld_expander.pl: find SNPs in LD with SNPs of interest, based on user-specified r2 and “LD window” (distance between SNPs) • STATA • VIM: command line text editor • Lab website

TK notebook • Application for saving notes, to-do lists, daily logs, and any other kind of textual information in a place where you can find it all again, and where related information is easily found • Easy to edit and rapidly searchable • DEMO – editing • DEMO – search

get_hapmap_snps.pl • Simple script to read in a 1-column list of SNPs and retrieve HapMap genotypes • Can select population and strand • DEMO • Retrieved data can be loaded into HaploView • DEMO

cross_ref_scored.pl • Score SNPs based on how many putatively functional regions they overlap with: • On a per gene / chromosome basis • Gene basis: • Type: perl cross_ref_scored.pl file_A file_B file_C ... where file_A - 2-column file of SNPs (format = id, location) file_B - 3-column file of EXONS (format = id/name, start, stop) file_C ... - whatever you want, (format = id/name, start, stop) i.e. other regions like CpGs, TFBS, clusters. Any order. …
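The overlap scoring described above can be sketched in a few lines of Python. This is an illustrative sketch, not the cross_ref_scored.pl source: the function name `score_snps`, the demo identifiers, the inclusive start/stop coordinates, and the rule of one point per region file overlapped are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Snp = Tuple[str, int]          # (id, location), as in file_A
Region = Tuple[str, int, int]  # (id/name, start, stop), as in file_B, file_C ...


def score_snps(snps: List[Snp], region_sets: List[List[Region]]) -> Dict[str, int]:
    """Count, for each SNP, how many region sets contain its position.

    Assumption: coordinates are inclusive, and a SNP earns at most one
    point per region set, however many regions in that set it hits.
    """
    scores: Dict[str, int] = {}
    for snp_id, pos in snps:
        score = 0
        for regions in region_sets:
            if any(start <= pos <= stop for _name, start, stop in regions):
                score += 1
        scores[snp_id] = score
    return scores


# Hypothetical demo data standing in for file_A, file_B and file_C.
snps = [("rs_demo1", 150), ("rs_demo2", 420), ("rs_demo3", 900)]
exons = [("exon1", 100, 200), ("exon2", 400, 450)]
cpgs = [("cpg1", 140, 160)]

example_scores = score_snps(snps, [exons, cpgs])
```

With the demo data, rs_demo1 falls in both an exon and a CpG region, rs_demo2 only in an exon, and rs_demo3 in neither.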

cross_ref_scored.pl example output: Can then be merged with HapMap / Perlegen to retrieve MAF data for SNPs

Merge cross_ref_scored data with HapMap/Perlegen data using merge_per_hap.pl • Type: perl merge_per_hap.pl perlegen.txt hapmap.txt overlapped_region_scored.txt • Where: hapmap.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq), perlegen.txt = 3-column file (format: rsid, ref_allele, ref_allele_freq)
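A merge of this kind amounts to joining the scored SNPs to the frequency files by rsid. The Python sketch below is a guess at the general idea, not the behaviour of merge_per_hap.pl: the function names, the in-memory dictionaries, and the choice to report both sources side by side are assumptions. The minor allele frequency is taken as the smaller of the reference allele frequency and its complement, which holds for biallelic SNPs.

```python
from typing import Dict, Optional, Tuple

FreqTable = Dict[str, Tuple[str, float]]  # rsid -> (ref_allele, ref_allele_freq)


def maf(ref_allele_freq: float) -> float:
    """Minor allele frequency of a biallelic SNP from its reference allele frequency."""
    return min(ref_allele_freq, 1.0 - ref_allele_freq)


def merge_scores_with_maf(
    scores: Dict[str, int], hapmap: FreqTable, perlegen: FreqTable
) -> Dict[str, Dict[str, Optional[float]]]:
    """Attach the HapMap and Perlegen MAF (None if absent) to each scored SNP."""
    merged: Dict[str, Dict[str, Optional[float]]] = {}
    for rsid, score in scores.items():
        merged[rsid] = {
            "score": score,
            "hapmap_maf": maf(hapmap[rsid][1]) if rsid in hapmap else None,
            "perlegen_maf": maf(perlegen[rsid][1]) if rsid in perlegen else None,
        }
    return merged


# Hypothetical demo data in the 3-column layout (rsid, ref_allele, ref_allele_freq).
demo_scores = {"rs_demo1": 2, "rs_demo2": 0}
demo_hapmap = {"rs_demo1": ("A", 0.8)}
demo_perlegen = {"rs_demo2": ("G", 0.25)}

demo_merged = merge_scores_with_maf(demo_scores, demo_hapmap, demo_perlegen)
```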

cross_ref.pl applied to WGA data • cross_ref.pl: Scoring SNPs throughout genome • Data analysed on coding/non-coding basis (coding) • perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.coding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/coding_non_synon_SNPs_UCSC.clean=3 WGA_databases/coding_synon_SNPs_UCSC.clean=2 WGA_databases/RefSeq_Genes_UCSC.byExon.uniqid=1 WGA_databases/Triplexes_may2006.bed=2 WGA_databases/splice_site_SNPs_UCSC.clean=2 > Overlapped_regions_scored.WTCCC.chr22.coding.log & (input-dependent, coding/non-coding dependent, arbitrary) (noncoding) • perl cross_ref.pl Overlapped_regions_scored.WTCCC.chr22.NONcoding.txt 22 WTCCC_T2D_chr22_without_inferred.forCrossRef WGA_databases/TFBS.chr22=1 WGA_databases/CpG_islands_UCSC.uniqid=1 WGA_databases/Most_conserved_phastConsElements17way_UCSC.clean=1 WGA_databases/promoters_knowngene_hg18.txt=1 WGA_databases/sno_or_miRNA_UCSC.uniqid=1 > Overlapped_regions_scored.WTCCC.chr22.NONcoding.log &
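In the commands above, each database argument ends in "=N". Reading N as a per-database weight that is added to a SNP's score when the SNP overlaps that database is an assumption based on the slide's "arbitrary" annotation, not something the slides state outright. Under that assumption, a minimal Python sketch of the argument parsing and weighted scoring could look like this (the function names are hypothetical, and this is not the cross_ref.pl code):

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, stop), assumed inclusive


def parse_weighted_arg(arg: str) -> Tuple[str, int]:
    """Split a 'path=weight' argument on its last '=' into (path, weight)."""
    path, sep, weight = arg.rpartition("=")
    if not sep or not path:
        raise ValueError(f"expected 'path=weight', got {arg!r}")
    return path, int(weight)


def weighted_score(pos: int, weighted_regions: List[Tuple[List[Interval], int]]) -> int:
    """Sum the weights of every region set that contains the position."""
    total = 0
    for regions, weight in weighted_regions:
        if any(start <= pos <= stop for start, stop in regions):
            total += weight
    return total


# One argument taken from the noncoding command above.
demo_path, demo_weight = parse_weighted_arg("WGA_databases/TFBS.chr22=1")

# Hypothetical intervals: the position overlaps the first two sets only.
demo_total = weighted_score(
    150,
    [([(100, 200)], 3), ([(140, 160)], 2), ([(500, 600)], 1)],
)
```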

cross_ref.pl • cross_ref.pl output: • Load into STATA. If SNPs have e.g. association p-values, calculate adjusted p-value (R. Anney) as -log10[P] + [cross_ref_score]
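The adjustment above is a one-line calculation. The slides do it in STATA; the Python sketch below only illustrates the arithmetic, and the function name `adjusted_score` is made up for the example.

```python
import math


def adjusted_score(p_value: float, cross_ref_score: float) -> float:
    """Return -log10(P) + cross_ref_score for an association p-value P."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in the interval (0, 1]")
    return -math.log10(p_value) + cross_ref_score


# A SNP with P = 0.001 and a cross_ref score of 2:
# -log10(0.001) is 3, so the adjusted value is 3 + 2.
demo_adjusted = adjusted_score(0.001, 2)
```

Because the result is on the -log10 scale, larger values indicate SNPs that are both more strongly associated and overlap more annotated regions.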

GeneViewer.pl • GeneViewer.pl: Visualise overlapping features (e.g. exons, SNPs etc.) along e.g. your gene of interest (html output)

ld_expander.pl • Find proxies (SNPs in LD) for a list of SNPs • User specifies the r2 and “LD window” • Currently configured to obtain proxies from HM CEU • Result is a list of additional proxy SNPs that have been obtained by LD expansion • DEMO • Note: don’t LD expand >150000 SNPs, or HapMap will ban you! CO’D has an alternative version that uses local pre-computed pairwise LD SNP files
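The r2 and "LD window" filters can be illustrated against a local table of pairwise LD values, in the spirit of the alternative version mentioned above. This Python sketch is not ld_expander.pl: the record layout (snp_a, pos_a, snp_b, pos_b, r2), the function name, and the demo values are assumptions.

```python
from typing import Iterable, List, Tuple

LdPair = Tuple[str, int, str, int, float]  # (snp_a, pos_a, snp_b, pos_b, r2)


def ld_expand(
    seeds: Iterable[str],
    ld_pairs: Iterable[LdPair],
    min_r2: float,
    max_distance: int,
) -> List[str]:
    """Return proxy SNPs paired with a seed SNP at r2 >= min_r2 within max_distance bp."""
    seed_set = set(seeds)
    proxies = set()
    for snp_a, pos_a, snp_b, pos_b, r2 in ld_pairs:
        # Skip pairs that fail either the r2 threshold or the LD window.
        if r2 < min_r2 or abs(pos_a - pos_b) > max_distance:
            continue
        # A pair may list the seed in either column.
        if snp_a in seed_set and snp_b not in seed_set:
            proxies.add(snp_b)
        elif snp_b in seed_set and snp_a not in seed_set:
            proxies.add(snp_a)
    return sorted(proxies)


# Hypothetical pairwise LD records.
demo_pairs = [
    ("rsA", 1000, "rsB", 1500, 0.90),    # kept
    ("rsA", 1000, "rsC", 300000, 0.95),  # outside the LD window
    ("rsD", 2000, "rsA", 1000, 0.50),    # r2 too low
    ("rsE", 1200, "rsA", 1000, 0.85),    # kept, seed in second column
]

demo_proxies = ld_expand(["rsA"], demo_pairs, min_r2=0.8, max_distance=250000)
```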

STATA • Extremely powerful and flexible • >65k rows handled – shock horror! • Can write scripts to automate tasks, e.g. read in file, do analysis, save results • When you run commands from the GUI, they are shown in the command window, so you can save them in a do file • CO’D, EK and R. Anney strongly advocate this as a platform for both file manipulation and statistical analysis

STATA example using WTCCC data (http://www.wtccc.org.uk/) • Diseases: Bipolar Disorder, Coronary Artery Disease, Crohn's Disease, Hypertension, Rheumatoid Arthritis, Type 1 Diabetes, Type 2 Diabetes

DATA FORMAT • 3 folders: • Basic • Each case collection against the pooled control groups 58C and UKBS • Combined cases • Combining other case collections as controls • Combined controls • Combining phenotypically relevant case collections (e.g. RA/T1D, autoimmune) • Data are split by chromosome

Questions • How do I get all of the chromosome data for my gene of interest into one file? • How do I easily search all of the SNP information for my gene(s) of interest? • Create a “.do” file for all manipulations that you want to carry out on the data • DEMO • Good starting resource: http://www.ats.ucla.edu/stat/stata/
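The slides answer these questions with a STATA do file. As a language-neutral illustration of the same idea, the Python sketch below pulls the rows for one gene out of a per-chromosome, tab-delimited table. The column names ("rsid", "pos", "pvalue"), the function name, and the demo values are hypothetical and do not reflect the actual WTCCC file layout.

```python
import csv
import io
from typing import Dict, List


def snps_in_gene(
    tab_text: str, start: int, stop: int, pos_column: str = "pos"
) -> List[Dict[str, str]]:
    """Return rows of a tab-delimited table whose position lies in [start, stop]."""
    reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
    return [row for row in reader if start <= int(row[pos_column]) <= stop]


# Hypothetical per-chromosome table with a header row.
demo_table = (
    "rsid\tpos\tpvalue\n"
    "rs_demo1\t100\t0.5\n"
    "rs_demo2\t250\t0.01\n"
    "rs_demo3\t900\t0.2\n"
)

# Rows falling inside a hypothetical gene spanning positions 200 to 1000.
demo_rows = snps_in_gene(demo_table, start=200, stop=1000)
demo_ids = [row["rsid"] for row in demo_rows]
```

Running the same filter over each collection's file for the relevant chromosome and concatenating the results gives one file per gene.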

VIM • “Vi Improved”. Cross-platform text editor, mainly used on UNIX (also available for Windows). • Full list of commands outside scope of this demonstration • Very fast and efficient, esp. with search and replace functions on large datasets • Regular expression pattern matching • DEMO • Integrates with Cygwin (www.cygwin.com – very useful UNIX emulator for Windows)

Group website • Some useful stuff up there! • Please send information about current projects etc. Good for our image as a group and minimal effort required on your part • DEMO

Conclusions • Small summary of some things you can do • Slides and video demonstrations will be online at: http://www.medicine.tcd.ie/psychiatry/research/neuropsychiatry/Protocols/ • CO’D & EK available for advice (Fridays, 9-9.02am) • These things will help you in your work!!
