
GENOMIC COLOCATION: A NEW OPTION IN THE STRATEGIES WDK TO COMBINE RESULT SETS

GENOMIC COLOCATION: A NEW OPTION IN THE STRATEGIES WDK TO COMBINE RESULT SETS USING RELATIVE GENOMIC LOCATIONS

yazid

Presentation Transcript


  1. GENOMIC COLOCATION: A NEW OPTION IN THE STRATEGIES WDK TO COMBINE RESULT SETS USING RELATIVE GENOMIC LOCATIONS

Cristina Aurrecoechea (1), Brian P. Brunk (2), Steve Fischer (2), Xin Gao (2), Omar S. Harb (2), Mark Heiges (1), Jessica C. Kissinger (1), Eileen T. Kraemer (1), Cary Pennington (1), David S. Roos (2), Christian J. Stoeckert (2), Charles Treatman (2) & Susanne Warrenfeltz (1)

(1) Univ. Georgia, Athens GA, & (2) Univ. Pennsylvania, Philadelphia PA

The EuPathDB Strategies Web Development Kit (Strategies WDK) is a search system and graphical interface for integrative genomics databases that helps users perform dynamic in silico experiments. A search strategy is built up from individual searches using a graphical display that illustrates how searches are combined and facilitates revising individual steps, with changes propagated forward through the strategy. The output of a strategy might be a set of genes, SNPs, clinical or field isolates or any other data type in the database.

We recently added a genomic colocation option to combine the results of two consecutive steps in a strategy based on their members’ relative genomic locations. The participating steps must contain features that map uniquely to the genome. Supported operators are overlap and contain. For example, in a genes strategy a user may now add a search for a DNA motif and specify that a motif must overlap within a region 500 bp upstream of a gene, thus retaining only genes that have this motif in their promoter region. Other use cases include identifying SNPs upstream of protein coding genes that are differentially expressed in two different strains, or identifying divergently transcribed genes that appear on opposite strands with overlapping upstream regions. (For a strategy on genes with an AP-2 like motif see http://plasmodb.org/plasmo/im.do?s=0ffa670cc2b0a579.)

The development of the genomic colocation user interface was challenging, as we will present, involving an iterative process that included usability studies.
The GUI guides users through specification of a search region in each of the two result sets to be combined, and the operator that applies to them (overlap, contain). Each selected region is a configurable interval upstream, downstream or arbitrarily located with respect to the genomic features in the set, and includes DNA strand orientation. The user also specifies from which input result set to draw the final results. For example, in a genomic colocation operation that combines genes and SNPs the user may select to return either genes or SNPs. Members of the chosen set are returned if their specified region relates to any specified region in the other set according to the chosen operator.

The Strategies WDK system is schema independent and available for download and installation at http://code.google.com/p/strategies-wdk.

[Figure label: Multiple versions]

The EuPathDB suite of databases covers genomic and functional genomics datasets for a variety of eukaryotic pathogens.

Did you ever wish you could intersect different genomic data sets on the basis of their genomic location? For example, find annotated genes where there is a specific DNA motif within 500 base pairs of the gene’s upstream region.

[Figure label: Upstream regions]

The goal was to develop a mechanism to identify features based on their relative genomic coordinates (genomic colocation). Specifically, given two sets of features with uniquely defined coordinates on a genomic sequence (e.g., genes, SNPs, motifs, SAGE tags, ORFs), determine which feature pairs “comply” with a user-defined relative distance from each other in the genome, and a relative strandness.

[Figure: genes and motifs positioned along the genome, with a 500 bp region, the result set, and callouts (a) to (e).]

The challenge was to design an intuitive user interface for combining feature sets based on genomic colocation. This involved multiple iterations with input from the EuPathDB user community.

A statement that can be modified using drop down menus.
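The colocation operation described above (a region per feature, an operator, a strand relationship, and a choice of which set to return) can be sketched in a few lines of Python. This is only an illustrative sketch, not the Strategies WDK implementation: the `Feature` class, the function names, the coordinate convention (1-based, inclusive) and the example coordinates are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical feature record; coordinates are 1-based and inclusive.
@dataclass(frozen=True)
class Feature:
    name: str
    seq: str      # genomic sequence (e.g. chromosome) the feature maps to
    start: int
    end: int
    strand: str   # "+" or "-"

def exact_region(f):
    """Region equal to the feature itself."""
    return (f.start, f.end)

def upstream_region(length):
    """Build a region function: `length` bp upstream, respecting strand."""
    def region(f):
        if f.strand == "+":
            return (max(1, f.start - length), f.start - 1)
        return (f.end + 1, f.end + length)
    return region

def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]

def contains(outer, inner):
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def strand_ok(fa, fb, mode):
    if mode == "either":
        return True
    if mode == "same":
        return fa.strand == fb.strand
    if mode == "opposite":
        return fa.strand != fb.strand
    raise ValueError(f"unknown strand mode: {mode}")

def colocate(set_a, region_a, set_b, region_b, operator,
             strand="either", return_set="a"):
    """Return members of the chosen set whose region relates to any
    region in the other set according to `operator`."""
    hits = []
    for fa in set_a:
        for fb in set_b:
            if fa.seq != fb.seq or not strand_ok(fa, fb, strand):
                continue
            if operator(region_a(fa), region_b(fb)):
                hits.append(fa if return_set == "a" else fb)
    return list(dict.fromkeys(hits))  # drop duplicates, keep order

# Made-up example data: three genes and three motifs on one sequence.
genes = [
    Feature("geneA", "chr1", 1000, 2000, "+"),
    Feature("geneB", "chr1", 5000, 6000, "-"),
    Feature("geneC", "chr1", 8000, 9000, "+"),
]
motifs = [
    Feature("m1", "chr1", 700, 710, "+"),    # upstream of geneA
    Feature("m2", "chr1", 6100, 6108, "+"),  # upstream of geneB (minus strand)
    Feature("m3", "chr1", 8500, 8510, "+"),  # inside geneC, not upstream
]

# Genes whose 500 bp upstream region contains a motif.
gene_hits = colocate(genes, upstream_region(500), motifs, exact_region, contains)
print([g.name for g in gene_hits])  # ['geneA', 'geneB']
```

Passing `return_set="b"` returns the motifs instead of the genes, and `strand="same"` keeps only pairs on the same strand (here, only geneA with m1). The nested loop is quadratic and is chosen for clarity; a real system working on whole genomes would more plausibly use sorted intervals or a database join.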
For each set we define a region relative to each feature:
• (a) genes: 500 bp upstream
• (b) motifs: exact region

(c) Next we define how these regions should relate. We want the motif region contained within the gene upstream region.
(d) Define the “strandness” relationship for a pair of features to be “compliant”.
(e) Select from which set you want the “compliant” features in your result.

[Figure labels: genes, gene regions, motif regions, motifs]

The final version of the colocation user interface involves building a logical genomic colocation statement.

Colocation enhances data integration via our search strategy system

Identify genes that may be co-regulated by shared promoter elements. The genes should be located within 1000 bp of each other, be divergently transcribed, be expressed maximally at day 30 of the iRBC cycle (±8 hrs) and show at least a 4-fold increase in expression.

1 Search for the expressed genes.
2 Add the same set of genes as a second step in your strategy. Select the colocation operation.
3 Let’s specify the colocation for the genes we are looking for!
• Color-coded set information (blue and red).
• Instantaneous graphical feedback on regions selected.
• Instantaneous graphical feedback on regions relationship.
• Instantaneous feedback on what features will be returned.
4 We turn on the “Pf-iRBC expression profile graph (GS array)” column to assess how well the pairs of genes compare in terms of expression.

EuPathDB is an NIAID Bioinformatics Resource Center supported by NIAID Contract No. HHSN266200400037C and The Bill & Melinda Gates Foundation.
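The divergently transcribed gene-pair search in the worked example can be approximated with a small Python sketch. It rests on one possible reading of "divergently transcribed": a minus-strand gene lying to the left of a plus-strand gene on the same sequence, with at most `max_gap` bp between them. The `Gene` record, the `divergent_pairs` function and all coordinates are hypothetical, and the expression filter from step 1 is assumed to have been applied already.

```python
from typing import NamedTuple

# Hypothetical gene record; coordinates are 1-based and inclusive.
class Gene(NamedTuple):
    name: str
    seq: str
    start: int
    end: int
    strand: str  # "+" or "-"

def divergent_pairs(genes, max_gap=1000):
    """Pair each minus-strand gene with plus-strand genes that start
    within `max_gap` bp to its right on the same sequence."""
    minus = [g for g in genes if g.strand == "-"]
    plus = [g for g in genes if g.strand == "+"]
    pairs = []
    for m in minus:
        for p in plus:
            if m.seq != p.seq:
                continue
            gap = p.start - m.end - 1  # bases strictly between the genes
            if 0 <= gap <= max_gap:
                pairs.append((m.name, p.name, gap))
    return pairs

# Made-up coordinates for illustration only.
expressed = [
    Gene("g1", "chr1", 1000, 2000, "-"),
    Gene("g2", "chr1", 2600, 3500, "+"),  # divergent partner of g1
    Gene("g3", "chr1", 5000, 6000, "+"),
    Gene("g4", "chr1", 6500, 7500, "-"),  # convergent with g3, not divergent
    Gene("g5", "chr2", 1000, 2000, "-"),  # different sequence
]

pairs = divergent_pairs(expressed)
print(pairs)  # [('g1', 'g2', 599)]
```

In the strategy itself the same effect is obtained by combining the gene set with itself through the colocation operation, choosing upstream regions and an opposite-strand relationship; the direct gap test here is just a compact stand-in for that configuration.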
