A New Interface to GeneKeyDB

Methods for analyzing relationships among proteins based on shared motifs
Chris Symons & Xinxia Peng

Protein domains
• Protein domains are distinct units of protein three-dimensional structure, which also carry function.
• Proteins can be composed of single or multiple domains.
• A few thousand conserved domain models are sufficient to cover more than two thirds of known protein sequences.
Marchler-Bauer A, et al. CDD: a curated Entrez database of conserved domain alignments. Nucleic Acids Research 31:383-387 (2003).

The growth in the number of known proteins vs. the growth in the number of unique domains
Geer, L.Y., Domrachev, M., Lipman, D.J. and Bryant, S.H. (2002) CDART: Protein Homology by Domain Architecture. Genome Res., 12, 1619–1623.

Conserved Domain Database (CDD):
• http://www.ncbi.nlm.nih.gov/Structure/cdd/cdd.shtml
• a curated Entrez database of conserved domain alignments at NCBI
• currently contains domains derived from two popular collections, SMART and Pfam, plus contributions from colleagues at NCBI, such as COG.

Data generation using GeneKeyDB

-- create a master table of associations between
-- LocusLink id and cdd_key
CREATE TABLE peng_cddlist AS (
  SELECT a.ll_id, b.ll_refseq_nm_id, c.cdd_key, c.cdd_evalue, a.organism
  FROM ll_locus a, ll_refseq_nm b, ll_np_cdd c
  WHERE a.ll_id = b.ll_id
    AND b.ll_refseq_nm_id = c.ll_refseq_nm_id
);
commit;
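For domain-based queries, it can help to group the rows of peng_cddlist into one set of cdd_key values per LocusLink id. The following is a minimal Python sketch, assuming the rows have already been fetched as (ll_id, cdd_key) pairs. The function name and the sample rows are hypothetical and are not taken from GeneKeyDB.

```python
from collections import defaultdict

def group_domains(rows):
    """Map each ll_id to the set of cdd_key values associated with it."""
    domains_by_locus = defaultdict(set)
    for ll_id, cdd_key in rows:
        # A set absorbs repeated (ll_id, cdd_key) pairs, e.g. from several transcripts.
        domains_by_locus[ll_id].add(cdd_key)
    return dict(domains_by_locus)

# Hypothetical (ll_id, cdd_key) pairs, made up for illustration only.
sample_rows = [(101, 1), (101, 438), (102, 1), (102, 438), (102, 1825), (101, 1)]
domains_by_locus = group_domains(sample_rows)
```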

Summary of Data

Looking at groups of domains
• We look at a list of CDD domains and return the proteins that are found exclusively in the intersection of those domains.
• If a second (third, etc.) list of domains is added, we look at the proteins found exclusively in the intersection of this list, and we combine this with previous lists and do the same.

Looking at groups of domains
[Figure: regions labeled A, B, and A + B]

Options
• This can be done using either human or mouse data.
• We can turn the exclusivity off, so that we return all proteins in the intersection of the list of CDD keys.
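One possible reading of this query can be sketched in Python. The sketch assumes that "exclusively" means a protein carries exactly the listed domains and no others, and that with exclusivity off any protein carrying all of the listed domains is returned; the slides do not spell this out. The function name, the mapping, and the protein ids are hypothetical.

```python
def proteins_for_domains(domains_by_protein, query, exclusive=True):
    """Return the sorted ids of proteins that match a list of CDD keys.

    exclusive=True : the protein's domain set equals the query (assumed reading).
    exclusive=False: the protein's domain set contains every queried key.
    """
    wanted = set(query)
    if exclusive:
        return sorted(p for p, doms in domains_by_protein.items() if doms == wanted)
    return sorted(p for p, doms in domains_by_protein.items() if wanted <= doms)

# Hypothetical protein -> domain sets, not taken from GeneKeyDB.
domains_by_protein = {
    "P1": {1, 438, 1825},
    "P2": {1, 438, 1825},
    "P3": {1, 438, 1825, 77},
    "P4": {1825, 77},
}

first_list = proteins_for_domains(domains_by_protein, [1, 438])
second_list = proteins_for_domains(domains_by_protein, [1825])
combined = proteins_for_domains(domains_by_protein, [1, 438, 1825])
combined_all = proteins_for_domains(domains_by_protein, [1, 438, 1825], exclusive=False)
```

With this toy data, neither list matches a protein on its own under the exclusive reading, while the combined list matches P1 and P2. Turning exclusivity off also returns P3, which carries an extra domain.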

Sample Input and Output

Input the first list of domains. The domains should be separated by spaces and should all be on one line.
1 438
(1 438):
Input another list of domains separated by spaces (or hit q to quit):
1825
(1825):
(1 438 1825): 28992 83666
Input another list of domains separated by spaces (or hit q to quit):

Why useful? A thought 2003

?: log[P(k)] ~ -  k
k: the number of CDs per protein
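To examine the question above, one could tabulate P(k), the fraction of proteins carrying k conserved domains, and fit a straight line to log P(k) against k. The following is a minimal Python sketch using ordinary least squares. The function names and the toy counts are hypothetical, and the fitted slope is only meaningful if the relationship really is close to linear.

```python
import math
from collections import Counter

def domain_count_distribution(domains_by_protein):
    """Return {k: P(k)}, where k is the number of domains on a protein."""
    counts = Counter(len(doms) for doms in domains_by_protein.values())
    total = sum(counts.values())
    return {k: n / total for k, n in sorted(counts.items())}

def fit_log_linear(p_of_k):
    """Least-squares fit of log P(k) = intercept + slope * k.

    Needs at least two distinct values of k.
    """
    ks = list(p_of_k)
    ys = [math.log(p_of_k[k]) for k in ks]
    n = len(ks)
    mean_k = sum(ks) / n
    mean_y = sum(ys) / n
    sxx = sum((k - mean_k) ** 2 for k in ks)
    sxy = sum((k - mean_k) * (y - mean_y) for k, y in zip(ks, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_k

# Toy data: 8 proteins with 1 domain, 4 with 2, 2 with 3, 1 with 4.
toy = {}
for k, n_proteins in [(1, 8), (2, 4), (3, 2), (4, 1)]:
    for i in range(n_proteins):
        toy[f"p{k}_{i}"] = set(range(k))

p_of_k = domain_count_distribution(toy)
slope, intercept = fit_log_linear(p_of_k)
```

In the toy data the counts halve with each additional domain, so the fitted slope is negative and log P(k) falls on a straight line. Real data would need a check of the residuals before reading anything into the slope.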

Redundancy in CDD?

Future work:
• Remove CDD redundancy
• Distribution of the minimal set of proteins across different biological processes/subcellular locations (GO terms)
• Application to other types of graphs with the same or different datasets, such as genes + TBS
