PhyQL is a web-based visual query interface designed for querying phylogenetic trees efficiently. It supports operations such as selecting subtrees, joining trees, and finding least common ancestors through a simple user interface. By using logical tree query operations, PhyQL allows for easy modifications based on changing logic rules. The proposed system is also adaptable to other data types, such as protein-protein interaction networks. Future enhancements will focus on database interoperability and similarity estimation of trees.
PhyQL: A Phylogenetic Visual Query Engine Shahriyar Hossain, Munirul Islam, Jesmin, Hasan M Jamil Integration Informatics Laboratory, Computer Science, Wayne State University Department of Genetic Engineering and Biotechnology, University of Dhaka, Bangladesh BIBM 2008 Integration Informatics Research Group
What is a Phylogenetic Tree?
Queries:
• Least Common Ancestor

<root>
  <node>rayfinned fish</node>
  <inode>
    <node>lungfish</node>
    <inode>
      <inode>
        <node>salamanders</node>
        <node>frogs</node>
      </inode>
      . . .
    </inode>
  </inode>
</root>

for $root in doc("tree.xml")//root
return <span> <h1> { $root/node/text() } </h1> </span>
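The XQuery on this slide only prints the leaf that sits directly under the root. As an illustration of what a least-common-ancestor query has to compute, here is a minimal Python sketch over the same XML. It is not PhyQL's implementation: the function names are hypothetical, and the elided ". . ." part of the slide's tree is simply left out.

```python
import xml.etree.ElementTree as ET

# Tree from the slide; the elided ". . ." portion is omitted, not guessed.
TREE_XML = """
<root>
  <node>rayfinned fish</node>
  <inode>
    <node>lungfish</node>
    <inode>
      <inode>
        <node>salamanders</node>
        <node>frogs</node>
      </inode>
    </inode>
  </inode>
</root>
"""


def path_to_root(elem, parent_of):
    """Return [elem, parent, grandparent, ..., root]."""
    path = [elem]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return path


def least_common_ancestor(root, name_a, name_b):
    """Return the deepest element that contains both named leaves."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    leaf = {n.text.strip(): n for n in root.iter("node")}
    ancestors_a = path_to_root(leaf[name_a], parent_of)
    ancestors_b = set(path_to_root(leaf[name_b], parent_of))
    # Walk up from the first leaf; the first shared ancestor is the LCA.
    for candidate in ancestors_a:
        if candidate in ancestors_b:
            return candidate
    return None


root = ET.fromstring(TREE_XML)
lca = least_common_ancestor(root, "salamanders", "frogs")
print(lca.tag, [n.text for n in lca.findall("node")])
```

Asking for the LCA of "rayfinned fish" and "frogs" returns the root element itself, since the two leaves share no deeper ancestor.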
Phylogenetic Query Language:
• Select: selects the subset of trees that match given criteria
• Join: joins two trees on a pair of nodes
• Subset: retrieves part of a given tree
[Figures: Tree Join Using Path Operators; SubTree Projection]
PhyQL: Visual Query Interface
[Architecture diagram. Components: User; Visual Query Interface (SELECT, JOIN, SUBTREE); Translator; XSB; DB; Wrappers; XML/NEXUS input from the user or interoperable databases]
Why XSB?
• Eliminates the left-recursion problem:
    path(X,Z) :- path(X,Y), edge(Y,Z).
• Stores intermediate results (tabling)
• Model-based (the order in which rules are written doesn't matter):
    path(X,Y) :- edge(X,Y).
    path(X,Z) :- path(X,Y), edge(Y,Z).
• Its in-memory database queries are an order of magnitude faster than systems such as tuProlog

    :- odbc_import(conn, 'tbl_treeinfo'('rootId', 'author'), tree).
    :- odbc_import(conn, 'tbl_nodeinfo'('nodeId', 'nodename'), node).
    :- odbc_import(conn, 'tbl_edge'('parentId', 'childId'), edge).
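A left-recursive path rule makes a plain depth-first Prolog engine call itself forever, while a tabling engine records the answers it has already found and stops when no new ones appear. The Python sketch below imitates only that observable effect with a bottom-up fixpoint; it is not XSB's actual resolution strategy, and the function name and edge facts are hypothetical.

```python
def transitive_closure(edges):
    """Derive path/2 from edge/2 by applying both rules until nothing new appears.

    path(X,Y) :- edge(X,Y).
    path(X,Z) :- path(X,Y), edge(Y,Z).
    """
    path = set(edges)      # first rule: every edge is a path
    frontier = set(edges)  # answers that were new in the previous round
    while frontier:
        # Second rule, applied only to the newly derived paths.
        derived = {
            (x, z)
            for (x, y) in frontier
            for (y2, z) in edges
            if y == y2
        }
        frontier = derived - path  # keep only answers not stored yet
        path |= frontier
    return path


# Hypothetical (parentId, childId) facts for a five-node tree.
edges = {(0, 1), (0, 2), (2, 3), (2, 4)}
paths = transitive_closure(edges)
print(sorted(paths))
```

Because stored answers are never re-derived, the loop also terminates on cyclic input such as `{(1, 2), (2, 1)}`, which is the case where naive recursion fails most visibly.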
Query (XML):

<tree author="stern">
  <node type="*">
    <node type="?">
      <node> Stanhopea_gibbosa </node>
      <node> Stanhopea_vasquezii </node>
    </node>
    <node> Stanhopea_shuttleworthii </node>
  </node>
</tree>

Logic form:

node(Y1, 'Stanhopea_shuttleworthii'), node(Y2, 'Stanhopea_gibbosa'), node(Y3, 'Stanhopea_vasquezii'), edge(Y4,Y2), edge(Y4,Y3), lca(Y0,Y4,Y1), edge(Y0,Y1)
Summary
• PhyQL offers a simple web-based visual query interface
• Logic-based tree query operations
• Modifying the query tools requires only a change in the logic rules
• The proposed architecture can also be applied to protein-protein interaction networks, metabolic pathways, etc.
Future Work:
• Database interoperability: retrieve and integrate phylogenetic data during query submission
• ReQuery: query on the result set
• Tree similarity estimation
Thank You! me: http://homopan.wayne.edu/PhD Students/Munirul Islam/index.htm
Uses of Phylogenetic Trees:
• Dating the divergence of species
• Finding the most recent common ancestor of all living species
• Identifying the geographic origins of new disease outbreaks
Crimson
• Uses nested subtrees to avoid long strings
• Zheng, Y., S. Fisher, S. Cohen, S. Guo, J. Kim, and S. B. Davidson. 2006. Crimson: A Data Management System to Support Evaluating Phylogenetic Tree Reconstruction Algorithms. 32nd International Conference on Very Large Data Bases, ACM, pp. 1231-1234.
Dewey system: each node is labelled with its path from the root.
• Root: 0
• Internal nodes: 0.1, 0.2, 0.2.1
• Leaves: A = 0.1.1, B = 0.1.2, C = 0.2.1.1, D = 0.2.1.2, E = 0.2.2
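Dewey labels can be generated with one recursive walk in which each child appends its position to the parent's label. The sketch below assumes the tree has the shape ((A,B),((C,D),E)), which is what the labels on this slide suggest; the nested-tuple representation and the function name are illustrative choices, not part of PhyQL or Crimson.

```python
def dewey_labels(tree, label="0"):
    """Yield (label, name) pairs; name is None for internal nodes.

    A tree is either a leaf name (str) or a tuple of child subtrees.
    """
    if isinstance(tree, str):
        yield label, tree
        return
    yield label, None
    # Children are numbered from 1 and appended to the parent's label.
    for position, child in enumerate(tree, start=1):
        yield from dewey_labels(child, f"{label}.{position}")


# Assumed tree shape, reconstructed from the labels on the slide.
tree = (("A", "B"), (("C", "D"), "E"))
labels = dict(dewey_labels(tree))

for label, name in labels.items():
    print(label, name or "(internal)")
```

A label encodes the full path from the root, so ancestry tests reduce to prefix tests on label components.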
Find the clade for Z = (C, D): take the common pattern of the labels, starting from the left.
SELECT * FROM nodes WHERE (path LIKE '0.2.1%');
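The clade lookup can be tried end to end with an in-memory SQLite table. This sketch assumes a hypothetical `nodes(path, name)` schema and a leaf-to-label mapping read off the Dewey slide. It also departs from the slide's query in one labelled way: it matches `prefix` exactly or `prefix || '.%'`, because a bare `'0.2.1%'` pattern would also match a sibling such as `0.2.10` in a larger tree.

```python
import sqlite3

# Hypothetical table contents: Dewey path and leaf name (None = internal node).
ROWS = [
    ("0", None), ("0.1", None), ("0.1.1", "A"), ("0.1.2", "B"),
    ("0.2", None), ("0.2.1", None), ("0.2.1.1", "C"), ("0.2.1.2", "D"),
    ("0.2.2", "E"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (path TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO nodes VALUES (?, ?)", ROWS)


def common_prefix(paths):
    """Longest shared prefix, compared component by component from the left."""
    shared = []
    for column in zip(*(p.split(".") for p in paths)):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return ".".join(shared)


def clade(conn, prefix):
    """Return the node labelled `prefix` and everything beneath it."""
    cursor = conn.execute(
        "SELECT path, name FROM nodes "
        "WHERE path = ? OR path LIKE ? ORDER BY path",
        (prefix, prefix + ".%"),
    )
    return cursor.fetchall()


prefix = common_prefix(["0.2.1.1", "0.2.1.2"])  # labels of C and D
print(prefix, clade(conn, prefix))
```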
Depth-first traversal scoring each node with a left and right ID:
• Leaves: A (3, 4), B (5, 6), C (10, 11), D (12, 13), E (15, 16)
• Internal nodes: (2, 7), (9, 14), (8, 17)
• Root: (1, 18)
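The left and right IDs come from a single counter that is incremented once when a node is entered and once when it is left. The sketch below assumes the tree ((A,B),((C,D),E)), a shape that is consistent with the numbers 1 to 18 on this slide; the representation and function name are hypothetical.

```python
def number_nodes(tree):
    """Return (name, left_id, right_id) rows in depth-first order.

    A tree is either a leaf name (str) or a tuple of child subtrees;
    internal nodes get the name None.
    """
    rows = []
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1              # entering the node: assign left_id
        left = counter
        slot = len(rows)
        rows.append(None)         # reserve a slot so parents precede children
        if not isinstance(node, str):
            for child in node:
                visit(child)
        counter += 1              # leaving the node: assign right_id
        name = node if isinstance(node, str) else None
        rows[slot] = (name, left, counter)

    visit(tree)
    return rows


# Assumed tree shape, reconstructed from the IDs on the slide.
tree = (("A", "B"), (("C", "D"), "E"))
rows = number_nodes(tree)
for row in rows:
    print(row)
```

A node X lies inside the clade of node Y exactly when X's left ID falls between Y's left and right IDs, which is what makes range queries over this numbering possible.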
Minimum Spanning Clade of Node 5
SELECT *
FROM nodes
INNER JOIN nodes AS include
  ON (nodes.left_id BETWEEN include.left_id AND include.right_id)
WHERE include.node_id = 5;
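The self-join can be exercised against an in-memory SQLite table. In this sketch the left and right IDs follow the numbering slide, but the `node_id` values and the `name` column are hypothetical, so "node 5" here is simply whichever node received that ID in the sample data (the parent of the C/D clade and E). Both sides of the join are aliased to keep column references unambiguous.

```python
import sqlite3

# (node_id, name, left_id, right_id); node_id values are hypothetical.
ROWS = [
    (1, None, 1, 18),
    (2, None, 2, 7),
    (3, "A", 3, 4),
    (4, "B", 5, 6),
    (5, None, 8, 17),
    (6, None, 9, 14),
    (7, "C", 10, 11),
    (8, "D", 12, 13),
    (9, "E", 15, 16),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nodes ("
    "node_id INTEGER PRIMARY KEY, name TEXT, "
    "left_id INTEGER, right_id INTEGER)"
)
conn.executemany("INSERT INTO nodes VALUES (?, ?, ?, ?)", ROWS)


def clade_of(conn, node_id):
    """Return (node_id, name) for the given node and all of its descendants."""
    query = """
        SELECT n.node_id, n.name
        FROM nodes AS n
        INNER JOIN nodes AS include
            ON n.left_id BETWEEN include.left_id AND include.right_id
        WHERE include.node_id = ?
        ORDER BY n.left_id
    """
    return conn.execute(query, (node_id,)).fetchall()


print(clade_of(conn, 5))
```

Unlike the Dewey variant, this lookup needs no string matching: membership in a clade is a pair of integer comparisons, which an index on `left_id` can serve directly.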