Explores the challenges of integrating and querying distributed biological data sources, with the aim of making information retrieval more effective. Discusses the limitations of current approaches and proposes possible solutions for seamless data integration. Presents the modes of information integration and ongoing work on matching XML Schema and Java objects.
Towards Seamless Integration and Querying of Biological Data
Estella T. Pham, Master’s Student in CS, UML
Dr. Kajal Claypool, Professor, UML
Topics of Discussion
• The BIG problem
• Quick background information
• Our long-term goal
• My current work
The BIG problem
• Distributed, heterogeneous data sources, differing in:
  • Database systems (DBMSs, semantic heterogeneity)
  • Operating systems (files)
  • Hardware
• How can we effectively obtain most of the relevant information on one particular subject when the pieces of that information are spread across different databases? For example: find protein A’s structure, its folding properties and propensities, its amino acid sequence, its DNA sequence, and its organization and expression.
• Why are the current data integration tools inadequate?
Quick Background Information
• Three modes of information integration:
  • Federated databases (n databases)
  • Warehousing (n databases, one warehouse)
  • Mediation (n databases, n wrappers, one mediator)
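The mediation mode listed above can be sketched in a few lines of Python. This is a minimal illustration, not the authors’ design: in-memory lists stand in for the n source databases, and the class names (`Wrapper`, `Mediator`), the global field names and the toy lacZ records are all hypothetical. Each wrapper renames its source’s native fields into a shared global schema; the mediator forwards the query to every wrapper at query time and merges the answers, so, unlike warehousing, no integrated copy of the data is stored.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A record expressed in the (hypothetical) global, mediated schema.
Record = Dict[str, str]


@dataclass
class Wrapper:
    """Exposes one autonomous source through the global schema."""
    source: List[Record]       # stands in for one source database
    field_map: Dict[str, str]  # native field name -> global field name

    def query(self, predicate: Callable[[Record], bool]) -> List[Record]:
        results = []
        for row in self.source:
            # Rename native fields to global ones; unmapped fields are dropped.
            rec = {self.field_map[k]: v for k, v in row.items() if k in self.field_map}
            if predicate(rec):
                results.append(rec)
        return results


@dataclass
class Mediator:
    """Forwards a global query to every wrapper and merges the answers."""
    wrappers: List[Wrapper] = field(default_factory=list)

    def query_gene(self, gene: str) -> Record:
        merged: Record = {"gene": gene}
        for w in self.wrappers:
            for rec in w.query(lambda r: r.get("gene") == gene):
                for k, v in rec.items():
                    merged.setdefault(k, v)  # first source wins on conflicts
        return merged


# Toy records with different native field names (illustrative, truncated).
genbank_like = [{"gene": "lacZ", "sequence": "ATGACCATG"}]
swissprot_like = [{"GN": "lacZ", "SQ": "MTM"}]

mediator = Mediator([
    Wrapper(genbank_like, {"gene": "gene", "sequence": "dna_sequence"}),
    Wrapper(swissprot_like, {"GN": "gene", "SQ": "aa_sequence"}),
])
print(mediator.query_gene("lacZ"))
```

With this toy data, `mediator.query_gene("lacZ")` combines the DNA sequence from the GenBank-like source and the amino acid sequence from the Swiss-Prot-like source into one record. When two sources supply the same global field, the sketch simply keeps the first value it sees; a real mediator would need an explicit policy for such conflicts.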
My Current Work
• “U00096.gbk” and “ecoli.txt” (GenBank and Swiss-Prot files)
• XML Schema
• Java objects
• Schema matching
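The slides do not say how the XML Schema and the Java objects are matched. As one possible illustration only, the sketch below uses a simple name-based matcher built on Python’s `difflib.SequenceMatcher`: each XML Schema element name is paired with the most similar Java field name, or left unmatched if no similarity reaches a threshold. The element names, field names and the 0.8 threshold are hypothetical; practical schema matchers usually also compare data types, structure and instance values.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Lower-case a name and drop non-alphanumerics, so that 'aa_sequence',
    'AASequence' and 'aa-sequence' all normalize to 'aasequence'."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two names after normalization."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_schemas(xml_elements: List[str], java_fields: List[str],
                  threshold: float = 0.8) -> Dict[str, Optional[str]]:
    """Map each XML Schema element name to its most similar Java field name,
    or to None when no candidate reaches the threshold."""
    matches: Dict[str, Optional[str]] = {}
    for elem in xml_elements:
        best = max(java_fields, key=lambda f: similarity(elem, f), default=None)
        if best is not None and similarity(elem, best) >= threshold:
            matches[elem] = best
        else:
            matches[elem] = None
    return matches


# Hypothetical element names (XML Schema side) and field names (Java side).
xml_elements = ["LOCUS", "DEFINITION", "aa_sequence", "keywords"]
java_fields = ["locus", "definition", "aaSequence", "organismName"]

for elem, fld in match_schemas(xml_elements, java_fields).items():
    print(f"{elem:12} -> {fld}")
```

Here `LOCUS`, `DEFINITION` and `aa_sequence` are matched to `locus`, `definition` and `aaSequence` once case and separators are normalized away, while `keywords` has no sufficiently similar Java field and maps to `None`.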