Integration of Biological Data (LifeDB)

Integration of Biological Data (LifeDB) Presented By Md. Shazzad Hosain (shazzad@wayne.edu) Supervised By Dr. Hasan Jamil (jamil@cs.wayne.edu) Wayne State University, Detroit, USA

Outline • Data Integration • WebFusion (our previous work) • LifeDB (our goal) • Research Scope

Data Integration Example • Detroit to Bologna air ticket • Alitalia (Italian airline) • Air France • Northwest Airlines • Lufthansa, etc.

Integration Example cont. • Diagram labels: myAirFare.com; CheapAir.com, Expedia.com, ……; Alitalia, Lufthansa, Air France, Delta

Integration Approaches • Warehouse Integration • Mediator-based Integration • Navigational Integration

Warehouse Integration • Materializes data from all sources in a local warehouse • Emphasizes data translation rather than query translation • Advantages: avoids network bottlenecks, efficient • Disadvantages: data may be out of date, system maintenance overhead
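A minimal sketch of the warehouse approach, assuming two hypothetical airline sources with different record layouts (the source names, fields, and fares are made up for illustration). Data is translated and copied into one local table up front, so queries never touch the sources, but answers are only as fresh as the last load.

```python
import sqlite3

# Hypothetical source extracts; each source uses its own layout.
SOURCE_A = [{"from": "DTW", "to": "BLQ", "price_usd": 910.0}]
SOURCE_B = [("DTW", "BLQ", 875.0), ("DTW", "CDG", 640.0)]


def build_warehouse():
    """Translate every source into one local schema and materialize it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE fares (source TEXT, origin TEXT, dest TEXT, price REAL)"
    )
    # Data translation happens once, at load time.
    rows = [("A", r["from"], r["to"], r["price_usd"]) for r in SOURCE_A]
    rows += [("B", origin, dest, price) for origin, dest, price in SOURCE_B]
    conn.executemany("INSERT INTO fares VALUES (?, ?, ?, ?)", rows)
    return conn


def cheapest(conn, origin, dest):
    """Answer a query from the local copy only; no source is contacted."""
    return conn.execute(
        "SELECT source, price FROM fares WHERE origin = ? AND dest = ? "
        "ORDER BY price LIMIT 1",
        (origin, dest),
    ).fetchone()


warehouse = build_warehouse()
best = cheapest(warehouse, "DTW", "BLQ")
```

If a source changes its fares after `build_warehouse()` runs, `cheapest` keeps returning the old value until the warehouse is reloaded, which is the freshness drawback listed on the slide.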

Mediator-based Integration • Concentrates on query translation • GAV (global-as-view) approach and LAV (local-as-view) approach

GAV Approach • Query reformulation is easy, but addition or removal of sources is difficult • Preferred when sources are known and stable • Diagram: Mediator Schema over sources S1, S2, S3, S4
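In GAV, each relation of the mediated schema is defined as a view over the sources, so answering a query reduces to unfolding that view. A minimal sketch, assuming hypothetical sources `S1` and `S2` that both contribute to one mediated relation `gene`; the field names and rows are invented for illustration.

```python
# Hypothetical source contents; S1 and S2 use different field names.
S1 = [{"locus": 1, "sym": "geneA"}]
S2 = [{"id": 2, "symbol": "geneB"}]

# GAV: the mediated relation gene(gene_id, symbol) is written as a view
# over the sources. Each entry maps one source's rows to mediated tuples.
GAV_VIEW = {
    "gene": [
        lambda: ((r["locus"], r["sym"]) for r in S1),
        lambda: ((r["id"], r["symbol"]) for r in S2),
    ]
}


def query(relation, predicate):
    """Reformulate by unfolding the view: run every source mapping."""
    for mapping in GAV_VIEW[relation]:
        for row in mapping():
            if predicate(row):
                yield row


all_genes = sorted(query("gene", lambda row: True))
gene_b = list(query("gene", lambda row: row[1] == "geneB"))
```

Reformulation here is a plain lookup in `GAV_VIEW`. Adding a source `S3`, however, means revisiting the definition of every mediated relation it contributes to, which is why the approach suits a known, stable set of sources.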

LAV Approach • Query reformulation is difficult, but addition or removal of sources is easy • Appropriate for large-scale ad hoc integration • ARIADNE, DiscoveryLink, TAMBIS, KIND, etc. • Diagram: Mediator Schema over sources S1, S2, S3, S4
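In LAV the direction is reversed: each source is described as a view over the mediated schema, and the mediator must work out which sources can answer a query. A heavily simplified sketch, assuming each hypothetical source is described only by the mediated attributes it exposes and the single organism it covers; a full mediator would also combine several sources to cover one query.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDescription:
    """LAV: a source described as a view over the mediated schema."""
    name: str
    attributes: frozenset   # mediated attributes the source exposes
    organism: str           # the source only holds rows for this organism


REGISTRY = []


def register(description):
    """Adding a source touches nothing but its own description."""
    REGISTRY.append(description)


def relevant_sources(wanted_attributes, organism):
    """The hard part of LAV: decide which sources can answer the query."""
    wanted = frozenset(wanted_attributes)
    return [
        d.name
        for d in REGISTRY
        if wanted <= d.attributes and d.organism == organism
    ]


# Hypothetical source descriptions, invented for illustration.
register(SourceDescription("S1", frozenset({"gene_id", "symbol"}), "fly"))
register(SourceDescription("S2", frozenset({"gene_id", "sequence"}), "fly"))
register(SourceDescription("S3", frozenset({"gene_id", "symbol"}), "mouse"))

fly_symbol_sources = relevant_sources({"gene_id", "symbol"}, "fly")
```

Registering or dropping a source is a one-line change, while all the reasoning moves into `relevant_sources`, mirroring the trade-off stated on the slide.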

Navigational Integration • Some sources provide information that would be hard or impossible to access without point-and-click navigation

WebFusion Dr. Liangyou Chen

DBGET LinkDB KEGG Pathways • Can these be done electronically for a biologist?

Go to: http://www.ncbi.nlm.nih.gov/LocusLink/

Click <Register Web Process> menu

1. Input: 103730 2. Press <Pickup Input> button

1. Press <Next> button 2. Press [Go] button

1. Mark the table 2. Press <Pickup Table> Button

Press the <Create> Button

1. Uncheck all boxes except 2~6 2. Press the <Update & Redraw> button

1. Give it the name LocusLink 2. Name the columns Link, LocusID, Org, Symbol, Description respectively 3. Select appropriate transformations 4. Press <Update & Redraw> button

Press <Confirm & Create Table>

LocusLink web process is created

DBGET LinkDB KEGG Pathways

1. Select ‘LocusLink’ table 2. Type in ‘LocusLinkQuery’ as a query name 3. Check these fields to display 4. Double click here

1. Select ‘local_gene_ids’ table 2. Select ‘LID’ field 3. Click here (any place)

Click <This Query> button

Press <Execute> button

In-progress results are shown here
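One way to read the query steps above: each `LID` value from `local_gene_ids` is fed into the `LocusLink` web process, and the returned table rows are collected. A minimal sketch under that assumption; `fetch_locuslink` is a hypothetical stand-in that returns canned rows instead of contacting the site, the choice of display fields is assumed, and every row value except the ID 103730 is a placeholder.

```python
# Field names given to the wrapped LocusLink table in the slides.
FIELDS = ("Link", "LocusID", "Org", "Symbol", "Description")

# Hypothetical local table; only the ID 103730 comes from the slides.
local_gene_ids = [{"LID": "103730"}, {"LID": "000000"}]


def fetch_locuslink(locus_id):
    """Stand-in for the registered web process (no network access here).

    Returns a list of row dicts; the canned values are placeholders.
    """
    canned = {
        "103730": [("link-placeholder", "103730", "org-placeholder",
                    "symbol-placeholder", "description-placeholder")],
    }
    return [dict(zip(FIELDS, row)) for row in canned.get(locus_id, [])]


def locuslink_query(id_rows, display=("LocusID", "Symbol", "Description")):
    """Call the web process once per local ID and project display fields."""
    results = []
    for id_row in id_rows:
        for row in fetch_locuslink(id_row["LID"]):
            results.append({name: row[name] for name in display})
    return results


rows = locuslink_query(local_gene_ids)
```

IDs with no matching page simply contribute no rows, so the result behaves like a join between the local table and the remote one.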

LifeDB

DBGET LinkDB KEGG Pathways

LifeDB • Resource Discovery • Automatic Schema/Ontology Matching • Query Optimization • WorkFlows • BioFlow (A declarative WorkFlow Language)

Glimpse of BioFlow • DNA sequence repositories: GeneBankURL, FlyBaseURL • Formats: GenBank format, EMBL format • Combine these sequences • University of Minnesota Reading Frame Predictor (input_seq : FASTA format, species) • Output: score and predicted DNA region

BioFlow • workflow open_reading_frame; • useontology BioSystems ; • declare found logical, count int; • define data sequences_1 at GeneBankURL as (seq_1 DNA) ; • define tool orf at URL parameter (seq DNA, target organism) results (score int, predicted_region DNA) ; • combine sequences_1, sequences_2 into sequences (seqs); • select seqs, orf (seqs, "drosophila") from sequences ; • Goal: develop a formal syntax for the BioFlow language with compositionality, the closure property, and type safety
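To make the dataflow concrete, the same workflow can be approximated in Python. This is a sketch, not BioFlow semantics: the sequences are made up, and `orf` is a toy stand-in for the remote predictor that scans a single reading frame for the first ATG-to-stop-codon stretch and uses its length as the score.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

# Made-up stand-ins for the data fetched from the two repositories.
sequences_1 = ["CCATGAAATAGGG"]
sequences_2 = ["ATGCCCGGGTTTTGA"]


def orf(seq, target_organism):
    """Toy stand-in for the remote tool; target_organism is ignored.

    Returns (score, predicted_region): the first ATG-to-stop stretch
    found in the frame of the first ATG, scored by its length.
    """
    start = seq.find("ATG")
    if start == -1:
        return 0, ""
    # Step codon by codon, staying in the frame set by the start codon.
    for i in range(start + 3, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            region = seq[start:i + 3]
            return len(region), region
    return 0, ""


# combine sequences_1, sequences_2 into sequences (seqs)
sequences = sequences_1 + sequences_2

# select seqs, orf(seqs, "drosophila") from sequences
result = [(seq, orf(seq, "drosophila")) for seq in sequences]
```

The `combine` step becomes list concatenation and the `select` step becomes a comprehension that applies the tool to every sequence; the declarative language would hide the fetching and format conversion that this sketch skips.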

Research Scope • Resource Discovery • Automatic Schema/Ontology Matching • Query Optimization • WorkFlows • 7-8 PhD positions • 3-5 years funding

Thanks to all
