Querying the deep Web
Querying the deep Web. By John Muntunemuine and Martha Kamkuemah. Supervisor: Sonia Berman. Outline: problem being tackled, why it's important, related work, overview of the system, scope, design challenges, main components of project, key success factors, risks, conclusion.
Presentation Transcript
Querying the deep Web By John Muntunemuine and Martha Kamkuemah Supervisor: Sonia Berman
Outline • Problem being tackled • Why it's important • Related work • Overview of the system • Scope • Design challenges • Main components of project • Key success factors • Risks • Conclusion
Problem being tackled • Querying databases hidden behind query interfaces and retrieving results from them • Build a query-based system able to send a query to multiple deep Web databases simultaneously • Then investigate a generic solution
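Sending one query to several deep Web databases at once can be sketched as a concurrent fan-out. This is a minimal illustration only, not the project's implementation: `airline_a`, `airline_b` and `query_all` are hypothetical names, and the two sources return made-up rows instead of contacting a real site.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for deep Web sources. A real source would submit
# the query to a site's search form and parse the response page.
def airline_a(query):
    route = f"{query['origin']}-{query['destination']}"
    return [{"source": "airline_a", "route": route, "price": 120}]

def airline_b(query):
    route = f"{query['origin']}-{query['destination']}"
    return [{"source": "airline_b", "route": route, "price": 95}]

def query_all(sources, query):
    """Send the same query to every source concurrently and merge the rows."""
    merged = []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map runs the calls in parallel threads and yields the
        # results in the same order as `sources`.
        for rows in pool.map(lambda source: source(query), sources):
            merged.extend(rows)
    # Cheapest offer first, as a comparison-shopping view would show it.
    return sorted(merged, key=lambda row: row["price"])

results = query_all([airline_a, airline_b], {"origin": "CPT", "destination": "JNB"})
```

Threads suit this case because each source mostly waits on the network, so the total time is roughly that of the slowest source.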
Why it's important • Few tools exist for querying the deep Web • Internet services such as “comparison shopping” can be created by integrating data from competing service providers • Querying a single interface takes less effort than querying each source separately
Related work • S. Raghavan & H. Garcia-Molina: HiWE (Hidden Web Exposer), a crawler that automatically parses, processes and interacts with form-based search interfaces
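Parsing a form-based search interface starts with finding the form's target and its fields. The sketch below uses Python's standard `html.parser` for that step. It is not HiWE's method; `FormFieldParser` and `SAMPLE_PAGE` are hypothetical names invented for the illustration.

```python
from html.parser import HTMLParser

class FormFieldParser(HTMLParser):
    """Collect the action URL and field names of the first form on a page."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []
        self._in_form = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action", "")
            self._in_form = True
        elif self._in_form and tag in ("input", "select", "textarea"):
            name = attrs.get("name")
            # Submit buttons carry no query value, so they are skipped.
            if name and attrs.get("type") != "submit":
                self.fields.append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False

# Made-up search page for an imaginary airline site.
SAMPLE_PAGE = """
<form action="/search" method="get">
  <input type="text" name="from_city">
  <input type="text" name="to_city">
  <select name="cabin"><option>Economy</option></select>
  <input type="submit" value="Search">
</form>
"""

parser = FormFieldParser()
parser.feed(SAMPLE_PAGE)
```

After `feed`, `parser.action` holds the form target and `parser.fields` the names a query would have to fill in.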
Related work • Article: Web data management • Integration of query interfaces • Query processing • Result processing
Related work • Wu: one-to-one mapping of fields suffers from a mismatch problem • Developed a clustering-based schema integration technique that maps fields across query forms
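The idea of grouping equivalent fields from different query forms can be illustrated with a very simple clustering by word overlap. This is not Wu's technique, only a toy stand-in under stated assumptions: labels are compared by Jaccard similarity of their words, the 0.5 threshold is arbitrary, and `cluster_fields` is a hypothetical name.

```python
import re

def tokens(label):
    """Lower-case a field label and split it into a set of words."""
    return set(re.findall(r"[a-z]+", label.lower()))

def jaccard(a, b):
    """Share of words two labels have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_fields(labels, threshold=0.5):
    """Greedily put each label into the most similar cluster, or start a new one."""
    clusters = []
    for label in labels:
        best, best_score = None, 0.0
        for cluster in clusters:
            score = max(jaccard(tokens(label), tokens(member)) for member in cluster)
            if score > best_score:
                best, best_score = cluster, score
        if best is not None and best_score >= threshold:
            best.append(label)
        else:
            clusters.append([label])
    return clusters

# Made-up labels as they might appear on different airline forms.
labels = ["Departure city", "City of departure", "Return date", "Date of return", "Passengers"]
clusters = cluster_fields(labels)
```

Because a cluster may hold any number of labels, one field on a form can correspond to several fields elsewhere, which a strict one-to-one mapping cannot express. Pure word overlap misses synonyms such as "From" and "Origin", so a real matcher would need more evidence than this.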
Scope • Data stored in hidden databases is heterogeneous • Solution: focus on the airline domain • Start specific • Then generalize the solution
Design challenges • Locating the relevant sections • Semantically matching attributes
Main components of project • Two main parts • Query formulation • Result interpretation
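The two parts can be sketched end to end on made-up data. This is an illustration under assumptions, not the project's design: the URL `http://airline.example/search`, the field names, the sample result table and the helpers `formulate_query` and `interpret_results` are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

def formulate_query(base_url, field_map, query):
    """Translate a global query into one site's form parameters and build a GET URL."""
    params = {field_map[key]: value for key, value in query.items() if key in field_map}
    return f"{base_url}?{urlencode(params)}"

class TableRowParser(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def interpret_results(html):
    """Turn a result table into dicts keyed by the header row."""
    parser = TableRowParser()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]

# Query formulation: global attribute names mapped to one site's field names.
url = formulate_query(
    "http://airline.example/search",
    {"origin": "from_city", "destination": "to_city"},
    {"origin": "CPT", "destination": "JNB"},
)

# Result interpretation: a made-up response page with one flight.
SAMPLE_RESULT = (
    "<table><tr><th>Flight</th><th>Price</th></tr>"
    "<tr><td>XY101</td><td>950</td></tr></table>"
)
flights = interpret_results(SAMPLE_RESULT)
```

Keeping the per-site `field_map` separate from the query is what lets the same global query be reused across sources. Real result pages rarely present data in one clean table, which is where locating the relevant sections becomes the hard part.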
Success factors • The system can send a query to one deep Web database and display the results • The system is then expanded into a more general solution by implementing heuristics that deal with general pages
Risks • If implementing both parts of the system takes too long, we might concentrate on one part only: query formulation
Conclusion • Query hidden databases • Investigate ways to make the system more general • Subsystems: query formulation and result interpretation