edot expertise and interests of IASI

Presentation Transcript


  1. edot expertise and interests of IASI

  2. General goal of edot
     • Integration of data from the web with existing data (about a given topic)
     • Target topic: risk assessment in food
       • existing databases managed by BIA/INRA
       • a hot topic
       • a lot of useful public data distributed over the web

  3. Overview of the programme
     • Specification of a data warehouse on risk assessment in food
     • Data acquisition from the Web
     • Structuring the warehouse
     • Validation

  4. Data acquisition from the Web
     • Related work (Djalil Mezaour’s PhD):
       • Focused crawling guided by a declarative specification of the needs
       • Design and evaluation of a form-based query language
         • Each item of the form is a keyword query specifying one aspect of the searched documents (title, anchor, text neighborhood of input or output links)
       • Experiments on three domains
         • good precision can be reached, but not enough answers to populate the warehouse
     • Use of machine learning techniques to learn a crawling strategy that finds more data
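
The form-based query idea on this slide can be illustrated with a minimal sketch. It is not the query language from the cited PhD work: the class names, field names, and the "every keyword must appear" matching rule are all assumptions made for illustration. Each form item holds a keyword list for one aspect of a page, and an empty item places no constraint.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical view of a crawled page; the field names are assumptions."""
    title: str = ""
    anchor: str = ""        # anchor text of a link pointing to the page
    link_context: str = ""  # text in the neighborhood of input or output links


@dataclass
class FormQuery:
    """One keyword list per form item; an empty list means 'no constraint'."""
    title: list[str] = field(default_factory=list)
    anchor: list[str] = field(default_factory=list)
    link_context: list[str] = field(default_factory=list)


def _contains_all(text: str, keywords: list[str]) -> bool:
    # Simplification: case-insensitive substring match, no tokenization.
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)


def matches(query: FormQuery, page: Page) -> bool:
    """A page matches when every filled-in form item is satisfied."""
    return (
        _contains_all(page.title, query.title)
        and _contains_all(page.anchor, query.anchor)
        and _contains_all(page.link_context, query.link_context)
    )
```

A crawler could use such a predicate to decide which pages to keep or which links to follow. Requiring all keywords in every filled item is strict, which fits the slide's remark that precision is good but the number of answers is too small.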

  5. Structuring the warehouse
     • Design of a global schema
       • pivot between existing databases and the specification of the searched web data
       • existing databases: fixed schemas
       • specification of the searched data: ??
         • Keywords
         • Hierarchies of keywords (ontology)
         • URLs
     • Classification and integration of new web data
     Problem: bridging the structure gap
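
One of the options listed on this slide, a hierarchy of keywords, can be sketched as follows to show how it might support classifying new web data. The hierarchy contents, the function names, and the substring-matching rule are hypothetical and are not taken from the presentation.

```python
from __future__ import annotations

# Hypothetical keyword hierarchy: child concept -> parent concept.
PARENT = {
    "listeria": "microbial hazard",
    "salmonella": "microbial hazard",
    "microbial hazard": "food risk",
    "pesticide": "chemical hazard",
    "chemical hazard": "food risk",
}


def ancestors(concept: str) -> list[str]:
    """Walk up the hierarchy and return all ancestors, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def classify(text: str) -> set[str]:
    """Return the concepts mentioned in the text plus all their ancestors."""
    lowered = text.lower()
    found: set[str] = set()
    for concept in set(PARENT) | set(PARENT.values()):
        if concept in lowered:  # simplification: plain substring match
            found.add(concept)
            found.update(ancestors(concept))
    return found
```

Under these assumptions, a document that mentions only a specific term is also attached to the broader concepts above it, which is one way to relate loosely structured web text to the categories of a fixed schema.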
