
Using Term-matching Algorithms for the Annotation of Geo-services






Presentation Transcript


  1. Semantic Web Service Interoperability for Geospatial Decision Making (FP6-026514) • Using Term-matching Algorithms for the Annotation of Geo-services • Miha Grčar¹, Eva Klien² • ¹Jožef Stefan Institute, Slovenia • ²Institute for Geoinformatics, Germany

  2. Introduction and motivation • Geo-data • Provided by geo-services • Information about geographical features such as rivers, lakes, roads, quarries, geological structure… • Geo-services • Web-based services • Defined by the Open Geospatial Consortium (OGC) • Web Feature Services (WFS) • Spatial filtering • Common interface (syntactically…) • HTTP/XML-based • Semantic incompatibility (interoperability issue) – this is what we are trying to solve • Synonymy (e.g. “Aegirite” and “Acmite” denote the same mineral) • Data structured differently • Multilinguality (e.g. “river” and “fleuve” denote the same thing) • European project SWING – Semantic Web Service Interoperability for Geospatial Decision Making • STREP in the 6th Framework Programme • http://www.swing-project.org/

  3. Outline of the talk • Geo-service annotation • Automating the annotation • Text mining • Web as the source of documents • Evaluation • Preliminary evaluation • Larger-scale evaluation • Conclusions and future work

  4. Geo-service annotation [diagram] • Web Feature Service – represents spatial information objects • Domain ontology – axiomatized concept definitions that capture a specific view on the world; the concepts represent real-world entities • Geo-service annotation facilitates discovery and composition • How to establish this “bridge”?

  5. Geo-service annotation [diagram: the annotation as a bridge between the WFS and the domain ontology]

  6. Automating the annotation • Term matching is the main building block • Using text mining techniques for term matching • Bag-of-words representation of documents, document similarity • Clustering and classification • Visualization techniques • Using the Web as the source of documents for text mining • Search engines • On-line encyclopedias • Dictionaries, thesauruses…
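To make the term-matching step concrete, here is a minimal sketch (not the project’s actual code) of the bag-of-words idea: each term is represented by text retrieved for it, and terms are compared via the cosine similarity of their TF-IDF vectors. The documents below are invented for illustration, and scikit-learn is assumed as the toolkit.

```python
# A minimal sketch of bag-of-words term matching: every term gets a document,
# documents become TF-IDF vectors, and similarity is measured with cosine.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical documents retrieved for a schema term and two ontology concepts.
docs = {
    "open-pit mine": "an open-pit mine is a surface excavation for extracting rock or minerals",
    "Quarry":        "a quarry is a place where rock, sand, or gravel is excavated from the ground",
    "Legislation":   "legislation is law enacted by a parliament or other governing body",
}

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(docs.values())      # one bag-of-words vector per term

# Similarity of the schema term (row 0) to each ontology concept.
sims = cosine_similarity(matrix[0], matrix[1:]).ravel()
for concept, sim in zip(list(docs)[1:], sims):
    print(f"open-pit mine  vs  {concept}: {sim:.2f}")
```

With richer document sets per term, the same vectors feed the clustering, classification, and visualization techniques listed above.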

  7. Automating the annotation [diagram] • A geo-service schema term (e.g. “open-pit mine”) and domain-ontology concepts (e.g. D:Quarry, D:Legislation) are each represented by documents in the bag-of-words space • A classifier computes the similarity between these representations • Where do we get these documents?

  8. One possible source of the documents [screenshot of a web search engine: a search term combined with a context term yields a set of result documents]
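As a sketch of this step, the snippet below collects short documents for a term from the Web. Wikipedia’s public search API stands in here for whichever engine the project actually used, and the function name and parameters are my own; combining the term with a context word (e.g. “mineral”) narrows the results to the intended sense.

```python
# Sketch: gather text snippets for a term from a web search API.
import requests

def fetch_snippets(term, context="", limit=10):
    """Return short text snippets for `term` (optionally qualified by `context`)."""
    params = {
        "action": "query",
        "list": "search",
        "srsearch": f"{term} {context}".strip(),
        "srlimit": limit,
        "format": "json",
    }
    resp = requests.get("https://en.wikipedia.org/w/api.php", params=params, timeout=10)
    resp.raise_for_status()
    return [hit["snippet"] for hit in resp.json()["query"]["search"]]

print(fetch_snippets("Diopside", context="mineral"))
```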

  9. Preliminary evaluation • Dataset: 150 mineral names together with their synonyms • Train a classifier to distinguish between mineral names

  10. Preliminary evaluation [diagram] • Dataset: 150 mineral names together with their synonyms (Aegirite, Alalite, Allanite, …, Zincblende, Zinc-spinel, Zinc vitriol) • Documents for each mineral name are retrieved from the Web and used to train the classifier • The documents retrieved for a synonym (e.g. a synonym of Diopside) are then classified

  11. Preliminary evaluation [diagram, continued] • The classifier scores the synonym’s documents against each mineral name (e.g. Diopside) • The candidate mineral names are sorted by score and recommended to the user
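Putting slides 9–11 together, here is a rough sketch of the evaluation loop with toy data (the real dataset had 150 mineral names; the snippets and names below are illustrative only): a centroid is computed from the documents of each known mineral, the documents of a synonym are matched against the centroids, and the minerals are sorted by similarity and recommended to the user.

```python
# Sketch of the centroid-based recommendation loop on toy mineral data.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

training = {  # mineral name -> documents retrieved from the Web (toy snippets)
    "Diopside":   ["diopside is a pyroxene mineral", "green diopside crystals occur in marble"],
    "Aegirite":   ["aegirite is a sodium iron silicate", "aegirine occurs in alkaline igneous rocks"],
    "Zincblende": ["zincblende is a zinc sulfide ore", "sphalerite or zincblende is the chief ore of zinc"],
}
query_docs = ["alalite is a variety of diopside found in the alps"]  # documents for the synonym "Alalite"

vectorizer = TfidfVectorizer().fit([d for docs in training.values() for d in docs] + query_docs)
centroids = np.vstack([
    np.asarray(vectorizer.transform(docs).mean(axis=0)) for docs in training.values()
])
query_vec = np.asarray(vectorizer.transform(query_docs).mean(axis=0))

scores = cosine_similarity(query_vec, centroids).ravel()
for name, score in sorted(zip(training, scores), key=lambda x: -x[1]):
    print(f"{name}: {score:.2f}")   # ranked recommendations for the user
```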

  12. Preliminary evaluation [chart: results plotted against the sort order of the recommended items]

  13. Larger-scale evaluation • Datasets • STINET Thesaurus (STINET = Scientific and Technical Information Network) • 16,000 terms interlinked with broader-than, narrower-than, used-in-combination-for, used-alone-for… (2 more) • We took 1,000 term-pairs for each of the narrower-than and used-alone-for relations • GEMET (General Multilingual Environmental Thesaurus) • 6,000 terms interlinked with broader-than and related-to • We took 1,000 term-pairs for each of the two relations • Tourism ontology • 710 concepts interlinked with is-a • A set of instances (mostly named entities) belonging to the concepts • We took 1,000 named entities and their corresponding concepts, and the entire structure defined by the is-a relation • WordNet (lexical database for the English language) • 115,000 synsets (i.e. sets of synonymous words) interlinked with hypernymy, meronymy, entailment, cause for verbs… (6 more) • We took 1,000 word-pairs for each of 9 selected relations • We also considered the inverted relations for 3 selected relations (e.g. consists-of is the inverse of part-of)

  14. Larger-scale evaluation • Examples • GEMET • traffic infrastructure broader-than road network • mineral resource related-to mineral deposit • STINET • numerical methods and procedures used-alone-for gauss-seidel method • potassium narrower-than alkali metals • Tourism ontology • gliding field is-a sports institution • Warsaw instance-of city • WordNet • do drugs causes trip out • snore entails sleep • modify hypernym-of Europeanize • Cretaceous period instance-of geological period • shuffling meronym-of card game • rum meronym-of rum cocktail • housewife synonym-for homemaker

  15. Larger-scale evaluation • Experimental setting • Classification algorithm • k-NN • Centroid classifier • Quotes in the search query • Yes – exact occurrence • No – co-occurrence • We ran experiments on 18 datasets, with 4 different settings on each dataset; this means roughly 4 x 18,000 term-pairs altogether • We measured accuracy on the top 1, 3, 5, 10, 20, 40 “recommended” items
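As I read the slide, “accuracy on the top k” counts a query as correct when the true term appears among the first k recommended items. A small sketch of that measure (the rankings and truths below are invented, not taken from the experiments):

```python
# Sketch of the accuracy@k measure over ranked recommendation lists.
def accuracy_at_k(rankings, truths, k):
    """Fraction of queries whose true item appears within the top-k ranked items."""
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(truths)

rankings = [["Quarry", "Legislation", "Mine"], ["River", "Lake", "Road"]]
truths = ["Legislation", "Road"]
for k in (1, 3, 5):
    print(f"accuracy@{k} = {accuracy_at_k(rankings, truths, k):.2f}")
```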

  16. Larger-scale evaluation [charts: results vs. sort order for synonymy, meronymy, and verb relations – GEMET related-to, WordNet synonymy, WordNet entailment, WordNet hypernymy, WordNet part meronymy, WordNet substance meronymy, WordNet member meronymy, WordNet cause for verbs, STINET used-alone-for]

  17. Larger-scale evaluation [charts: results vs. sort order – hyper-/hyponymy: WordNet hypernymy, STINET narrower-than, Tourism ontology is-a, GEMET broader-than; class membership (instance-of): WordNet, Tourism ontology]

  18. Conclusions • The term’s lexical category (e.g. verb vs. noun) has the largest impact on the accuracy • The dataset has a [much] larger impact on the accuracy than the choice of the classifier • General vs. specific vocabulary (works better for specific vocabulary or named entities) • Semantics of the relation (works best for synonymy) • The centroid classifier is faster and [slightly] more accurate • Quotes are useful on datasets that contain [technical] expressions (e.g. STINET) • Inverting the relation has no major impact on the results

  19. Future work • Try SVM • “Clean up” the document sets • Active learning • Clustering, removing irrelevant clusters • Both techniques require interaction with the user • Visualize the “term space” • Latent Semantic Analysis (LSA), Multi-Dimensional Scaling (MDS) • Force-directed layout • Use WordNet to infer relations between arbitrary words • Input: two words • Process: detect the corresponding synsets and explore inter-relations • Output: most probable relations (according to WordNet) • Deal with the multilinguality issue • Kernel Canonical Correlation Analysis (KCCA) • Machine translation
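For the “visualize the term space” item, a rough illustration of the LSA + MDS idea with toy documents and scikit-learn (the pipeline and parameters are assumptions, not the project’s implementation): LSA via truncated SVD reduces the TF-IDF space, and MDS projects the terms to 2-D coordinates that could feed a scatter plot or a force-directed layout.

```python
# Sketch: embed terms in 2-D via LSA (truncated SVD) followed by MDS.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import MDS

terms_to_docs = {  # toy documents standing in for the retrieved document sets
    "quarry": "a quarry is an open excavation for extracting stone",
    "mine":   "a mine is an excavation for extracting minerals or ores",
    "river":  "a river is a natural stream of water flowing toward the sea",
}

tfidf = TfidfVectorizer().fit_transform(terms_to_docs.values())
lsa = TruncatedSVD(n_components=2).fit_transform(tfidf)          # latent semantic space
coords = MDS(n_components=2, random_state=0).fit_transform(lsa)  # 2-D layout

for term, (x, y) in zip(terms_to_docs, coords):
    print(f"{term}: ({x:.2f}, {y:.2f})")
```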

  20. Thank you... • ...for your attention • Any questions?
