
Social Tags and Linked Data for Ontology Development: A Case Study in the Financial Domain

Andrés García-Silva†, Leyla Jael García-Castro±, Alexander García*, Oscar Corcho†. †{hgarcia, ocorcho}@fi.upm.es, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain


Presentation Transcript


  1. Social Tags and Linked Data for Ontology Development: A Case Study in the Financial Domain
     Andrés García-Silva†, Leyla Jael García-Castro±, Alexander García*, Oscar Corcho†
     †{hgarcia, ocorcho}@fi.upm.es, Ontology Engineering Group, Universidad Politécnica de Madrid, Spain
     ±leylajael@gmail.com, Universitat Jaume I, Castellón de la Plana, Spain
     *alexgarciac@gmail.com, State University, Florida, USA
     June 2014. FPI grant BES-2008-007622

  2. Introduction
     • Web 2.0
     • User-generated content
     • Social networks
     • Tools for organizing, sharing and discovering information: tagging systems and folksonomies
     [Figure: users tagging resources in a tagging system, giving rise to a folksonomy. Example tags: Knowledge Base, Programming language, Java, Persistent Access, Tutorial, Database]

  3. Introduction: Folksonomies as a source of knowledge
     • A vocabulary emerges around resources and users: Golder and Huberman (2006), Marlow et al. (2006)
     • Maintained by a large user community
     • Flexible (not restricted)
     • Up to date
     • Emergent semantics from the aggregation of individual classifications: Gruber (2007), Mika (2007), Specia and Motta (2007)

  4. State of the art: Folksonomies
     • Statistical-based
       • Tag similarity measures (when are two tags related, and by which relation?): Cattuto et al. (2008), Markines et al. (2009), Körner et al. (2010), Benz et al. (2011)
       • Ontology generation (from folksonomy to ontology): Heymann and Garcia-Molina (2006), Begelman et al. (2006), Hamasaki et al. (2007), Jäschke et al. (2008), Kennedy et al. (2007), Mika (2007), Benz et al. (2010), Limpens et al. (2010)
     • Ontology-based: Angeletou et al. (2008), Cantador et al. (2008), García-Silva et al. (2009), Maala et al. (2008), Passant (2007), Tesconi et al. (2008)
     • Hybrid approaches: Giannakidou et al. (2008), Specia and Motta (2007)

  5. State of the art: Folksonomies. Limitations
     • Statistical-based
       • Most of the approaches do not distinguish between classes and instances
       • Relation semantics is limited to a few types and is not precisely defined
       • No domain knowledge
     • Ontology-based
       • All the approaches produce either enrichments or instances (no classes)
       • Relations are not identified
       • No domain knowledge
     • Hybrid
       • Semi-automatic ontology generation
       • No domain knowledge

  6. Proposal
     Goal: generate a domain baseline ontology, containing classes and relationships, out of folksonomy information.
     • Terminology Extraction: domain experts provide domain-relevant resources (URLs); the folksonomy yields a list of domain terms
     • Semantic Elicitation: the domain terms drive the extraction of domain classes and relationships from Linked Open Data (LOD)*
     *“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

  7. Benefits
     We propose a process, driven by the domain terminology in the folksonomy, to extract domain knowledge from large and generic knowledge bases.
     • It may save time in the ontology development process
     • It allows ontology engineers to understand the domain with limited participation of domain experts
     • It yields smaller and more focused ontologies, which are potentially easier to understand and maintain
     • Complex queries and reasoning tasks may execute faster on smaller data sets
     • In observance of methodological practice, our technique harvests community knowledge and reuses existing ontologies
     • The ontology has links to external classes and relationships available in the Linked Open Data cloud

  8. Challenges
     Problem: tags lack semantics
     • Ambiguity
     • Synonyms
     • Acronyms
     • Morphological variations
       • Plurals
       • Singulars
       • Verb conjugations
     • Misspellings

  9. Approach: Terminology Extraction
     Goal: to extract domain terminology from the folksonomy.
     • Folksonomy: A ⊆ U × T × R, where U, T and R are the sets of users, tags and resources. As a graph, G = (V, E) where V = U ∪ T ∪ R and E = {(u, t, r) | (u, t, r) ∈ A}
     • Resource graph: G' = (V', E') where V' = R and E' = {(r_i, r_j) | ∃ (u, t_m, r_i) ∈ A ∧ (u, t_n, r_j) ∈ A ∧ t_m = t_n}
     • Spreading activation
       • Seeds: domain-relevant resources provided by domain experts. These nodes are weighted with an activation value and used to start the search
       • The activation value spreads to adjacent nodes through an activation function
       • Activation function: roughly, a function of the tags shared between the visited node and the source node, and of the source node's activation value
       • If the activation function exceeds a threshold, the node is marked as activated and the spreading continues to its adjacent nodes
       • The tags of the activated nodes are collected as domain terms
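The resource graph and the spreading step above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the slide gives the activation function only informally, so the formula used here (source activation multiplied by the fraction of the visited node's tags that it shares with the source) is an assumption, as are the function names and the toy annotations.

```python
from collections import defaultdict, deque


def build_resource_graph(annotations):
    """Link two resources when the same user gave both the same tag.

    annotations: iterable of (user, tag, resource) triples.
    """
    by_user_tag = defaultdict(set)
    tags_of = defaultdict(set)
    for user, tag, resource in annotations:
        by_user_tag[(user, tag)].add(resource)
        tags_of[resource].add(tag)
    graph = defaultdict(set)
    for resources in by_user_tag.values():
        for resource in resources:
            graph[resource] |= resources - {resource}
    return graph, tags_of


def spread_activation(graph, tags_of, seeds, threshold=0.8):
    """Breadth-first spreading from the seeds; returns activations and terms."""
    activation = {seed: 1.0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        source = queue.popleft()
        for node in graph[source]:
            if node in activation:
                continue
            shared = len(tags_of[source] & tags_of[node])
            # Assumed activation function: source value times tag overlap.
            value = activation[source] * shared / len(tags_of[node])
            if value > threshold:
                activation[node] = value
                queue.append(node)
    terms = set()
    for resource in activation:
        terms |= tags_of[resource]
    return activation, terms


# Hypothetical toy folksonomy: r1 is the expert-provided seed.
annotations = [
    ("u1", "finance", "r1"), ("u1", "stocks", "r1"),
    ("u1", "finance", "r2"), ("u1", "stocks", "r2"),
    ("u2", "stocks", "r2"), ("u2", "stocks", "r3"),
    ("u2", "cooking", "r3"),
]
graph, tags_of = build_resource_graph(annotations)
activation, terms = spread_activation(graph, tags_of, ["r1"])
print(sorted(activation), sorted(terms))
```

In this toy run r2 shares all of its tags with the seed and is activated, while r3 shares only half of its tags with r2 and stays below the threshold, so its off-domain tag is not collected.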

  10. Approach: Semantic Elicitation
     Goal: to relate domain terms (tags) to DBpedia resources.
     • Normalize the tag to the standard notation of DBpedia resource titles
     • Search with SPARQL for a resource whose label equals the normalized tag
       • If none exists: use a spelling suggestion service and search again
       • If one exists: check whether it is related to a disambiguation resource
         • If true: retrieve the disambiguation candidates and select the candidate most similar to the tag context
           • Vector space model
           • Candidate resources are represented by their textual descriptions
           • The tag is represented by its context (i.e., co-occurring tags)
           • The most similar candidate is selected using cosine similarity
         • If false: select the resource (the default sense in Wikipedia)
     Reference: A. García-Silva, I. Cantador, Ó. Corcho. Enabling folksonomies for knowledge extraction: A semantic grounding approach. International Journal on Semantic Web and Information Systems 8 (3), 24-41, 2012.
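The grounding steps above can be illustrated with a short Python sketch. It is an illustration under assumptions rather than the published method: the normalization rule (separators turned into spaces, first letter capitalized), the bag-of-words term weighting, the helper names and the two candidate descriptions are all hypothetical.

```python
import math
import re
from collections import Counter


def normalize_tag(tag):
    """Assumed normalization: separators become spaces, first letter upper-cased."""
    words = [w for w in re.split(r"[\s_\-]+", tag.strip()) if w]
    label = " ".join(words)
    return label[:1].upper() + label[1:]


def label_query(label, lang="en"):
    """Build a SPARQL query that looks up a resource by its exact label."""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'SELECT ?resource WHERE {{ ?resource rdfs:label "{escaped}"@{lang} }}'
    )


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def pick_candidate(context_tags, candidates):
    """Return the candidate whose description is closest to the tag context.

    candidates: dict mapping a candidate name to its textual description.
    """
    context = Counter(tokenize(" ".join(context_tags)))
    return max(
        candidates,
        key=lambda name: cosine(context, Counter(tokenize(candidates[name]))),
    )


# Hypothetical disambiguation candidates for the tag "bond".
candidates = {
    "Bond_(finance)": "A bond is a debt security through which investors lend money in finance",
    "Chemical_bond": "A chemical bond is a lasting attraction between atoms",
}
print(normalize_tag("stock_market"))
print(pick_candidate(["finance", "investment", "debt"], candidates))
```

A real run would weight terms (for example with TF-IDF) and fetch the candidate descriptions from the knowledge base; raw term counts are used here only to keep the sketch self-contained.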

  11. Approach: Semantic Elicitation
     Goal: to identify classes from resources.
     • Use an ASK query to verify whether the entity is a class
     • If it is not: create queries that traverse all the possible paths of equivalence relations between the entity and a class in the RDF graph
     Queries (<resource> stands for the entity IRI; the rdf:, rdfs: and owl: prefixes are assumed to be declared):
       # Query 1
       ASK { <resource> rdf:type rdfs:Class }
       # Query 2
       SELECT ?class WHERE { <resource> ?rel1 ?class . ?class rdf:type rdfs:Class . FILTER(?rel1 = owl:sameAs) }
       # Query 3
       SELECT ?class WHERE { <resource> ?rel1 ?node . ?node ?rel2 ?class . ?class rdf:type rdfs:Class . FILTER((?rel1 = owl:sameAs) && (?rel2 = owl:sameAs)) }
     Reference: Philipp Heim, Sebastian Hellmann, Jens Lehmann, Steffen Lohmann and Timo Stegemann. RelFinder: Revealing Relationships in RDF Knowledge Bases. In: Proceedings of the 4th International Conference on Semantic and Digital Media Technologies (SAMT 2009), pages 182-187. Springer, Berlin/Heidelberg, 2009.
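Queries 2 and 3 follow one pattern that extends to longer chains, which a small Python generator can make explicit. The sketch below is an assumption about how such queries might be produced: the function names are hypothetical, it writes owl:sameAs directly in the triple patterns instead of filtering on a relation variable, and it only follows links in the forward direction, as the slide's queries do.

```python
PREFIXES = (
    "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
    "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
    "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
)


def ask_is_class(resource_uri):
    """Query 1: is the resource itself declared as a class?"""
    return PREFIXES + f"ASK {{ <{resource_uri}> rdf:type rdfs:Class }}"


def class_path_query(resource_uri, hops):
    """Find classes reachable from the resource through `hops` owl:sameAs links."""
    if hops < 1:
        raise ValueError("hops must be at least 1")
    nodes = (
        [f"<{resource_uri}>"]
        + [f"?node{i}" for i in range(1, hops)]
        + ["?class"]
    )
    triples = [f"{nodes[i]} owl:sameAs {nodes[i + 1]} ." for i in range(hops)]
    triples.append("?class rdf:type rdfs:Class .")
    body = "\n  ".join(triples)
    return PREFIXES + "SELECT DISTINCT ?class WHERE {\n  " + body + "\n}"


# Hops 1 and 2 correspond to Query 2 and Query 3 on the slide.
for hops in (1, 2):
    print(class_path_query("http://dbpedia.org/resource/Bank", hops))
    print()
```

The resulting strings could be sent to any SPARQL endpoint; the endpoint and client library are left out because the slides do not name them.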

  12. Approach: Semantic Elicitation
     Goal: to identify relations between classes.
     • For each pair of classes: create queries that traverse all the possible paths between the two classes in the RDF graph, and retrieve the relationships
     • Caveats: this may add non-relevant domain information to the ontology
       • Long paths
       • Paths that pass through abstract concepts or relationships
         • cyc:ObjectType
         • umbel:RefConcept
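One way to enumerate such path queries is to generate a SPARQL query for every path length up to a bound and every combination of edge directions. The Python sketch below does this under assumptions: the slides do not say how directions are handled or which bound is used, so the direction enumeration, the default of two hops and the function name are illustrative choices.

```python
from itertools import product


def relation_path_queries(class_a, class_b, max_hops=2):
    """Build one SELECT query per path length and per combination of
    edge directions between two class IRIs."""
    queries = []
    for hops in range(1, max_hops + 1):
        nodes = (
            [f"<{class_a}>"]
            + [f"?node{i}" for i in range(1, hops)]
            + [f"<{class_b}>"]
        )
        rels = [f"?rel{i}" for i in range(1, hops + 1)]
        for directions in product((True, False), repeat=hops):
            triples = []
            for i, forward in enumerate(directions):
                if forward:
                    subject, obj = nodes[i], nodes[i + 1]
                else:
                    subject, obj = nodes[i + 1], nodes[i]
                triples.append(f"{subject} {rels[i]} {obj} .")
            # Select the relations and the intermediate nodes of the path.
            select = " ".join(rels + nodes[1:-1])
            body = " ".join(triples)
            queries.append(f"SELECT DISTINCT {select} WHERE {{ {body} }}")
    return queries


# Hypothetical class IRIs used only for illustration.
for query in relation_path_queries("http://example.org/Bank",
                                   "http://example.org/StockExchange"):
    print(query)
```

The number of queries grows as 2 + 4 + ... + 2^max_hops, which is one practical reason to keep the bound small.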

  13. Approach: Semantic Elicitation. Minimizing the risk of adding non-relevant information to the ontology
     • Keep the path length short
       • Our experiments show satisfactory results with short path lengths, which allow us to enrich the initial set of classes while preserving the precision of the ontology
     • Avoid high-level concepts
       • Create lists of high-level concepts collected from the knowledge base vocabularies, and filter out the paths that contain those concepts
       • Knowledge base core vocabularies are usually well documented
         • http://umbel.org/specications/vocabulary
         • http://mappings.dbpedia.org/server/ontology/classes/
         • http://www.cyc.com/kb/thing
     • Use semantic similarity distances
       • Wu and Palmer, 1994: depth of the classes and of their common subsumer in the taxonomy
       • Jiang and Conrath, 1997: subclasses per class, class depth, information content, etc.
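The first two safeguards and the Wu and Palmer measure can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the toy taxonomy, the maximum path length of 2 (the slide only says "short"), the two-entry stop list taken from the examples on the previous slide, and the function names. Depth is counted in nodes, so the root has depth 1, and the similarity is 2·depth(lcs) / (depth(a) + depth(b)).

```python
HIGH_LEVEL = {"cyc:ObjectType", "umbel:RefConcept"}


def keep_path(path, max_len=2, stop_terms=HIGH_LEVEL):
    """Keep a path (list of (subject, predicate, object) triples) only if it
    is short and touches no high-level concept or relationship."""
    if len(path) > max_len:
        return False
    return not any(term in stop_terms for triple in path for term in triple)


def ancestors(cls, parent):
    """Chain from a class up to the root of a tree-shaped taxonomy."""
    chain = [cls]
    while cls in parent:
        cls = parent[cls]
        chain.append(cls)
    return chain


def depth(cls, parent):
    return len(ancestors(cls, parent))  # the root has depth 1


def wu_palmer(a, b, parent):
    """Wu and Palmer similarity based on the deepest common subsumer."""
    ancestors_b = set(ancestors(b, parent))
    lcs = next((c for c in ancestors(a, parent) if c in ancestors_b), None)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))


# Hypothetical toy taxonomy: child -> parent.
parent = {
    "Organization": "Thing",
    "Event": "Thing",
    "Company": "Organization",
    "Bank": "Company",
    "StockExchange": "Organization",
}
print(wu_palmer("Bank", "StockExchange", parent))
print(wu_palmer("Bank", "Event", parent))
print(keep_path([("ex:Bank", "ex:listedOn", "ex:StockExchange")]))
```

In the toy taxonomy Bank and StockExchange share the subsumer Organization and score higher than Bank and Event, which only meet at the root; a threshold on this score is one possible way to decide whether a discovered relationship is kept.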

  14. Evaluation: Experiment in the financial domain
     [Figure: experiment setup. Labels: Input, Finance vocabulary, Evaluation]

  15. Evaluation: Experiment in the financial domain
     [Figure: experiment pipeline. Labels: Terminology Extraction, Finance vocabulary, Finance Ontology]

  16. Evaluation: Inspecting a financial ontology
     • We ran the process with an activation threshold of 0.8
     • The resulting ontology consists of 187 classes, 378 relations of 8 different types, and 12 modules

  17. Evaluation: Inspecting a financial ontology
     [Figure: ontology modules]
     • Class precision = 80.67%, relation precision = 96.4%

  18. Conclusions
     We have developed a method for automatically building domain ontologies.
     • Limited user participation
     • We benefit from the aggregation of individual classifications to extract an emergent domain vocabulary
     • In accordance with methodological guidelines, we reuse existing knowledge (the Web of Data)
     • We tap into existing links between data sets to collect related semantic information
     • We avoid, to some extent, semantic mismatches
     • We avoid heterogeneous representations
     In practice, we expect ontology engineers to use the method to generate baseline ontologies that can be refined later according to the ontology requirements.

  19. Future Work
     • Develop a method to automatically assess the validity of the relationships found in the linked data cloud
       • Example: OpenCyc Stock Exchange is owl:sameAs UMBEL Exchange of User Rights
       • However, Stock Exchange is an organization, whereas Exchange of User Rights is an event
     • Use semantic similarity measures to decide whether to include the relationships found along a path between two classes
     • Discover and use datasets in the linked data cloud that cover the domain of interest

