Dynamic Building of Domain Specific Lexicons Using Emergent Semantics • Final Presentation • Matt Selway 100079967 • Supervisor: Professor Markus Stumptner • Knowledge and Software Engineering Laboratory • School of Computer and Information Science
Contents • Motivations and Goals • Research Questions • Method • Experiments and Results • Summary and Conclusions • Limitations and Future Work
Motivations and Goals • Kleiner et al. (2009) developed a very different approach to Natural Language Processing (NLP) • Treat NLP as a model transformation problem • Utilise configuration as the model transformation • Model transformation is the process of taking input models and creating output models from them • It is the foundation of Model Driven Engineering • Configuration is a constraint-based search technique • In this case the constraints are conformance to the desired metamodel
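The idea of configuration as a constraint-based search can be illustrated with a minimal sketch. It is not the configurator used by Kleiner et al. The toy lexicon, the `ALLOWED_SHAPES` stand-in for a metamodel, and the `configure` function are all assumptions made for illustration. The search is plain brute-force enumeration, which a real configurator would replace with constraint propagation.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the categories it may take.
LEXICON = {
    "the": {"Article"},
    "customer": {"Noun"},
    "rents": {"Verb", "Noun"},
    "a": {"Article"},
    "car": {"Noun"},
}

# Hypothetical stand-in for a metamodel: the only sentence shape allowed.
ALLOWED_SHAPES = {("Article", "Noun", "Verb", "Article", "Noun")}


def configure(words):
    """Return every category assignment that conforms to ALLOWED_SHAPES."""
    # One sorted candidate list per word, so the enumeration order is stable.
    candidates = [sorted(LEXICON[w]) for w in words]
    # Enumerate all combinations and keep only the conforming ones.
    return [combo for combo in product(*candidates) if combo in ALLOWED_SHAPES]


result = configure("the customer rents a car".split())
```

Here the ambiguous word "rents" is resolved to a verb only because the noun reading violates the constraint. This also shows why the approach depends on a lexicon that lists the candidate categories for each word.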
Motivations and Goals • Overview of the process (Kleiner et al. 2009) • The method shows promising results • However, it requires the use of a predefined lexicon
Motivations and Goals • Issues for practical applications: • Manually building a complete lexicon can take a long time, even for a specific domain • A predefined lexicon is static • It reduces the level of automation
Motivations and Goals • Short-range goals: • At least partially automate the creation of domain-specific lexicons directly from the input text, using external resources to retrieve lexical data • Make updates a natural part of the system • Allow sharing/reuse of lexical information • Long-range goals: • Improve the automated analysis of specifications • Support research into semantic interoperability • Develop global agreement on lexicons/ontologies
Research Questions • Can we reduce or eliminate the need to manually predefine a lexicon by dynamically building a lexicon based on the input text? • How much of a reduction can be gained? • How well does it work? (i.e. accuracy of retrieved data, how much data is automatically retrieved) • What are its limitations?
Method • Developed an experimental system • Attempted to use emergent semantics and semiotic dynamics in a similar way to that described by Steels and Hanappe (2006) for the interoperability of collective information systems • They propose a multi-agent system that uses communication to arrive at an agreement on the meaning of the data, its tags, and its categories • They take advantage of the semiotic triad between data, tags, and categories in user taxonomies (e.g. bookmarks in a web browser) • A semiotic triad implies a meaningful relationship between its three components
Method • Figure: basic semiotic triad (Steels & Hanappe, 2006) • Similarly, a semiotic triad exists between a word, its use, and the domain it is used in • The idea is that this triad can be used to dynamically develop domain-specific lexicons between information agents
Method (Design) • Multi-agent system • Lexical information is retrieved from other agents • Initial data is downloaded from online sources • User feedback adjusts the retrieved data • Agents update their lexicons and their associations to lexicons based on user feedback (using the semiotic relationship) • Many changes indicate that the agents are actually using different domains • Few changes indicate updates to the lexicon within the same domain
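The feedback rule in the design above can be sketched as follows. This is only an illustration under stated assumptions, not the implemented system. The `Agent` class, the `apply_feedback` method, the starting strength of 0.5, and the values of `CHANGE_RATIO_THRESHOLD` and `STRENGTH_STEP` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical tuning values, chosen only for illustration.
CHANGE_RATIO_THRESHOLD = 0.5
STRENGTH_STEP = 0.1


@dataclass
class Agent:
    name: str
    lexicon: dict = field(default_factory=dict)       # word -> category
    associations: dict = field(default_factory=dict)  # peer name -> strength in [0, 1]

    def apply_feedback(self, peer, retrieved, corrections):
        """Merge entries retrieved from `peer`, then apply user corrections.

        retrieved:   word -> category as supplied by the peer
        corrections: word -> category for entries the user changed
        """
        if not retrieved:
            return
        self.lexicon.update(retrieved)
        self.lexicon.update(corrections)
        # Count how many retrieved entries the user actually changed.
        changed = sum(
            1 for word, category in retrieved.items()
            if word in corrections and corrections[word] != category
        )
        ratio = changed / len(retrieved)
        strength = self.associations.get(peer, 0.5)  # assumed neutral start
        if ratio > CHANGE_RATIO_THRESHOLD:
            strength -= STRENGTH_STEP  # many changes: peer likely uses another domain
        else:
            strength += STRENGTH_STEP  # few changes: same domain, lexicon refined
        # Clamp so the strength stays within [0, 1].
        self.associations[peer] = min(1.0, max(0.0, strength))


agent = Agent("a")
agent.apply_feedback("b", {"rent": "Noun", "car": "Noun"}, {"rent": "Verb"})
agent.apply_feedback("c", {"rent": "Noun"}, {"rent": "Verb"})
```

With these assumed values, peer "b" (half of its entries corrected) gains strength, while peer "c" (every entry corrected) loses strength.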
Method (Online Sources) • Surveyed online lexicons/ontologies (CYC, WordNet, EDR) and dictionaries (Oxford, ‘The Free Dictionary’, ‘Your Dictionary’) • Excluded CYC, WordNet, and EDR as not suitable • Turned to standard online dictionaries • Official dictionaries (Oxford/Harvard) not suitable, as they require paid access • Discovered ‘The Free Dictionary’ • Large number of entries • Enough detail in definitions (transitive/intransitive verbs, definite/indefinite articles, etc.) • Reasonably standard pages for parsing
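Extracting lexical categories from a dictionary page might look roughly like the sketch below. It uses only Python's standard `html.parser` module. The markup in `SAMPLE_PAGE`, the `pos` class name, the placeholder definitions, and the `POS_LABELS` mapping are invented for illustration. They do not describe the actual structure of any real dictionary site.

```python
from html.parser import HTMLParser

# Hypothetical markup with placeholder text; real pages are structured differently.
SAMPLE_PAGE = """
<div class="entry"><h2>rent</h2>
  <i class="pos">v.tr.</i> <span>placeholder definition one</span>
  <i class="pos">n.</i> <span>placeholder definition two</span>
</div>
"""

# Assumed mapping from part-of-speech abbreviations to lexical categories.
POS_LABELS = {
    "v.tr.": "Transitive Verb",
    "v.intr.": "Intransitive Verb",
    "n.": "Noun",
}


class PosExtractor(HTMLParser):
    """Collect categories from <i class="pos"> elements, in page order."""

    def __init__(self):
        super().__init__()
        self.in_pos = False
        self.categories = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "i" and ("class", "pos") in attrs:
            self.in_pos = True

    def handle_endtag(self, tag):
        if tag == "i":
            self.in_pos = False

    def handle_data(self, data):
        label = data.strip()
        # Unknown abbreviations are skipped rather than guessed.
        if self.in_pos and label in POS_LABELS:
            self.categories.append(POS_LABELS[label])


def extract_categories(html):
    parser = PosExtractor()
    parser.feed(html)
    return parser.categories


categories = extract_categories(SAMPLE_PAGE)
```

A parser of this kind depends on the page layout staying stable. That is why "reasonably standard pages" is listed as a selection criterion.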
Summary and Conclusions • It works! • How well? • A high percentage of words had data retrieved; however, too much unnecessary data reduces the effectiveness • Accuracy is affected by many factors • Incomplete/incorrect parsing of the web page • Small SBVR specification sample • SBVR keywords • We believe it is worth pursuing and improving • Fix parsing, use multiple sources • Define keyword lexicons, dynamically generate the rest • Fill in gaps/cull using words with only one category • Etc.
Limitations and Future Work • Choice of dictionary • Potentially use multiple data sources • Joint words, i.e. most SBVR keywords • Implementation not perfect • Parsing of the data source • No synonyms • Communication protocol • Errors in adjusting association strengths • The strength adjustment values and threshold values used for lexicon classifiers need more research to find more appropriate values • Etc.
References • Kleiner, M, Albert, P & Bézivin, J 2009, ‘Configuring Models for (Controlled) Languages’, in Proceedings of the IJCAI–09 Workshop on Configuration (ConfWS–09), Pasadena, CA, USA, pp. 61-68. • Farlex 2010, The Free Dictionary, viewed 11 September 2010, <www.thefreedictionary.com>. • Steels, L & Hanappe, P 2006, ‘Interoperability Through Emergent Semantics: A Semiotic Dynamics Approach’, in Journal on Data Semantics VI, vol. 4090, Springer Berlin / Heidelberg, pp. 143-167.