From General Ontology to Specific Ontology: Study of Shu-Shi Poems
Ru-Yng Chang, Sue-ming Chang, Feng-ju Luo*, Chu-Ren Huang
Academia Sinica, Yuan-Ze University*
Knowledge and Knowledge Structure Variation
Knowledge is Structured Information
• The most salient factors dictating variations in knowledge structures are time, space, and domain
• Language is both the product and the conduit of the conceptual structure of its speakers
Accessing Knowledge Structure
• To become sharable and reusable knowledge, all extracted information must first be correctly situated in a knowledge structure
• The situated information must be able to transfer from one knowledge structure to another without losing its meaningful content
Research Goal
• Knowledge Structure Discovery
  • Knowledge as situated information
  • Language endows information with structure
• Text-based and Lexicon-driven Knowledge Structure Discovery
  • General Ontology: the upper ontology shared by all domains (such as SUMO)
  • Specific Ontology: an ontology specific to a domain, a historical period, an author, etc.
Research Methodology
• The Mental Lexicon Approach
• The Shakespearean-garden Approach
• The Ontology-merging as Ontology-discovery Approach
The Mental Lexicon Approach
• Concepts are stored in the mental lexicon
• The basic unit of mental lexicon organization and access is the lexical entry
• A complete list of lexical entries covers the complete list of conceptual atoms
• Lexical semantic relations mirror conceptual relations
Each Word is a Conceptual Atom
The Shakespearean-garden Approach
• A Shakespearean garden collects all the plants referred to in Shakespeare's texts
• The garden illustrates the flora of Shakespearean England and gives scholars a context in which to interpret his work
• There is a knowledge structure behind each corpus (i.e. a collection of texts with design criteria), for instance the complete set of texts by an author, from a certain period, or in a certain domain
Lexicon as a Structured Inventory of Conceptual Atoms
The Ontology-merging as Ontology-discovery Approach I
• An ontology provides a structure in which knowledge can be situated
• However, constructing a new ontology poses a dilemma
  • If no existing ontology is referred to: the wheel is reinvented, and it is difficult to start a structure from scratch without rules
  • If an existing ontology is referred to: the work may be misled by the existing structure, which can be mismatched or erroneous
The Ontology-merging as Ontology-discovery Approach II: The Solution
• Map conceptual atoms to two (or more) reference ontologies
• Merge the two resultant ontologies
  • Matched Mapping: confirms the knowledge structure
  • Mismatched Mapping: only one mapping, or neither, is correct; may lead to the discovery of a new knowledge structure
  • Complementary Mapping: increases coverage
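The three mapping outcomes above can be illustrated with a minimal sketch. It assumes that both reference mappings have already been normalized to one shared concept vocabulary so that they can be compared directly; the function name `classify_mappings` and all toy values are hypothetical and are not taken from the slides.

```python
from typing import Dict, List, Optional


def classify_mappings(
    atoms: List[str],
    map_a: Dict[str, str],
    map_b: Dict[str, str],
) -> Dict[str, str]:
    """Label each conceptual atom by how its two reference mappings compare."""
    labels: Dict[str, str] = {}
    for atom in atoms:
        a: Optional[str] = map_a.get(atom)
        b: Optional[str] = map_b.get(atom)
        if a is None and b is None:
            labels[atom] = "unmapped"        # neither reference covers the atom
        elif a is None or b is None:
            labels[atom] = "complementary"   # one reference fills a gap in the other
        elif a == b:
            labels[atom] = "matched"         # both agree: structure confirmed
        else:
            labels[atom] = "mismatched"      # disagreement: needs manual inspection
    return labels


# Toy data, invented for illustration only.
atoms = ["crab", "whale", "lute", "jade"]
via_ontology_a = {"crab": "Crustacean", "whale": "Mammal", "lute": "MusicalInstrument"}
via_ontology_b = {"crab": "Crustacean", "whale": "Fish"}

labels = classify_mappings(atoms, via_ontology_a, via_ontology_b)
print(labels)
```

In this sketch a mismatch is only flagged, not resolved, since deciding which mapping (if either) is correct is the step that may reveal new knowledge structure.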
Resources used
• WordNet: http://www.cogsci.princeton.edu/~wn/
• SUMO Ontology: http://www.ontologyportal.org
• Academia Sinica Bilingual Ontological Wordnet (Sinica BOW), SUMO + WordNet: http://bow.sinica.edu.tw
• Segmentation program etc.: http://LingAnchor.sinica.edu.tw/
• Domain Lexicon Management System: segmentation, new word detection, lexical database
Information in Sinica BOW (example: fish)
• Sense
• Domain
• POS
• Definition
• Translation
• Semantic relation
• SUMO
• Example
SUMO: Suggested Upper Merged Ontology
SUMO Atoms
• Concepts: around 1000 (note that concepts are not necessarily linguistically realized)
• Relations (ISA): see SUMO Graph
• Axioms: for inference
• Open resource created under an initiative of the IEEE Standard Upper Ontology Working Group
Methodology
• From lexicon to ontology (from items to structure)
• Ontology discovery through ontology merging
WHY?
• We do not have the knowledge structure (ontology) of a new domain (historical period, field, etc.)
• But typical ontology discovery needs a framework to map to
• To solve the dilemma, we map the conceptual atoms to both SUMO and WordNet (as a linguistic ontology)
How to build a domain ontology
• Word segmentation
• Match WordNet synsets and SUMO concepts automatically (resources: WordNet, SUMO)
• Use WordNet information to check results and extend concepts
• Transform into ontology browser format
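The first two steps of this flow can be sketched as follows. The slides do not describe how the segmentation program works, so greedy forward maximum matching is swapped in here purely as a stand-in; the lexicon, the word-to-synset table, and the synset-to-concept table are small hypothetical dictionaries, not the actual Sinica BOW data.

```python
from typing import Dict, List, Optional, Set, Tuple


def segment(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedy forward maximum matching; unknown characters become single tokens."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def map_tokens(
    tokens: List[str],
    word_to_synset: Dict[str, str],
    synset_to_concept: Dict[str, str],
) -> Tuple[Dict[str, Tuple[str, Optional[str]]], List[str]]:
    """Attach a synset and an upper-ontology concept to each known token."""
    mapped: Dict[str, Tuple[str, Optional[str]]] = {}
    missing: List[str] = []
    for tok in tokens:
        synset = word_to_synset.get(tok)
        if synset is None:
            missing.append(tok)  # no synset found: left for manual review
        else:
            mapped[tok] = (synset, synset_to_concept.get(synset))
    return mapped, missing


# Toy tables, invented for illustration only.
lexicon = {"蝸牛", "蟾蜍", "蚯蚓"}
word_to_synset = {"蝸牛": "snail", "蟾蜍": "toad", "蚯蚓": "earthworm"}
synset_to_concept = {"snail": "Mollusk", "toad": "Amphibian"}

tokens = segment("蝸牛與蟾蜍", lexicon)
mapped, missing = map_tokens(tokens, word_to_synset, synset_to_concept)
print(tokens, mapped, missing)
```

Tokens without a synset are collected separately rather than dropped, so that the later checking step has an explicit list to work from.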
Distribution of Shu Shi lexicon
• 98,430 words in volumes 1-45
The distribution of animal, plant, and artifact concepts in Shu Shi’s poems
Concepts found in Shu Shi's poems but not in the Tang 300
• aquatic mammal: whale 鯨*
• amphibian: frog 蛙*, toad 蟾蜍*, salamander 鯢
• mollusk: clam 蛤*, gastropod 螺*, oyster 蠔, snail 蝸牛*, earthworm 蚯蚓*
• crustacean: crab 蟹*, shrimp 蝦
Guangdong and Hainan Island
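A comparison of this kind amounts to a per-category set difference between two concept inventories. The sketch below assumes each corpus has already been reduced to a mapping from category to concept words; the function name `concepts_only_in` is hypothetical, and the "bird" entries are invented so that the example has something shared to filter out.

```python
from typing import Dict, List


def concepts_only_in(
    target: Dict[str, List[str]],
    reference: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Return, per category, the concepts present in target but absent from reference."""
    result: Dict[str, List[str]] = {}
    for category, words in target.items():
        only_here = set(words) - set(reference.get(category, []))
        if only_here:
            result[category] = sorted(only_here)
    return result


# Toy inventories; the "bird" rows are invented for illustration only.
corpus_a = {
    "amphibian": ["frog", "toad", "salamander"],
    "crustacean": ["crab", "shrimp"],
    "bird": ["goose"],
}
corpus_b = {"bird": ["goose", "swallow"]}

unique_to_a = concepts_only_in(corpus_a, corpus_b)
print(unique_to_a)
```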
What We Learned about Specific Ontology
Constructing an ontology from a larger corpus and comparing two specific ontologies
• Local information can be effectively mapped
• Global information offers deeper insights into the knowledge structure
• Human conceptualization of animals and plants has been relatively stable, but that of artifacts has NOT
  • Regardless of the criteria for classification, genetically determined features (behaviors, appearances, etc.) do not vary greatly
  • However, human technology is highly fluid. Our conceptualization of artifacts depends heavily on the development of engineering and on our varying societal needs
Axiom in SUMO
• (instance GeorgeBush Human)
  GeorgeBush is an instance of the class of humans
• (exists (?X) (parent ?X GeorgeBush))
  there exists something of which George Bush is the parent
• (instance parent BinaryPredicate)
  the relation parent is a binary relation
• (domain parent 1 Organism)
  the first argument to the parent relation must be an instance of the class Organism
• (domain parent 2 Organism)
  similarly for the second argument
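The effect of the two domain declarations can be illustrated with a small argument-type check. This is only a sketch under stated assumptions: the class hierarchy is a simplified toy chain rather than the real SUMO hierarchy, the individuals `SomeChild` and `Hammer1` are hypothetical, and no actual SUMO inference engine is involved.

```python
from typing import Dict, List, Optional, Tuple

# Simplified toy hierarchy: child class -> parent class (not the full SUMO chain).
SUBCLASS: Dict[str, str] = {"Human": "Organism", "Organism": "Entity", "Artifact": "Entity"}

# Individuals and their classes; SomeChild and Hammer1 are hypothetical.
INSTANCE: Dict[str, str] = {"GeorgeBush": "Human", "SomeChild": "Human", "Hammer1": "Artifact"}

# (relation, argument position) -> required class, as in (domain parent 1 Organism).
DOMAIN: Dict[Tuple[str, int], str] = {("parent", 1): "Organism", ("parent", 2): "Organism"}


def is_a(cls: Optional[str], ancestor: str) -> bool:
    """True if cls equals ancestor or reaches it by following SUBCLASS links."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS.get(cls)
    return False


def satisfies_domains(relation: str, args: List[str]) -> bool:
    """Check every argument against the domain declared for its position."""
    for position, arg in enumerate(args, start=1):
        required = DOMAIN.get((relation, position))
        if required is not None and not is_a(INSTANCE.get(arg), required):
            return False
    return True


ok = satisfies_domains("parent", ["SomeChild", "GeorgeBush"])
bad = satisfies_domains("parent", ["Hammer1", "GeorgeBush"])
print(ok, bad)
```

An unknown individual fails the check in this sketch, which is a conservative design choice rather than anything prescribed by the slides.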
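Example of WordNet lexical relation: 杜鵑 DuJuan
[Figure: two WordNet hypernym chains for the two senses of 杜鵑. Bird sense: bird > cuculiform_bird > cuckoo, shown with ani, roadrunner, coucal, Centropus_sinensis, and pheasant_coucal. Plant sense: shrub, bush > rhododendron > azalea.]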
SUMO + WordNet: 杜鵑 DuJuan
[Figure: the WordNet chains for 杜鵑 attached to the SUMO hierarchy. SUMO nodes: organism; plant, animal; invertebrate, vertebrate; flowering plant; warm blooded vertebrate; mammal, bird. WordNet nodes: bird, cuculiform_bird, cuckoo, ani, roadrunner, coucal, Centropus_sinensis, pheasant_coucal; shrub, bush, rhododendron, azalea.]
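The merged view can be sketched as a path that first climbs WordNet hypernyms and then continues up the SUMO class hierarchy. The tables below are one reading of the figure's node labels; in particular, the links from the WordNet nodes `shrub` and `bird` to SUMO classes are an assumption made for illustration, and the CamelCase class names are not checked against the SUMO source.

```python
from typing import Dict, List

# WordNet hypernym links (child -> parent), following the chains in the figure.
WORDNET_HYPERNYM: Dict[str, str] = {
    "azalea": "rhododendron",
    "rhododendron": "shrub",
    "cuckoo": "cuculiform_bird",
    "cuculiform_bird": "bird",
}

# Assumed attachment points between WordNet nodes and SUMO classes.
WORDNET_TO_SUMO: Dict[str, str] = {"shrub": "FloweringPlant", "bird": "Bird"}

# SUMO subclass links (child -> parent), simplified from the figure's labels.
SUMO_SUBCLASS: Dict[str, str] = {
    "FloweringPlant": "Plant",
    "Plant": "Organism",
    "Bird": "WarmBloodedVertebrate",
    "WarmBloodedVertebrate": "Vertebrate",
    "Vertebrate": "Animal",
    "Animal": "Organism",
}

# The two senses of the word discussed on the slide.
SENSES: Dict[str, List[str]] = {"杜鵑": ["cuckoo", "azalea"]}


def merged_path(synset: str) -> List[str]:
    """Climb WordNet hypernyms, then continue through the SUMO hierarchy."""
    path = [synset]
    while path[-1] in WORDNET_HYPERNYM:
        path.append(WORDNET_HYPERNYM[path[-1]])
    node = WORDNET_TO_SUMO.get(path[-1])
    while node is not None:
        path.append(node)
        node = SUMO_SUBCLASS.get(node)
    return path


paths = {sense: merged_path(sense) for sense in SENSES["杜鵑"]}
for sense, path in paths.items():
    print(sense, " > ".join(reversed(path)))
```

Under these toy tables the two senses of 杜鵑 share no class below `Organism`, which is the kind of divergence the merged view makes visible.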
Summary and Future Work
• Ontologies represent the knowledge structure of a domain or historical period
• We have provided an online interface to browse ontologies and lexica
• In the future, we will complete the online ontology editor and browser, which will
  • Map lexicon, WordNet, and SUMO
  • Integrate ontologies based on different texts
  • Facilitate comparative studies of various domain ontologies
Towards a Workbench for Specific Ontology: Browser and Editor
• User login
• Function menu (personal ontologies list): browse an ontology, edit an ontology, add an ontology, logout
• Browse an ontology: SUMO; SUMO + WordNet + concept map with lexicon
• Edit an ontology: update lexical concepts; update mapping between WordNet synset and lexicon; edit other information in lexicon
• Add an ontology: import text or import lexicon; word segmentation; match concept and synset automatically (suggestion list, missing list)
Constructing a Specific Ontology
• Import text or domain lexicon
• Select style of writing
• Select category of word list for word segmentation
• Select reference ontologies to match SUMO and lexicon
• Information in the suggestion list
  • Candidate synset
  • Synonyms of the candidate synset
  • Explanation of the candidate synset
  • Concept of the candidate synset