tOKo from TOKens to Ontologies

Anjo Anjewierden
Human-Computer Studies laboratory, University of Amsterdam
http://staff.science.uva.nl/~anjo
http://anjo.blogs.com/metis/

Overview
• tOKo for end-users (this presentation)
  • Help intelligent users develop ontologies from documents
  • The approach is to offer useful (possibly smart) functionality that applies to all kinds of documents
  • Demonstration: imagine you are an end-user who has been given the task of developing an ontology for the cooking domain
• tOKo for researchers (second presentation)
  • Accessing tOKo using HTTP
  • Infrastructure
  • Information extraction and ontology-based search

Demonstration

Infrastructure (1)
• Dictionaries (English, Dutch, German)
  • Used for word classes, inflections and spelling
• Document representation (= corpus)
  • Low-level representation, highly indexed, fast access
• Prolog primitives to access the corpus
  • corpus_pattern([word(Word), integer(Int)], Doc, From, To)
  • Searches for a Word immediately followed by an Int. For example, in “room is A306” Word unifies with “A” and Int with “306”; Doc, From and To unify with the document and the position of the match within it.
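To make the behaviour of a pattern search such as corpus_pattern concrete, here is a minimal Python sketch of the same idea: slide a window over a tokenised corpus and report where every token satisfies its test. It is not tOKo's implementation. The names `tokenize`, `is_word` and `is_integer` are hypothetical, and the tokeniser is only assumed to split “A306” into “A” and “306” as the slide's example suggests.

```python
import re
from typing import Callable, Dict, Iterator, List, Tuple

Test = Callable[[str], bool]


def is_word(tok: str) -> bool:
    """Hypothetical stand-in for word(Word): any purely alphabetic token."""
    return tok.isalpha()


def is_integer(tok: str) -> bool:
    """Hypothetical stand-in for integer(Int): any run of digits."""
    return tok.isdigit()


def tokenize(text: str) -> List[str]:
    # Runs of letters, runs of digits, or single punctuation symbols,
    # so that "A306" becomes ["A", "306"] (an assumption about the tokeniser).
    return re.findall(r"[^\W\d_]+|\d+|[^\w\s]", text)


def corpus_pattern(
    corpus: Dict[str, List[str]], pattern: List[Test]
) -> Iterator[Tuple[str, int, int, List[str]]]:
    """Yield (doc_id, start, end, tokens) for each run of consecutive
    tokens in which every token satisfies the corresponding test."""
    width = len(pattern)
    for doc_id, tokens in corpus.items():
        for start in range(len(tokens) - width + 1):
            window = tokens[start:start + width]
            if all(test(tok) for test, tok in zip(pattern, window)):
                yield doc_id, start, start + width, window


corpus = {"doc1": tokenize("room is A306")}
matches = list(corpus_pattern(corpus, [is_word, is_integer]))
print(matches)
```

A real corpus would use an index rather than a linear scan; the scan is kept here only because it makes the matching rule easy to read.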

Infrastructure (2)
• Lots of higher-level primitives (this one is used in the HTTP demo; note that little knowledge of Prolog is required):

    word_frequencies_corpus(WFs,
        [ cases(alpha),
          case(plain),
          documents(all),
          language(Language),
          number_chars(2,infinite),
          lemmatize(delete)
        ]).
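The slide does not spell out what each option means, so the following Python sketch only illustrates the general shape of an option-driven word-frequency count. It assumes that number_chars(2,infinite) keeps words of at least two characters and that the alpha setting restricts counting to alphabetic tokens; the function and parameter names are hypothetical, and language selection and lemmatisation are left out.

```python
import re
from collections import Counter
from typing import Iterable, Optional


def word_frequencies(
    documents: Iterable[str],
    min_chars: int = 2,
    max_chars: Optional[int] = None,  # None plays the role of "infinite"
    alpha_only: bool = True,
    fold_case: bool = True,
) -> Counter:
    """Count tokens across all documents, subject to simple filters."""
    counts: Counter = Counter()
    for text in documents:
        for tok in re.findall(r"\w+", text):
            if alpha_only and not tok.isalpha():
                continue  # skip numbers and mixed tokens
            if len(tok) < min_chars:
                continue  # too short
            if max_chars is not None and len(tok) > max_chars:
                continue  # too long
            counts[tok.lower() if fold_case else tok] += 1
    return counts


docs = ["Add 6 tbsp of sugar", "Sugar and a pinch of salt"]
freqs = word_frequencies(docs)
print(freqs.most_common(3))
```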

Information Extraction
• Phrases that may be concepts or attributes
  • “6 tbsp of sugar” could be part of a recipe
  • “1089 WB” could be an instance of the concept postal code
• Such phrases don’t follow the rules of “language”
• See demonstration for examples
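Phrases like these follow a fixed shape rather than ordinary grammar, which is why simple patterns can pick them up. The Python sketch below shows one way to do so with regular expressions. It is an illustration, not tOKo's extraction method: the unit list is made up, and the postal-code pattern assumes the Dutch format of four digits followed by two capital letters.

```python
import re
from typing import List, Tuple

# Quantity phrases: "<number> <unit> of <ingredient>" (unit list is an assumption).
QUANTITY = re.compile(
    r"\b(\d+)\s+(tbsp|tsp|cups|cup|kg|g|ml)\s+of\s+([a-z]+)\b",
    re.IGNORECASE,
)

# Dutch-style postal codes: four digits, optional space, two capital letters.
POSTAL_CODE = re.compile(r"\b[1-9]\d{3}\s?[A-Z]{2}\b")


def extract_phrases(text: str) -> Tuple[List[Tuple[int, str, str]], List[str]]:
    """Return (quantities, postal_codes) found in the text."""
    quantities = [
        (int(amount), unit.lower(), ingredient.lower())
        for amount, unit, ingredient in QUANTITY.findall(text)
    ]
    postal_codes = POSTAL_CODE.findall(text)
    return quantities, postal_codes


quantities, postal_codes = extract_phrases(
    "Mix 6 tbsp of sugar. Send it to 1089 WB Amsterdam."
)
print(quantities, postal_codes)
```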

Ontology-Based Corpus Searches
• Query the corpus with a combination of ontology constructs and language elements
• Example:
  • [fruit] and [fruit]
  • Matches: “I bought some apples and pears”
  • Because [apple] is-a [fruit] (according to the ontology) and “apples” is the plural of “apple” (according to the dictionary); the same reasoning applies to “pears”
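The example query can be traced step by step with a small Python sketch: each token is reduced to its lemma via a dictionary, and a [concept] element matches when that lemma is the concept or one of its descendants. The ontology, the lemma table and all function names here are toy assumptions for illustration, not tOKo's data or API.

```python
from typing import Dict, List, Optional

# Toy ontology (child -> parent) and lemma dictionary; both are made up.
IS_A: Dict[str, str] = {
    "apple": "fruit",
    "pear": "fruit",
    "carrot": "vegetable",
    "fruit": "food",
    "vegetable": "food",
}
LEMMAS: Dict[str, str] = {"apples": "apple", "pears": "pear", "carrots": "carrot"}


def lemma(word: str) -> str:
    """Map an inflected form to its base form (identity if unknown)."""
    w = word.lower()
    return LEMMAS.get(w, w)


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """True if concept equals ancestor or lies below it in the ontology."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def element_matches(element: str, token: str) -> bool:
    if element.startswith("[") and element.endswith("]"):
        return is_a(lemma(token), element[1:-1])  # ontology construct
    return element == token  # plain literal


def search(query: List[str], tokens: List[str]) -> List[List[str]]:
    """Return every run of tokens that matches the query element by element."""
    n = len(query)
    return [
        tokens[i:i + n]
        for i in range(len(tokens) - n + 1)
        if all(element_matches(q, t) for q, t in zip(query, tokens[i:i + n]))
    ]


sentence = "I bought some apples and pears".split()
hits = search(["[fruit]", "and", "[fruit]"], sentence)
print(hits)
```

Because matching walks up the is-a chain, the broader query `[food] and [food]` finds the same phrase in this toy ontology.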

Ontology-Based Text QL
• Language constructs (provisional):
  • [concept] matches a concept (and its sub-concepts), including inflections, synonyms, etc., in the corpus
  • (word) matches a word (incl. inflections, etc.)
  • <word class> matches all members of the word class
  • @20 matches all (compound) terms that appear at least 20 times in the corpus
  • integer matches any integer
  • literal matches precisely that literal
• Demonstration
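As a rough illustration of how such a query string could be broken into its constructs, the Python sketch below classifies each element by its delimiters. The slide calls the constructs provisional and gives no grammar, so the regular expression, the `Element` type and the kind names are all assumptions made for this example.

```python
import re
from typing import List, NamedTuple, Optional, Union


class Element(NamedTuple):
    kind: str  # concept, word, word_class, min_freq, integer or literal
    value: Optional[Union[str, int]]


# One alternative per construct; anything else is treated as a literal.
ELEMENT = re.compile(
    r"""
      \[(?P<concept>[^\]]+)\]
    | \((?P<word>[^)]+)\)
    | <(?P<word_class>[^>]+)>
    | @(?P<min_freq>\d+)
    | (?P<literal>\S+)
    """,
    re.VERBOSE,
)


def parse_query(query: str) -> List[Element]:
    """Split a query string into typed elements."""
    elements: List[Element] = []
    for m in ELEMENT.finditer(query):
        kind = m.lastgroup
        value: Optional[Union[str, int]] = m.group(kind)
        if kind == "min_freq":
            value = int(value)  # frequency threshold as a number
        elif kind == "literal" and value == "integer":
            kind, value = "integer", None  # keyword that matches any integer
        elements.append(Element(kind, value))
    return elements


parsed = parse_query("[fruit] and (buy) <noun> @20 integer")
for element in parsed:
    print(element)
```

Treating the bare word `integer` as a keyword means a query cannot search for that word as a literal; a fuller grammar would need some way to quote it.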

Status
• Usage
  • Ontology development (both research and contracted)
  • Document indexing (by Jan Jacobs and colleagues at Oce)
  • Finding inconsistencies in documents (has just started)
  • Research on top of tOKo (mostly using weblogs as a source; see my website for papers)
• Caveats
  • The dictionary used (CELEX) is not “public”
  • Creating a corpus from an “arbitrary” set of documents may involve some programming (templates exist for HTML and plain-text document sets)

Plan
• Open Source?
  • Perhaps it is an idea to create an Open Source version
• To do (for Open Source version)
  • Documentation (although lack of documentation has so far not been a problem for end-users)
  • Make infrastructure / external interfaces consistent
  • Some performance issues
• Conclusion
  • Listen to users for good ideas!
