The Ontological Semantic Perspective on the Semantic Web • Victor Raskin, Purdue University & hakia.com • Christian F. Hempelmann, hakia.com
Introduction • The Semantic Web is a good and obvious idea. • Its successful implementation, however, depends on two major components, • the adequacy of the formalism used to represent the content, and, most importantly, • the methods of rendering texts into that formalism. • The paper will focus on the general and current issues with both components.
Goal of Presentation • The Semantic Web, as conceived and especially as practiced, has not worked and cannot work • However, the purpose is not to knock it further: it is collapsing on its own • Good work done under its guise (cf. SDI) • In order to work, it should co-opt OntoSem • And then nobody will need the Semantic Web (stone soup)
Introduction • Bar-Hillel: mathematical logicians favor the manipulation of the logical format to define rules of inference and similar issues over the adequacy of the format. • Semantic Web (and now, apparently, Google) relies on manual tagging of web pages with OWL or something like it by individual website owners: a Mao-like dream and a fatal error
Structure of Presentation 1 • Semantic Web as a Manifesto (cf. The Communist Manifesto; on second thoughts, don’t!) • Nature • Principles • Reasons • Formalism: • Form and Content (very, very hard) • Linking is good--linking what? • Ontology, tags, and other luxuries
Structure of Presentation 2 • Translation of Text into OWL, RDF, etc. • Tag away! • Know thy meaning • Oh, we don’t know how to do it ourselves but it is really simple, and no, no professional skills required • Why not NLP • Sorry! Not NLP, MP • Fear of semantics • OntoSem is Semantic Web • Semantic Web is the stone of stone soup
What is the Semantic Web? • Making the content of the Web searchable, at least partially, on the basis of its semantic content, not simply on the basis of matching strings and metasyntactic tags. • Great vision! • So were alchemy and astrology--the devil is in the details!
Principles of the Semantic Web? • Generality: Berners-Lee (1998a) describes this as follows: “When looking at a possible formulation of a universal Web of semantic assertions, the principle of minimalist design requires that it be based on a common model of great generality. Only when the common model is general can any prospective application be mapped onto the model”.
Principles of the Semantic Web? • Simplicity and low cost—according to Hendler (2001), “[a] crucial aspect of creating the semantic web is to make it possible for a number of different users to create machine-readable content without being logic experts. In fact, ideally, most of the users shouldn’t even need to know that web semantics exists. Lowering the cost of mark-up isn’t enough—for many users it needs to be free. That is, semantic mark-up should be a by-product of normal computer use. Much like current web content, a small number of tool creators and web ontology designers will have to know the details, but most users will not even know ontologies exist.” (Hold this thought!)
Reasons for the Semantic Web? • “traditional” artificial intelligence has not led to the development of realistic-scale practical applications; • the knowledge representation area, while generating useful ideas, has failed to translate them into a coherent, large-scale action; • prior work on world modeling and reconciliation among different formal models can be useful but still does not measure up to the standards of the emerging Semantic Web; • first-order predicate calculus (FOPC) and higher-order logic, the traditional reasoning techniques, have been duly criticized as rigid and for expressing things that are at once undecidable and not effectively computable.
Reasons for the Semantic Web? • Actually, NLP has failed also—because • it has been dominated by meaning-avoidance, street-lamp-based techniques • It has not attracted, prepared or encouraged qualified computational linguist participation • Computer scientists, engineers, and statisticians ignorantly confuse knowing a language with knowing about language--and are proud of it • But Sir Tim is blissfully unaware of that because it is not on his mental map • What you don’t know can help you (at least in attaining knighthood)
Formalism over Content • concept-x|concept-x(concept-y) • Happy? • Neat formalism, but the content is hidden: very important—for most semantic web developers this is content • http://youtube.com/watch?v=6gmP4nk0EOE nothing here is about meaning • Semantic Web is not! (Sir Tim apologized for the misnomer: Data Web)
Linking is Good! • Linking what? • Tagged character strings • Yes, it’s marginally better than linking character strings but not good enough: we still do not know what the labeled content is. It is still just character strings, and the most we can know is that some substrings recur. How is that different from keywords? • This is shallow semantics, aka no semantics • Semantic web is about “more labels than XML” • More non-semantics does not equal semantics!
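To make the point about tagged character strings concrete, here is a minimal sketch. The page names and tags are invented for illustration and do not come from the talk. Linking by tag reduces to string equality, so two pages tagged "bank" are linked whether they mean a river bank or a lending institution.

```python
# Hypothetical pages with free-form tags (illustrative data only).
PAGES = {
    "river-guide": {"bank", "fishing"},
    "loan-rates": {"bank", "mortgage"},
    "trout-recipes": {"fishing", "cooking"},
}


def linked_by_tag(pages, tag):
    """Return the sorted names of pages whose tag set contains the exact string `tag`."""
    return sorted(name for name, tags in pages.items() if tag in tags)


# The match is pure string identity: nothing records which sense of "bank" is meant.
print(linked_by_tag(PAGES, "bank"))
```

Nothing in the data distinguishes the two senses, which is the sense in which a tag behaves like a keyword.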
Viva OWL! • Ontologies will save the day! (by carping diem?) • OWL contains ontologies (oops!) • OWL tells us how to build ontologies (oops!) • Okay, okay: OWL tells us how to formalize ontologies after we build them • “If I had ham I would make ham and eggs—if I had eggs, that is.” • So who teaches us how to build ontologies? • Oh, Sir Tim, of course! Listen to this:
Tim Almighty • There are mixed feelings about the passion for tagging which typifies the Web 2.0 wave. On the one hand, there is excitement about the fact that users are, as a large number, adding re-usable information to the information space, allowing sites such as del.icio.us and flickr to sort, cluster and query masses of otherwise amorphous photos and web content. On the other hand, there is the sinking feeling that tags are headed the same way as keywords of Information Retrieval in the 1980s: initial hope, and then being stranded between the unbearable constraints of a controlled vocabulary and the hopeless ambiguity of uncontrolled user-generated keywords. Tom Gruber, writer of books on ontology who runs a Web 2.0 site himself, gave a talk at ISWC 2006 which touched on bridging the gap, and taking the passion to organize and express, and using it to make re-usable data.
Tim Almighty • There is currently a tension in the tagging world as to whether tags are regarded as global in meaning, or whether their meaning really depends on the tagger. In del.icio.us, one can query for things tagged with a certain word by a certain person. (I heard of one online community which was considering making a system to allow one formally to state when one has committed to use a given tag in the same way as another person, or growing mesh of people. That would be a very interesting feature, as it would allow a useful definition to gain growing acceptance, to progressively move from being a private idea to being a group global standard.)
Tim Almighty • Meanwhile, other sites get users to provide semantic web data with well-defined global ontologies. The locations of people, events and photos, relationships between people, authorship of publications, things and people an image depicts, and so on, is done using well-defined identifiers (under the covers) for everything involved, including the relationships and properties. The resulting data is extremely re-usable. The problem is that it isn't as quick as tagging with a single word off the top of one's head.
Ontology Building • So, now we know how to build ontologies? • Well… In fact, there are different evolved areas of ontology research: • Re-enlightened metaphysics = philosophical ontologies • Formal ontologies • Engineering ontologies • Computational ontologies • Controlled-vocabulary type ontologies • Additionally, there are: • Rules, formal and of thumb, for building ontologies • Methodologies • Acquisition toolboxes • Uniformity and continuity concerns (cf. CYC, R.I.P.) • Some serious ontologists, incidentally, easily jumped on the Semantic Web bandwagon to get funding and got off when the funding became scarce • Smart guys, just like their parents in SDI!
Trying to Build Ontologies from OWL Sites? • Turns out to be very simple: • Identify objects: that would be nouns • Identify attributes: that would be adjectives • Identify processes: that would be verbs? (not many go there) • Turns out to be very wrong as well: syntax and morphology do not correspond much to meaning, as the non-semantic NLP of the 1980s discovered to its peril • John is easy to please vs. John is eager to please
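The easy/eager pair can be spelled out as a small sketch. The part-of-speech sequences and the role assignments below are hand-annotated assumptions made for illustration; they are not the output of any parser or of OntoSem. The two sentences share one surface pattern but assign John opposite roles in the PLEASE event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    sentence: str
    pos: tuple      # surface part-of-speech sequence (hand-assigned)
    pleaser: str    # who does the pleasing
    pleased: str    # who gets pleased


SURFACE = ("NOUN", "VERB", "ADJ", "PARTICLE", "VERB")

EASY = Analysis("John is easy to please", SURFACE,
                pleaser="unspecified", pleased="John")
EAGER = Analysis("John is eager to please", SURFACE,
                 pleaser="John", pleased="unspecified")


def same_surface(a, b):
    """True when the two sentences have the same part-of-speech sequence."""
    return a.pos == b.pos


def same_roles(a, b):
    """True when the two sentences assign the same participants to the same roles."""
    return (a.pleaser, a.pleased) == (b.pleaser, b.pleased)


print(same_surface(EASY, EAGER), same_roles(EASY, EAGER))
```

A procedure that reads objects off nouns and attributes off adjectives sees only `pos`, so it cannot tell these two analyses apart.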
Goals of Next Section • Present functional nouns as evidence of syntactic-semantic discrepancy • Introduce Ontological Semantics as a comprehensive machine-tractable representation of near-human understanding
Nouns in OntoSem • Total number of noun senses: 74,286 • Nouns as objects (80.51%) • Nouns as events (18.29%) • Nouns as properties (1.20%)
For Graph Lovers • Distribution of noun senses
Noun as OBJECT • bean (sem-struc(BEAN)) • BEAN is-a LEGUME is-a VEGETABLE-FOODSTUFF is-a PLANT-FOODSTUFF is-a FOODSTUFF is-a FOOD is-a INGESTIBLE is-a INANIMATE is-a PHYSICAL-OBJECT is-a OBJECT • 59,806 noun senses
Noun as PROPERTY • viscosity (sem-struc(VISCOSITY)) • VISCOSITY is-a PHYSICAL-PROPERTY is-a PHYSICAL-OBJECT-ATTRIBUTE is-a LITERAL-OBJECT-ATTRIBUTE is-a LITERAL-ATTRIBUTE is-a ATTRIBUTE is-a PROPERTY • 893 noun senses
Noun as EVENT • tempest (sem-struc(THUNDERSTORM(intensity(value >0.7)))) • THUNDERSTORM is-a STORM is-a NATURAL-HAZARD is-a DISASTER-EVENT is-a PHYSICAL-EVENT is-a EVENT • 13,587 noun senses
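The three is-a chains and the sense counts on these slides can be checked with a short sketch. The dictionary encoding and the function names are assumptions made for illustration, not OntoSem's actual storage format; the concepts and counts are copied from the slides.

```python
# Child -> parent links, copied from the three is-a chains on the slides.
IS_A = {
    "BEAN": "LEGUME", "LEGUME": "VEGETABLE-FOODSTUFF",
    "VEGETABLE-FOODSTUFF": "PLANT-FOODSTUFF", "PLANT-FOODSTUFF": "FOODSTUFF",
    "FOODSTUFF": "FOOD", "FOOD": "INGESTIBLE", "INGESTIBLE": "INANIMATE",
    "INANIMATE": "PHYSICAL-OBJECT", "PHYSICAL-OBJECT": "OBJECT",
    "VISCOSITY": "PHYSICAL-PROPERTY",
    "PHYSICAL-PROPERTY": "PHYSICAL-OBJECT-ATTRIBUTE",
    "PHYSICAL-OBJECT-ATTRIBUTE": "LITERAL-OBJECT-ATTRIBUTE",
    "LITERAL-OBJECT-ATTRIBUTE": "LITERAL-ATTRIBUTE",
    "LITERAL-ATTRIBUTE": "ATTRIBUTE", "ATTRIBUTE": "PROPERTY",
    "THUNDERSTORM": "STORM", "STORM": "NATURAL-HAZARD",
    "NATURAL-HAZARD": "DISASTER-EVENT", "DISASTER-EVENT": "PHYSICAL-EVENT",
    "PHYSICAL-EVENT": "EVENT",
}

# Word -> concept heading its sem-struc (the intensity constraint on "tempest" is omitted here).
LEXICON = {"bean": "BEAN", "viscosity": "VISCOSITY", "tempest": "THUNDERSTORM"}

# Noun-sense counts per top-level branch, as given on the slides.
NOUN_SENSES = {"OBJECT": 59806, "EVENT": 13587, "PROPERTY": 893}


def ancestors(concept):
    """Follow is-a links upward and return the chain of ancestors, nearest first."""
    chain = []
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def top_level(word):
    """Return the top-level branch (OBJECT, EVENT or PROPERTY) for a noun in LEXICON."""
    chain = ancestors(LEXICON[word])
    return chain[-1]


def shares(counts):
    """Return each branch's share of all noun senses, in percent, rounded to two places."""
    total = sum(counts.values())
    return {branch: round(100 * n / total, 2) for branch, n in counts.items()}


print({word: top_level(word) for word in LEXICON})
print(shares(NOUN_SENSES))
```

All three words are nouns, yet they land in three different top-level branches; the computed shares agree with the percentages reported for the 74,286 noun senses.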
... nouns are complex EVENTS ... • BANKRUPTCY is-a financial-event agent owe.agent owe.beneficiary precondition approach-bankruptcy has-event-as-part (IF modality.pay.value = 0 THEN bankrupt-chapter-7 ELSE bankrupt-chapter-11) • APPROACH-BANKRUPTCY is-a financial-event agent corporation-a has-event-as-part (IF AND owe agent corporation-a beneficiary human-a employed-by corporation-a lending-institution-a corporation-b theme money pay agent corporation-a beneficiary human-a lending-institution-a corporation-b theme money THEN bankruptcy agent corporation-a beneficiary human-a lending-institution-a corporation-b)
User does the Work: Aim • complex task • cheap (free) labor • enthusiastic? • untrained • lightly supervised • coerced
User does the Tagging: Case Studies • CYC • originally largely unsupervised and unsalvageable output • waning interest [when curiosity fails, why do it?] • digg.com • large number of unsophisticated users vs. noteworthy, relevant web content
User does the Work: Case Studies • Mao • peasants vs. blast furnaces • Volkssturm • modern warfare vs. untrained mass armies • Linguists asking the “native speaker” about meaning of their language • knowing a language vs. knowing about language
Users cannot Identify Meaning • What is the meaning of “John and Mary are husband and wife”? • They met • They liked each other • They dated • They got engaged • They got married • They live together
Users cannot Identify Meaning • What is the meaning of “John and Mary are husband and wife”? • They have sex • They live together • They may have children • They have joint accounts • They socialize together • All of the above
Users will not Tag Reliably, Easily, Uniformly, or Happily • Users • can determine all the material that needs to be tagged; • are familiar with the tag inventory and understand what the tags mean; • can determine the appropriate tag or tags for each element that must be tagged; and • can perform consistently over time and with other taggers. • Not!!!
Users will not Tag Simply
<?xml version="1.0" ?>
<!DOCTYPE rdf:RDF (View Source for full doctype...)>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
    <rdfs:label xml:lang="en-US">Title</rdfs:label>
    <rdfs:comment xml:lang="en-US">A name given to the resource.</rdfs:comment>
    <dc:description xml:lang="en-US">Typically, a Title will be a name by which the resource is formally known.</dc:description>
    <rdfs:isDefinedBy rdf:resource="http://purl.org/dc/elements/1.1/" />
    <dcterms:issued>1999-07-02</dcterms:issued>
  </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/contributor"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/creator"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/publisher"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/subject"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/description"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/date"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/type"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/format"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/identifier"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/language"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/relation"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/source"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/coverage"> </rdf:Property>
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/rights"> </rdf:Property>
</rdf:RDF>
User Will Not Tag Simply • The prospective tagger must: • first find this page • locate the tag in question • understand the semantics of the fillers of the “comment” and “description” properties • and then learn to assign the tag “title” to the appropriate elements of any web page that he or she is writing. • You must be kidding!
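Even the mechanical part of the second step, locating the tag in question, takes a short program. The sketch below uses Python's standard `xml.etree.ElementTree` on a trimmed copy of the Dublin Core fragment shown on the previous slide. The function name `describe` and the trimming are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the Dublin Core schema fragment from the slide (title property only).
RDF_FRAGMENT = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Property rdf:about="http://purl.org/dc/elements/1.1/title">
    <rdfs:label xml:lang="en-US">Title</rdfs:label>
    <rdfs:comment xml:lang="en-US">A name given to the resource.</rdfs:comment>
    <dc:description xml:lang="en-US">Typically, a Title will be a name by which the resource is formally known.</dc:description>
  </rdf:Property>
</rdf:RDF>"""

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF_NS, "rdfs": "http://www.w3.org/2000/01/rdf-schema#"}


def describe(xml_text, about):
    """Return the label and comment of the rdf:Property whose rdf:about equals `about`, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("rdf:Property", NS):
        if prop.get("{%s}about" % RDF_NS) == about:
            return {
                "label": prop.findtext("rdfs:label", namespaces=NS),
                "comment": prop.findtext("rdfs:comment", namespaces=NS),
            }
    return None


print(describe(RDF_FRAGMENT, "http://purl.org/dc/elements/1.1/title"))
```

The program retrieves the strings "Title" and "A name given to the resource." but does nothing toward the remaining steps: understanding those fillers and deciding which elements of a page deserve the tag.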
Users won’t do It • Want simplicity, generality, uniformity, low cost, and ease? • Sure, automate! • Go where you can find it, not where the street light is and where you can keep using your favorite methods: playing with yourself… oops, sorry, with formalisms • Go to a meaning processing system, like OntoSem
OntoSem Resources • a 6,724-concept ontology, • a 47,025-entry English lexicon with 77,156 senses, • a 19,352-entry onomasticon with a total of 24,328 senses, • a text meaning representation (TMR) language, • an ontological parser transforming text into TMRs, and • a fact repository, containing the growing number of implemented TMRs.
OntoSem Resources (for Graph Lovers) • [Diagram labels: text or data; OntoSem resources (ontology, lexicon); OntoParser; full TMR]
Ontology Top Level • ALL (= empty root concept) • Objects • Events • Properties
Ontology Event Branch • ALL: Objects, Events, Properties • Events: Mental events, Social events, Physical events
Ontology Property Branch, Case Role Subbranch • Properties • Case roles: Agent, Theme, Beneficiary, Instrument, Purpose, Location, Source, Destination, Path
Concept and Lexicon Entry • Ontological Concept: go • is-a motion-event • agent animal • instrument body-part, vehicle • source location • destination location • start-time temporal-unit • end-time temporal-unit • Lexical Entry: drive-V1 [all but semantic information omitted] • sem-struc go • agent human & adult • instrument car
OntoSem is Event-Biased • Mary drove from Boston to New York on Wednesday • GO • agent Mary • instrument car • source Boston • destination New York • start-time Wednesday • end-time Wednesday
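The way the concept, the lexical entry and the sentence combine into this TMR can be sketched as follows. This is a toy illustration under stated assumptions, not the OntoSem parser: the slot list is taken from the GO concept on the previous slide, the split of the lexical entry into a `defaults` field is a simplification invented here, and the fillers from the sentence are supplied by hand rather than parsed.

```python
# Slots of the GO concept, in the order shown on the slide.
GO_SLOTS = ("agent", "instrument", "source", "destination", "start-time", "end-time")

# drive-V1 maps to GO and contributes the instrument even when no car is mentioned.
# The "defaults" field is an assumed simplification of the lexical entry.
DRIVE_V1 = {"sem-struc": "go", "defaults": {"instrument": "car"}}


def build_tmr(lexical_entry, text_fillers):
    """Instantiate the entry's concept: fillers from the text first, lexical defaults otherwise."""
    unknown = set(text_fillers) - set(GO_SLOTS)
    if unknown:
        raise ValueError("not a slot of GO: %s" % sorted(unknown))
    tmr = {"concept": lexical_entry["sem-struc"].upper()}
    for slot in GO_SLOTS:
        if slot in text_fillers:
            tmr[slot] = text_fillers[slot]
        elif slot in lexical_entry["defaults"]:
            tmr[slot] = lexical_entry["defaults"][slot]
    return tmr


# Hand-supplied fillers for "Mary drove from Boston to New York on Wednesday".
FILLERS = {
    "agent": "Mary",
    "source": "Boston",
    "destination": "New York",
    "start-time": "Wednesday",
    "end-time": "Wednesday",
}

print(build_tmr(DRIVE_V1, FILLERS))
```

The resulting structure contains `instrument: car` although the sentence never mentions a car; that filler comes from the lexical entry for drive, which is the kind of information a string tag cannot supply.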
Why That was not in Sir Tim’s Vision? • Ignorance of linguistics, for which the linguists are also responsible • Fear of semantics, for which the linguists are also responsible • Bad history of NLP • Objective difficulties of studying meaning (= mind) • So let us do the easy and pleasant stuff! • Forget this talk even happened and carry on with your fun and games • Thanks, and apologies!