
Semantics and Information Extraction


Presentation Transcript


    1. Semantics and Information Extraction. Douglas E. Appelt, Artificial Intelligence Center, SRI International

    2. 7/11/03 Semantics and Info. Extraction What is Semantics? Theory of the relationship between formal aspects of language and objects and facts in the world.

    3. Traditional Approach in NLP (and linguistics). Define a well-behaved logical language (intensional logic, dynamic predicate logic, Discourse Representation Structures). Define a semantics for the logical language (using model theory). Devise rules for translating natural language structures into the logical language that preserve truth conditions. Apply principles of compositionality to build larger structures from smaller ones.

    4. Successes and Failures. Successes: database query applications (e.g. ATIS systems); dialog systems with a narrow domain of application (e.g. TRAINS). Failures: extracting information from large corpora, where real syntax is too complex and coverage is too weak.

    5. Semantics and Information Extraction. General requirements of a semantic theory for information extraction. ACE as a specific approach to semantics for information extraction. Examine specific issues: basic ontology, coreference, generic/specific, metonymy, relations and events.

    6. Information Extraction: A Pragmatic Approach. Let application requirements drive semantic analysis. Identify the types of entities that are relevant to a particular task. Identify the range of facts that one is interested in for those entities. Ignore everything else.

    7. MUC and Scenario Templates. Define a set of interesting entities: persons, organizations, locations. Define a complex scenario involving interesting events and relations over entities. Example: management succession (persons, companies, positions, reasons for succession). This collection of entities and relations is called a scenario template.

    8. Problems with Scenario Templates. They encouraged development of highly domain-specific ontologies, rule systems, heuristics, etc. Most of the effort expended on building a scenario template system was not directly applicable to a different scenario template.

    9. Addressing the Problem. Address a large number of smaller, more focused scenario templates (Event-99). Develop a more systematic ground-up approach to semantics by focusing on elementary entities, relations, and events (ACE).

    10. The ACE Program (Automated Content Extraction). Develop core information extraction technology by focusing on extracting specific semantic entities and relations over a very wide range of texts. Corpora: newswire and broadcast transcripts, but with a broad range of topics and genres (third-person reports, interviews, editorials). Topics: foreign relations, significant events, human interest, sports, weather. Discourage highly domain- and genre-dependent solutions.

    11. Components of a Semantic Model. Entities: individuals in the world that are mentioned in a text. Simple entities are singular objects; collective entities are sets of objects of the same type where the set is explicitly mentioned in the text. Attributes: timeless unary properties of entities (e.g. Name). Temporal points and intervals. Relations: properties that hold of two entities over a time interval. Events: a particular kind of relation among entities implying a change in relation state at the end of the time interval.
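
The components listed on this slide could be held in simple record types. The following is a minimal sketch only: the class names, field names, and type labels are illustrative assumptions and do not come from the ACE annotation standards.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple

# Hypothetical record types; all names here are illustrative assumptions.

@dataclass(frozen=True)
class Entity:
    entity_id: str
    entity_type: str                 # e.g. "PERSON", "GPE"
    collective: bool = False         # True for an explicitly mentioned set
    attributes: Tuple[Tuple[str, str], ...] = ()  # timeless unary properties

@dataclass(frozen=True)
class Interval:
    start: Optional[str] = None      # None means "not specified"
    end: Optional[str] = None

@dataclass(frozen=True)
class Relation:
    relation_type: str
    arg1: Entity
    arg2: Entity
    interval: Interval = field(default_factory=Interval)

@dataclass(frozen=True)
class Event(Relation):
    # An event is a relation that implies a change of relation state
    # at the end of its time interval.
    resulting_state: Optional[str] = None

blair = Entity("e1", "PERSON", attributes=(("name", "Tony Blair"),))
britain = Entity("e2", "GPE", attributes=(("name", "Britain"),))
role = Relation("ROLE", blair, britain)          # interval left unspecified
visit = Event("Movement", blair, britain, resulting_state="AT")
```

Modelling `Event` as a subclass of `Relation` mirrors the slide's wording that an event is "a particular kind of relation"; other designs are equally plausible.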

    12. Semantic Analysis: Relating Language to the Model. Linguistic mention: a particular linguistic phrase that denotes a particular entity, relation, or event. It is either a noun phrase, name, or possessive pronoun, or a verb, nominalization, compound nominal, or other linguistic construct relating other linguistic mentions. Linguistic entity: an equivalence class of mentions with the same meaning, i.e. coreferring noun phrases, or relations and events derived from different mentions but conveying the same meaning.

    13. Language and World Model

    14. NLP Tasks in an Extraction System

    15. The Basic Semantic Tasks of an IE System. Recognition of linguistic entities. Classification of linguistic entities into semantic types. Identification of coreference equivalence classes of linguistic entities. Identifying the actual individuals that are mentioned in an article: associating linguistic entities with predefined individuals (e.g. a database or knowledge base), and forming equivalence classes of linguistic entities from different documents.
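
Grouping mentions into coreference equivalence classes is commonly implemented with a union-find structure. The sketch below assumes that some upstream component has already decided which pairs of mentions corefer; the class name `CorefClasses` and the mention strings are hypothetical.

```python
from collections import defaultdict

class CorefClasses:
    """Union-find over mention identifiers (here, plain strings)."""

    def __init__(self):
        self.parent = {}

    def find(self, mention):
        # Register unseen mentions as singleton classes.
        self.parent.setdefault(mention, mention)
        while self.parent[mention] != mention:
            # Path halving keeps the trees shallow.
            self.parent[mention] = self.parent[self.parent[mention]]
            mention = self.parent[mention]
        return mention

    def link(self, a, b):
        # Merge the classes containing mentions a and b.
        self.parent[self.find(a)] = self.find(b)

    def classes(self):
        groups = defaultdict(set)
        for mention in list(self.parent):
            groups[self.find(mention)].add(mention)
        return list(groups.values())

coref = CorefClasses()
coref.link("Tony Blair", "the prime minister")
coref.link("he", "Tony Blair")
coref.find("Iraq")  # a mention with no coreferring partner
```

After these calls, `coref.classes()` holds one class of three mentions and one singleton class. The same structure would serve for cross-document equivalence classes if mention identifiers include a document id.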

    16. Choosing an Ontology for IE Semantics. Ordinary native speakers should be able to annotate text with minimal training. People should have well-developed intuitions about type classification: is a museum an organization or a facility? (A FOG?) People should have well-developed intuitions about entity coreference: consider "peace in the Middle East". Entities should be extensional, not abstract, generic, counterfactual, or fictional.

    17. The ACE Ontology and Annotation Standards. Documents available online at http://www.ldc.upenn.edu/Projects/ACE/ : entity standards and relations standards. Proposed event standards are still under development.

    18. The ACE Ontology. Persons: a natural kind, and hence self-evident. Organizations: should have some persistent existence that transcends a mere set of individuals. Locations: geographic places with no associated governments. Facilities: objects from the domain of civil engineering. Geopolitical Entities (GPEs): geographic places with associated governments.

    19. Why GPEs? An ontological problem: certain entities have attributes of physical objects in some contexts, organizations in some contexts, and collections of people in others. Sometimes it is difficult or impossible to determine which aspect is intended. It appears that in some contexts, the same phrase plays different roles in different clauses.

    20. Aspects of GPEs. Physical: "San Francisco has a mild climate." Organization: "The United States is seeking a solution to the North Korean problem." Population: "France makes a lot of good wine."

    21. Metonymy. Metonymy is when a speaker uses a mention to refer in a systematic way to an entity with a different name or type than that mentioned. Metonymy is a property of mentions. A literal mention is one that uses the name or type of the referential entity; a metonymic mention violates that in some way. A single entity can have both literal and metonymic mentions.

    22. Examples. Name metonymy: "Beijing announced a new policy toward North Korea." "Baltimore hit a home run in the ninth inning." "SRI was severely damaged in the 1989 earthquake." Type metonymy: "John works for the restaurant on the corner."

    23. Problem Cases: literal and metonymic mentions both not types of interest

    24. Role Ambiguity. Why isn't it just metonymy? "Iraq attacked Kuwait." Was the attack on the physical territory? Was the attack on the government? Was the attack on the people of Kuwait? The answer is yes.

    25. Multiple Roles. "Iraq disputed its border with Kuwait." Governments dispute things. Physical real estate has borders.

    26. Role Classification and the Sparse Data Problem. Role determination through predicate-argument constraints, e.g. "China announced a new policy regarding North Korea." ACE corpus: about 20K words in the training corpus. GPE-PER: 84 configurations; GPE-LOC: 432 configurations; GPE-ORG: 504 configurations; GPE-GPE: 789 configurations. Only 131 configurations have more than 2 instances in the corpus (about 7%). Many of those involve weakly constrained predicates (have, be, of, etc.).
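
The "about 7%" figure can be checked from the counts on the slide, assuming (the slide does not say so explicitly) that the percentage is taken over the sum of the four listed configuration counts.

```python
# Configuration counts as given on the slide.
configurations = {
    "GPE-PER": 84,
    "GPE-LOC": 432,
    "GPE-ORG": 504,
    "GPE-GPE": 789,
}

total = sum(configurations.values())   # 84 + 432 + 504 + 789 = 1809
frequent = 131                         # configurations with more than 2 instances
share = frequent / total               # roughly 0.072, i.e. about 7%

print(f"{frequent} of {total} configurations = {share:.1%}")
```

Under that assumption, more than nine in ten predicate-argument configurations occur at most twice, which is the sparse data problem the slide names.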

    27. Generic vs. Specific. The assumed application is building a database using extracted information. Databases typically represent concrete entities. Specificity is a critical attribute of linguistic entities. Specificity is a property of the entity, not the mention: "John is looking for a Java programmer. He must have three years of experience." Problem: assessment of specificity is a nuanced distinction subject to substantial inter-annotator disagreement.

    28. Types of Linguistic Mentions. Name mentions: the mention uses a proper name to refer to the entity. Nominal mentions: the mention is a noun phrase whose head is a common noun. Pronominal mentions: the mention is a headless noun phrase, a noun phrase whose head is a pronoun, or a possessive pronoun.
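
The three mention types can be approximated from the part-of-speech tag of a mention's head word. This is a rough sketch under stated assumptions: it presumes Penn Treebank tags and that a parser has already identified the head (passing `None` for a headless noun phrase). The function name and the returned labels are hypothetical.

```python
from typing import Optional

PROPER_NOUN_TAGS = {"NNP", "NNPS"}
COMMON_NOUN_TAGS = {"NN", "NNS"}
PRONOUN_TAGS = {"PRP", "PRP$"}   # personal and possessive pronouns

def mention_type(head_pos: Optional[str]) -> str:
    """Classify a mention by the POS tag of its head (None = headless NP)."""
    if head_pos is None or head_pos in PRONOUN_TAGS:
        return "PRONOMINAL"
    if head_pos in PROPER_NOUN_TAGS:
        return "NAME"
    if head_pos in COMMON_NOUN_TAGS:
        return "NOMINAL"
    return "UNKNOWN"

print(mention_type("NNP"))   # head of "Tony Blair"
print(mention_type("NN"))    # head of "the prime minister"
print(mention_type("PRP$"))  # "his"
```

A real system would need more than the head tag, for example to handle names that contain common nouns.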

    29. Entity and Mention Example

    30. Relations. Relations hold between two entities over a time interval. Relations may be timeless, or the temporal interval may be unspecified. Relations have inertia, i.e. they don't change unless a relevant event happens.
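
Inertia can be modelled by storing only the events that change a relation and reading the most recent change at or before the time of interest. The sketch below is one possible encoding, not the talk's own; the class `RelationTimeline`, the tuple used as a relation key, and the use of integer years as time points are all assumptions.

```python
import bisect

class RelationTimeline:
    """Stores state changes; a relation keeps its state until the next event."""

    def __init__(self):
        self._changes = {}  # relation key -> sorted list of (time, holds)

    def apply_event(self, time, relation, holds):
        # Record that an event at `time` made `relation` start or stop holding.
        bisect.insort(self._changes.setdefault(relation, []), (time, holds))

    def holds_at(self, relation, time):
        changes = self._changes.get(relation, [])
        # Index just past the last change with change_time <= time.
        idx = bisect.bisect_right(changes, (time, True))
        return changes[idx - 1][1] if idx else False

timeline = RelationTimeline()
presidency = ("ROLE", "Clinton", "United States")
timeline.apply_event(1993, presidency, True)    # takes office
timeline.apply_event(2001, presidency, False)   # leaves office
```

With these two events recorded, `timeline.holds_at(presidency, 1997)` is `True` even though nothing was asserted about 1997 itself, while queries for 1990 or 2001 return `False`.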

    31. Explicit and Implicit Relations. Many relations are true in the world. Reasonable knowledge bases used by extraction systems will include many of these relations. Semantic analysis requires focusing on certain ones that are directly motivated by the text. Example: Baltimore is in Maryland, which is in the United States ("Baltimore, MD"). A text mentions Baltimore and the United States. Is there a relation between Baltimore and the United States?
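
The Baltimore example shows how a knowledge base can derive relations that a text never states. A minimal sketch, assuming containment is stored as a hypothetical child-to-parent dictionary:

```python
def located_in(place, container, part_of):
    """Return True if `place` lies transitively inside `container`."""
    seen = set()
    current = place
    # Walk up the containment chain; `seen` guards against cycles.
    while current in part_of and current not in seen:
        seen.add(current)
        current = part_of[current]
        if current == container:
            return True
    return False

# Hypothetical knowledge-base fragment: child -> immediate parent.
part_of = {"Baltimore": "Maryland", "Maryland": "United States"}

print(located_in("Baltimore", "United States", part_of))
print(located_in("United States", "Baltimore", part_of))
```

The knowledge base entails that Baltimore is in the United States, but that alone does not settle the slide's question, which is whether the text itself motivates reporting the relation.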

    32. Another Example. "Prime Minister Tony Blair attempted to convince the British Parliament of the necessity of intervening in Iraq." Is there a role relation specifying Tony Blair as prime minister of Britain? A test: a relation is implicit in the text if the text provides convincing evidence that the relation actually holds.

    33. Explicit Relations. Explicit relations are expressed by certain surface linguistic forms. Copular predication: "Clinton was the president." Prepositional phrase: "the CEO of Microsoft". Prenominal modification: "the American envoy". Possessive: "Microsoft's chief scientist". SVO relations: "Clinton arrived in Tel Aviv." Nominalizations: "Annan's visit to Baghdad". Apposition: "Tony Blair, Britain's prime minister".

    34. Types of ACE Relations. ROLE: relates a person to an organization or a geopolitical entity (subtypes: member, owner, affiliate, client, citizen). PART: generalized containment (subtypes: subsidiary, physical part-of, set membership). AT: permanent and transient locations (subtypes: located, based-in, residence). SOC: social relations among persons (subtypes: parent, sibling, spouse, grandparent, associate).
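
The relation inventory on this slide is small enough to hold in a lookup table, which an annotation tool or extraction system could use to validate labels. The type and subtype names below are copied from the slide; the dictionary layout and the helper `relation_type_of` are illustrative assumptions.

```python
from typing import Optional

# Relation types and subtypes as listed on the slide.
ACE_RELATIONS = {
    "ROLE": {"member", "owner", "affiliate", "client", "citizen"},
    "PART": {"subsidiary", "physical part-of", "set membership"},
    "AT": {"located", "based-in", "residence"},
    "SOC": {"parent", "sibling", "spouse", "grandparent", "associate"},
}

def relation_type_of(subtype: str) -> Optional[str]:
    """Return the relation type that owns `subtype`, or None if unknown."""
    for relation_type, subtypes in ACE_RELATIONS.items():
        if subtype in subtypes:
            return relation_type
    return None

print(relation_type_of("citizen"))
print(relation_type_of("based-in"))
```

Unknown subtypes return `None` rather than raising, so a caller can decide how to treat labels outside the inventory.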

    35. Event Types (preliminary). Movement: travel, visit, move, arrive, depart. Transfer: give, take, steal, buy, sell. Creation/Discovery: birth, make, discover, learn, invent. Destruction: die, destroy, wound, kill, damage.

    36. Problem: Collective and Distributive Reference

    37. Solution: Relations

    38. Summary. Motivation for a semantic theory is a practical one, driven by database-filling needs. Pick a limited ontology of core concepts and build out, motivated by application needs. Address a broad spectrum of semantic problems, but from a limited ontology that simplifies data annotation issues.
