Towards a semantic extraction of named entities



Presentation Transcript


  1. Towards a semantic extraction of named entities Diana Maynard, Kalina Bontcheva, Hamish Cunningham University of Sheffield, UK

  2. Introduction • Challenges posed by progression from traditional IE to a more semantic representation of NEs • What techniques are best for the deeper level of analysis necessary? • Can traditional rule-based methods cope with such a transition, or does the future lie solely with machine learning?

  3. The ACE program “A program to develop technology to extract and characterise meaning from human language” Aims: • produce structured information about entities, events and the relations that hold between them • promote design of more generic systems rather than those tuned to a very specific domain and text type (as with MUC)

  4. The ACE tasks • Identification of entities and classification into semantic types (Person, Organisation, Location, GPE, Facility) • Identification and coreference of all mentions of each entity in the text (name, pronominal, nominal) • Identification of relations holding between such entities

  5. <entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
       <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"> </entity_mention>
       <entity_mention ID="M004" TYPE="NAME" string="NATS"> </entity_mention>
       <entity_mention ID="M005" TYPE="PRO" string="its"> </entity_mention>
       <entity_mention ID="M006" TYPE="NAME" string="Nats"> </entity_mention>
     </entity>
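Slide 5 shows one entity that carries four coreferent mentions. A short Python sketch can make this nesting concrete: it reads the annotation with the standard library's `xml.etree.ElementTree` and groups the mention strings by mention type. It assumes the simplified markup shown on the slide (copied with attribute spacing tidied), not the full ACE APF schema. The function name `mentions_by_type` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The entity annotation from slide 5, with attribute spacing tidied.
ACE_ENTITY = """
<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"></entity_mention>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"></entity_mention>
  <entity_mention ID="M005" TYPE="PRO" string="its"></entity_mention>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"></entity_mention>
</entity>
"""

def mentions_by_type(xml_text):
    """Return the entity type and its mention strings grouped by mention TYPE (NAME, PRO, ...)."""
    entity = ET.fromstring(xml_text.strip())
    groups = {}
    for mention in entity.iter("entity_mention"):
        groups.setdefault(mention.get("TYPE"), []).append(mention.get("string"))
    return entity.get("entity_type"), groups

etype, groups = mentions_by_type(ACE_ENTITY)
print(etype, groups)
# ORGANIZATION {'NAME': ['National Air Traffic Services', 'NATS', 'Nats'], 'PRO': ['its']}
```

All four strings, including the pronoun "its" and the case variant "Nats", resolve to the single ORGANIZATION entity. This is the coreference output the ACE task asks for.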

  6. The MACE System • Rule-based NE system developed within GATE, adapted from ANNIE • Processing resources (PRs): tokeniser, sentence splitter, POS tagger, gazetteer, semantic tagger, orthomatcher, pronominal and nominal coreferencers • Also: genre identification, and a switching controller that selects different PRs automatically
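A genre-aware pipeline can be illustrated with a toy Python sketch. Each processing resource is a function over a shared document dictionary. A switching controller runs a genre identifier first and then picks the PR sequence registered for that genre. This is not GATE's API (GATE itself is written in Java). The PR functions, the all-capitals heuristic for broadcast transcripts and the `PIPELINES` table are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List

Doc = Dict[str, Any]
ProcessingResource = Callable[[Doc], Doc]

def tokeniser(doc: Doc) -> Doc:
    # Hypothetical whitespace tokeniser standing in for a real one.
    doc["tokens"] = doc["text"].split()
    return doc

def mixed_case_tagger(doc: Doc) -> Doc:
    # Placeholder for a tagger tuned to normally cased newswire text.
    doc["tagger"] = "mixed-case"
    return doc

def upper_case_tagger(doc: Doc) -> Doc:
    # Placeholder for a tagger tuned to all-caps transcripts.
    doc["tagger"] = "upper-case"
    return doc

def genre_id(doc: Doc) -> Doc:
    # Assumed heuristic: text with letters but no lower-case letters is treated as broadcast news.
    letters = [c for c in doc["text"] if c.isalpha()]
    all_caps = bool(letters) and all(c.isupper() for c in letters)
    doc["genre"] = "broadcast" if all_caps else "newswire"
    return doc

# Hypothetical mapping from detected genre to the PR sequence to run.
PIPELINES: Dict[str, List[ProcessingResource]] = {
    "newswire": [tokeniser, mixed_case_tagger],
    "broadcast": [tokeniser, upper_case_tagger],
}

def switching_controller(doc: Doc) -> Doc:
    """Identify the genre, then run the PRs registered for that genre in order."""
    doc = genre_id(doc)
    for pr in PIPELINES[doc["genre"]]:
        doc = pr(doc)
    return doc

print(switching_controller({"text": "NATS SAID ITS SYSTEMS FAILED"})["tagger"])  # upper-case
```

Keeping the genre-to-PR choice in a table means that supporting a new text type only needs a new entry, not changes to the PRs themselves.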

  7. Differences between ANNIE and MACE • Locations → Location / GPE • GPEs have roles (GPE, Per, Org, Loc) • New type Facility (subsumes some Orgs) • Metonymy means context is necessary for disambiguation (e.g. England cricket team vs England country) • No Date, Time, Money, Percent, Address, Identifier
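Metonymy means the same name can play different roles, so role assignment has to look at the context. The toy Python function below guesses a role for a GPE name from neighbouring tokens. The cue-word sets and the three rules are hypothetical illustrations. They are far simpler than MACE's JAPE grammars, which the slides do not show.

```python
from typing import List

# Hypothetical cue words; a real grammar would draw on gazetteer lists and richer context.
TEAM_WORDS = {"team", "squad", "side", "eleven"}
LOCATIVE_PREPS = {"in", "to", "from", "across"}

def gpe_role(tokens: List[str], i: int, window: int = 3) -> str:
    """Guess the role (ORG, LOC or GPE) of the GPE name at tokens[i] from nearby words."""
    following = {t.lower() for t in tokens[i + 1 : i + 1 + window]}
    if following & TEAM_WORDS:
        return "ORG"   # e.g. "England cricket team": the name stands for a team
    if i > 0 and tokens[i - 1].lower() in LOCATIVE_PREPS:
        return "LOC"   # e.g. "in England": a purely geographical reading
    return "GPE"       # default: the country as a political entity

print(gpe_role("The England cricket team won".split(), 1))  # ORG
print(gpe_role("She lives in England".split(), 3))           # LOC
print(gpe_role("England signed the treaty".split(), 0))      # GPE
```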

  8. What does this mean in practical terms? • Separation of specific from general information makes adaptation easier • Reclassification of gazetteers unnecessary • Changes mainly to semantic grammars to: – use different gazetteer lookups – use more contextual information – group rules together differently
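The Python sketch below shows why the gazetteers need no reclassification. Lookups keep their original types, and only the grammar that maps lookups to entity types changes between systems. The `majorType`/`minorType` pairs imitate the style of GATE gazetteer Lookup annotations. The entries and the two grammar functions are hypothetical, not ANNIE's or MACE's actual rules.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical gazetteer entries typed as (majorType, minorType); shared by both grammars.
GAZETTEER = {
    "France": ("location", "country"),
    "Sheffield": ("location", "city"),
    "Thames": ("location", "river"),
}

Grammar = Callable[[str, str], Optional[str]]

def annie_style(major: str, minor: str) -> Optional[str]:
    # Every location lookup becomes a Location entity.
    return "Location" if major == "location" else None

def mace_style(major: str, minor: str) -> Optional[str]:
    # Same lookups, different rule: political units become GPE, physical places stay Location.
    if major != "location":
        return None
    return "GPE" if minor in {"country", "city", "province"} else "Location"

def annotate(tokens: List[str], grammar: Grammar) -> List[Tuple[str, str]]:
    """Apply a grammar to every token that matches a gazetteer entry."""
    entities = []
    for tok in tokens:
        if tok in GAZETTEER:
            label = grammar(*GAZETTEER[tok])
            if label is not None:
                entities.append((tok, label))
    return entities

words = ["France", "Sheffield", "Thames"]
print(annotate(words, annie_style))  # [('France', 'Location'), ('Sheffield', 'Location'), ('Thames', 'Location')]
print(annotate(words, mace_style))   # [('France', 'GPE'), ('Sheffield', 'GPE'), ('Thames', 'Location')]
```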

  9. Semantic Grammars • ANNIE uses 21 phases, 187 rules and 9 entity types (avg. 20.8 rules per entity type) • MACE uses 15 phases, 180 rules and 5 entity types (avg. 36 rules per entity type) • The important factor is the increased complexity of the new rules rather than their number • Rules may be hand-crafted, but an experienced JAPE user can write several rules per minute • Adaptation took 6 weeks
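The per-type averages on slide 9 follow directly from the rule and type counts. This short Python check recomputes them and their ratio.

```python
# Rule and entity-type counts as given on slide 9.
COUNTS = {"ANNIE": (187, 9), "MACE": (180, 5)}

def rules_per_type(system: str) -> float:
    """Average number of grammar rules per entity type."""
    rules, types = COUNTS[system]
    return rules / types

annie, mace = rules_per_type("ANNIE"), rules_per_type("MACE")
print(round(annie, 1), round(mace, 1), round(mace / annie, 2))  # 20.8 36.0 1.73
```

MACE has about 1.7 times as many rules per entity type as ANNIE, even though its total rule count is slightly lower.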

  10. Evaluation (1)

  11. Evaluation (2) • NEWS – 92 articles (business news) • ACE – 86 broadcast news texts from the September 2002 evaluation • Difference on ACE task • MACE on MUC-style annotations: – GPEs left as GPE (so they count as errors) – GPEs mapped to Locations

  12. Comparison of ANNIE vs MACE • 72% Precision, 84% Recall if GPEs are mapped to Locations
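Slide 11 describes two scoring conditions for MACE on MUC-style annotations. They differ only in how MACE's GPE labels are treated before comparison with the gold standard. The Python sketch below uses a simplified exact-match scorer over (start, end, type) triples with invented toy spans. It does not use the paper's data or the official MUC/ACE scorer. It shows how mapping GPE to Location turns type mismatches into matches.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, entity type)

def precision_recall(gold: Iterable[Span], predicted: Iterable[Span],
                     type_map: Optional[Dict[str, str]] = None) -> Tuple[float, float]:
    """Exact-match precision and recall; type_map renames predicted types before scoring."""
    type_map = type_map or {}
    pred: Set[Span] = {(s, e, type_map.get(t, t)) for s, e, t in predicted}
    gold_set: Set[Span] = set(gold)
    correct = len(pred & gold_set)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall

# Invented toy spans: the MUC-style gold standard has no GPE type.
GOLD = [(0, 6, "Location"), (10, 14, "Organization"), (20, 25, "Person")]
MACE_OUT = [(0, 6, "GPE"), (10, 14, "Organization"), (20, 25, "Person"), (30, 35, "Person")]

print(precision_recall(GOLD, MACE_OUT))                       # (0.5, 0.6666666666666666)
print(precision_recall(GOLD, MACE_OUT, {"GPE": "Location"}))  # (0.75, 1.0)
```

On this toy input, leaving GPE unmapped counts the GPE span as wrong, which lowers both measures. Mapping it to Location recovers the match.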

  13. Conclusions • MACE is a rule-based NE system, in contrast with most systems, which use ML • Advantages: it doesn’t require much training data and is fast to adapt because of its robust design • If large amounts of training data are available, HMM-based systems tend to perform slightly better • Rule-based systems tend to be good at recall but sometimes low on precision unless additionally supported by ML methods
