
Automatic Extraction of Hierarchical Relations from Text

Automatic Extraction of Hierarchical Relations from Text. Authors: Ting Wang, Yaoyong Li, Kalina Bontcheva, Hamish Cunningham, Ji Wang. Presented by: Khalifeh Al-Jadda. Outline: Introduction, Motivation, Contribution, Experiment and Results, Conclusion, Discussion points.


Presentation Transcript


  1. Automatic Extraction of Hierarchical Relations from Text. Authors: Ting Wang, Yaoyong Li, Kalina Bontcheva, Hamish Cunningham, Ji Wang. Presented by: Khalifeh Al-Jadda

  2. Outline • Introduction • Motivation • Contribution • Experiment and Results • Conclusion • Discussion points

  3. Introduction • What is Information Extraction (IE)? • IE is a process that takes unseen text as input and produces fixed-format, unambiguous data as output. It involves processing text documents to identify selected information, such as particular named entities or the relations among them.

  4. Introduction • Most research has focused on using IE to populate ontologies with concept instances. • Examples: • Handschuh, S., Staab, S., Ciravegna, F.: S-CREAM: Semi-automatic CREAtion of Metadata, 2002. • Motta, E., Vargas-Vera, M., Domingue, J., Lanzoni, M., Stutt, A., Ciravegna, F.: MnM: Ontology Driven Semi-Automatic and Automatic Support for Semantic Markup, 2002.

  5. Motivation • An ontology-based application cannot easily be adapted to work in different domains. • Machine learning (ML) techniques have been used to overcome this problem. • ML techniques: • Hidden Markov Models (HMM) • Conditional Random Fields (CRF) • Maximum Entropy Models (MEM) • Support Vector Machines (SVM), the best of these

  6. Contribution • The paper proposes applying SVMs with new features to discover whether a relation holds between two entities and then determine the type of that relation. • The technique can be applied to any domain. • The information extraction task used as the basis for the proposed technique was Automatic Content Extraction (ACE).
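The two steps named on this slide (first discover that a relation exists, then determine its type) can be sketched as a small pipeline. This is a minimal illustration, not the paper's code: the feature-dict format and the two classifier callables are hypothetical stand-ins for trained SVMs.

```python
from typing import Callable, Optional

# Hypothetical representation: a candidate mention pair described by named features.
Features = dict[str, str]


def extract_relation(
    features: Features,
    detect: Callable[[Features], bool],
    characterize: Callable[[Features], str],
) -> Optional[str]:
    """Two-step relation extraction for one candidate pair of entity mentions.

    `detect` and `characterize` are placeholders for trained classifiers
    (for example SVMs); they are assumptions made for this sketch.
    """
    # Step 1 (detection): does any relation hold between the two mentions?
    if not detect(features):
        return None
    # Step 2 (characterization): which relation type is it?
    return characterize(features)


# Toy rule-based stand-ins for the two classifiers (hypothetical).
def toy_detect(f: Features) -> bool:
    return f.get("words_between", "") != ""


def toy_type(f: Features) -> str:
    return "HYPOTHETICAL_TYPE"


print(extract_relation({"words_between": "has many"}, toy_detect, toy_type))
```

Keeping the two steps as separate callables means either classifier can be retrained or swapped without touching the other.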

  7. The Automatic Content Extraction (ACE) program • ACE is an information extraction program whose relation extraction task, Relation Detection and Characterization (RDC), finds relations between entities according to a predefined entity type system. • ACE2004 introduced a type and subtype hierarchy for both entities and relations. • Entities are categorized in a two-level hierarchy consisting of 7 types and 44 subtypes.
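A two-level hierarchy like ACE2004's can be represented as a mapping from each type to its subtypes, which makes it easy to evaluate a classifier at either level by collapsing subtype labels to their parent type. A minimal sketch: the seven keys are the ACE2004 entity types, but the subtype sets are a small illustrative subset (an assumption), not the full list of 44.

```python
# Two-level entity hierarchy: top-level type -> set of subtypes.
# The seven keys are the ACE2004 entity types; the subtype sets are a partial,
# illustrative subset (assumption), not the full inventory of 44 subtypes.
ENTITY_HIERARCHY: dict[str, set[str]] = {
    "PER": {"Individual", "Group"},
    "ORG": set(),
    "GPE": {"Nation", "Population-Center"},
    "LOC": set(),
    "FAC": set(),
    "VEH": set(),
    "WEA": set(),
}


def parent_type(subtype: str) -> str:
    """Return the top-level type that a subtype label belongs to."""
    for entity_type, subtypes in ENTITY_HIERARCHY.items():
        if subtype in subtypes:
            return entity_type
    raise KeyError(f"unknown subtype: {subtype}")


def coarsen(labels: list[str]) -> list[str]:
    """Collapse subtype-level labels to type level, e.g. to score at the coarser level."""
    return [parent_type(label) for label in labels]


print(coarsen(["Population-Center", "Individual"]))  # ['GPE', 'PER']
```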

  8. ACE2004

  9. ACE2004

  10. Why SVM? • Although the SVM is a binary classifier, it can easily be extended to multi-class classification with simple schemes such as one-against-all or one-against-one. • It is scalable, so it can handle large and complex data sets. • It copes well with a very large number of features, so training can start from a big feature set without hand-picking the useful ones.
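The one-against-all scheme mentioned above trains one binary SVM per class and predicts the class whose SVM gives the highest score. The sketch below assumes dense feature vectors and trains each linear SVM with a Pegasos-style stochastic subgradient method, a solver swapped in for brevity and not necessarily what the authors used; `lam`, the epoch count, and the toy data are illustrative choices, and the bias is regularized together with the weights as a simplification.

```python
import random


def train_binary_svm(X, y, lam=0.1, epochs=1000, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent.

    X: list of feature vectors (lists of floats); y: labels in {-1, +1}.
    A constant 1.0 feature is appended so the bias is learned as an ordinary
    weight (simplification: the bias is regularized along with the weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], t) for x, t in zip(X, y)]
    w = [0.0] * len(data[0][0])
    step = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, t in data:
            step += 1
            eta = 1.0 / (lam * step)  # decreasing step size
            margin = t * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # shrink: L2 regularization
            if margin < 1.0:  # hinge loss active: move toward the example
                w = [wi + eta * t * xi for wi, xi in zip(w, x)]
    return w


def decision(w, x):
    """Signed score of one binary SVM (bias feature appended)."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


def train_one_vs_rest(X, labels, **kwargs):
    """One-against-all: one binary SVM per class, that class (+1) vs. the rest (-1)."""
    return {
        c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Choose the class whose binary SVM is most confident."""
    return max(models, key=lambda c: decision(models[c], x))


# Toy data: three well-separated clusters (illustrative only).
X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5], [0, 5], [0, 6], [1, 5]]
labels = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
models = train_one_vs_rest(X, labels)
print(predict(models, [0, 0]), predict(models, [6, 5]), predict(models, [0, 6]))
```

One-against-one would instead train one SVM per pair of classes and let them vote; it needs more classifiers, but each is trained on less data.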

  11. Features for relation extraction • The researchers used the General Architecture for Text Engineering (GATE) for feature extraction. • The example sentence "Atlanta has many cars" illustrates the different types of features.

  12. Cont. • Word features: • 14 features, including: • The entity mentions (Atlanta, cars) • The heads of the two mentions, plus context words (two words before and two after) • The word list between the two entities • POS tag features: part-of-speech tags • Atlanta/NNP has/VBZ many/JJ cars/NNS • NNP: singular proper noun • VBZ: verb, third-person singular present • JJ: adjective • NNS: plural noun
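For the example sentence, a few of the word and POS features can be computed directly from the token and tag lists. This simplified sketch covers only some of the 14 word features; it assumes each mention is a single token (so it is its own head) and takes the POS tags from the slide rather than running a tagger. The function and key names are hypothetical.

```python
def word_features(tokens, m1, m2):
    """A few lexical features for single-token mentions at indices m1 < m2."""
    return {
        "m1": tokens[m1],                        # first entity mention
        "m2": tokens[m2],                        # second entity mention
        "between": tokens[m1 + 1:m2],            # word list between the entities
        "before_m1": tokens[max(0, m1 - 2):m1],  # up to two words before
        "after_m2": tokens[m2 + 1:m2 + 3],       # up to two words after
    }


def pos_features(tags, m1, m2):
    """POS tags of the two mentions and of the words between them."""
    return {"m1_pos": tags[m1], "m2_pos": tags[m2], "between_pos": tags[m1 + 1:m2]}


tokens = ["Atlanta", "has", "many", "cars"]
tags = ["NNP", "VBZ", "JJ", "NNS"]  # tags as given on the slide
print(word_features(tokens, 0, 3)["between"])  # ['has', 'many']
print(pos_features(tags, 0, 3)["between_pos"])  # ['VBZ', 'JJ']
```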

  13. Cont. • Entity features: ACE2004 assigns each entity its type, subtype, and class. • Atlanta is a GPE (geo-political entity) • Mention features include: • Mention type (Atlanta: NAM, a name; cars: NOM, a nominal) • Role information (only for GPEs) • Overlap features concern the relative positions of the two mentions: • The number of words separating them • The number of other entity mentions between them • Whether one mention contains the other
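The three overlap features listed above depend only on where the mentions sit in the sentence. A minimal sketch, assuming mentions are given as hypothetical (start, end) token spans with an exclusive end:

```python
def overlap_features(m1, m2, all_mentions):
    """Position-based features for two mention spans (start, end), end exclusive.

    all_mentions lists every entity mention span in the sentence.
    """
    (s1, e1), (s2, e2) = sorted([m1, m2])  # order the pair by position
    words_between = max(0, s2 - e1)         # 0 if the spans overlap
    mentions_between = sum(
        1 for (s, e) in all_mentions if s >= e1 and e <= s2  # strictly between
    )
    contains = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return {
        "words_between": words_between,
        "mentions_between": mentions_between,
        "contains": contains,
    }


# "Atlanta has many cars": Atlanta = (0, 1), cars = (3, 4)
print(overlap_features((0, 1), (3, 4), [(0, 1), (3, 4)]))
# {'words_between': 2, 'mentions_between': 0, 'contains': False}
```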

  14. Cont. • Chunk features: GATE integrates two chunkers: • Noun phrase (NP) chunker (Atlanta, cars) • Verb phrase (VP) chunker (has) • Dependency features: the dependency relationships between the words of a sentence. • Parse tree features: syntactic-level features extracted from the parse tree; the BuChart parser was used in this research.
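Chunk-level features can be read off the chunker output. As an illustration (not the paper's exact feature definitions), the sketch below derives two hypothetical chunk features: whether both mention heads fall in the same chunk, and the sequence of chunk labels between them. It assumes chunks arrive as (label, start, end) spans in sentence order, with an exclusive end.

```python
def chunk_features(chunks, h1, h2):
    """Chunk features for mention heads at token indices h1 < h2.

    chunks: list of (label, start, end) spans in sentence order, end exclusive.
    """
    def chunk_index(i):
        for idx, (_, start, end) in enumerate(chunks):
            if start <= i < end:
                return idx
        return None  # token not covered by any chunk

    c1, c2 = chunk_index(h1), chunk_index(h2)
    if c1 is None or c2 is None:
        return {"same_chunk": False, "chunk_path": []}
    return {
        "same_chunk": c1 == c2,
        "chunk_path": [label for label, _, _ in chunks[c1 + 1:c2]],
    }


# Assumed chunking of "Atlanta has many cars": [Atlanta]NP [has]VP [many cars]NP
chunks = [("NP", 0, 1), ("VP", 1, 2), ("NP", 2, 4)]
print(chunk_features(chunks, 0, 3))  # {'same_chunk': False, 'chunk_path': ['VP']}
```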

  15. Cont. • Semantic features from SQLF: BuChart provides a semantic analysis that produces an SQLF for each phrasal constituent. • Semantic features from WordNet: • The synset-ID lists of the two entity mentions • The synset IDs of the heads and of the context words (two words before and two after)
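Before training, the heterogeneous symbolic features above (words, POS tags, entity types, synset IDs, and so on) have to become numeric vectors. One common approach, assumed here rather than taken from the paper, is to give each name=value pair its own dimension of a sparse binary vector; list-valued features such as a synset-ID list switch on one dimension per element. The synset IDs below are hypothetical placeholders.

```python
class FeatureIndexer:
    """Maps symbolic features like 'm1_pos=NNP' to columns of a sparse binary vector."""

    def __init__(self):
        self.index = {}

    def vectorize(self, features, grow=True):
        """Return the sorted indices of active (value 1) columns.

        With grow=False (e.g. at test time), unseen feature strings are skipped.
        """
        active = set()
        for name, value in features.items():
            values = value if isinstance(value, (list, tuple, set)) else [value]
            for v in values:
                key = f"{name}={v}"
                if key not in self.index:
                    if not grow:
                        continue
                    self.index[key] = len(self.index)
                active.add(self.index[key])
        return sorted(active)


indexer = FeatureIndexer()
train_vec = indexer.vectorize({"m1_type": "GPE", "m2_synsets": ["SYN_1", "SYN_2"]})
test_vec = indexer.vectorize({"m1_type": "GPE", "m2_synsets": ["SYN_3"]}, grow=False)
print(train_vec, test_vec)  # [0, 1, 2] [0]
```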

  16. Experimental Results • Classification accuracy is assessed with three measures: • Precision • Recall • F-measure
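For relation extraction, these measures are usually computed by comparing the set of predicted relations with the gold-standard set. A minimal sketch, assuming each relation is a hashable tuple (first mention, second mention, label) and that F-measure means the balanced F1 unless another `beta` is given; the tuples below are made up for illustration.

```python
def precision_recall_f(gold, predicted, beta=1.0):
    """Precision, recall and F-beta over sets of relation tuples."""
    true_pos = len(gold & predicted)  # relations found with the correct label
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


gold = {("m1", "m2", "R1"), ("m3", "m4", "R2"), ("m5", "m6", "R3"), ("m7", "m8", "R1")}
predicted = {("m1", "m2", "R1"), ("m3", "m4", "R2"), ("m5", "m6", "R1")}
p, r, f = precision_recall_f(gold, predicted)
print(round(p, 3), round(r, 3), round(f, 3))  # 0.667 0.5 0.571
```

Here the pair (m5, m6) is found but with the wrong label, so it counts against precision and its gold counterpart counts against recall.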

  17. Data Set

  18. Results with different kernels

  19. Results with different feature sets

  20. Results at different classification levels

  21. Conclusion • This research investigated SVM-based classification for relation extraction and explored a diverse set of NLP features. • It introduced several new features, including: • POS tags, entity subtype, and entity mention role, among others • The experiments show that these features contribute substantially to performance improvements

  22. Any Questions?

  23. Discussion points • Is this technique suitable for automating ontology building? • Are you for or against using a huge number of features (94 in this case) to represent a relation? • How many of you think this is an applicable and useful technique for relation extraction? • Why or why not?

  24. Thank You
