
Never-Ending Language Learning for Vietnamese

Coupled SEAL. Student: Phạm Xuân Khoái. Instructor: PhD Lê Hồng Phương.


Presentation Transcript


  1. Never-Ending Language Learning for Vietnamese: Coupled SEAL. Student: Phạm Xuân Khoái. Instructor: PhD Lê Hồng Phương

  2. Main content

  3. 1. Introduction
  • SEAL (Set Expander for Any Language) is a set expansion system that accepts input elements (seeds) of some target set S and automatically finds other probable elements of S in semi-structured documents such as web pages.
  • CSEAL (Coupled SEAL) is SEAL extended with two constraints:
    • mutual-exclusion constraints
    • type-checking constraints
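To make the two CSEAL constraints concrete, here is a minimal Python sketch over a hypothetical toy knowledge base. The categories (`city`, `river`), the relation `cityLocatedOnRiver`, and both helper functions are illustrative assumptions, not NELL's actual data structures; the sketch also assumes that only already-promoted beliefs are consulted when checking a candidate.

```python
from typing import Dict, Set, Tuple

# Hypothetical toy knowledge base: category -> promoted beliefs (instances).
beliefs: Dict[str, Set[str]] = {
    "city": {"Hà Nội", "Huế"},
    "river": {"sông Hồng", "sông Hương"},
}

# Hypothetical mutual-exclusion pairs: no instance may belong to both.
mutex: Set[Tuple[str, str]] = {("city", "river"), ("river", "city")}

# Hypothetical relation signatures: relation -> (domain category, range category).
signatures: Dict[str, Tuple[str, str]] = {"cityLocatedOnRiver": ("city", "river")}


def passes_mutual_exclusion(instance: str, category: str) -> bool:
    """Reject a candidate already believed to be in a mutually exclusive category."""
    return not any(instance in beliefs.get(other, set())
                   for (cat, other) in mutex if cat == category)


def passes_type_check(relation: str, arg1: str, arg2: str) -> bool:
    """Accept a relation candidate only if both arguments have the expected types."""
    domain, range_ = signatures[relation]
    return arg1 in beliefs.get(domain, set()) and arg2 in beliefs.get(range_, set())


if __name__ == "__main__":
    print(passes_mutual_exclusion("sông Hồng", "city"))                 # a river, so rejected
    print(passes_type_check("cityLocatedOnRiver", "Huế", "sông Hương"))  # city, river: accepted
```

Both checks act as filters on candidate facts: mutual exclusion supplies implicit negative evidence for categories, while type checking restricts relation arguments to instances of the right categories.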

  4. 1. Introduction
  Coupled SEAL: a semi-structured extractor
  • Based on SEAL, which uses a wrapper induction algorithm.
  • Queries the Internet with sets of beliefs from each category or relation, and mines lists and tables for new instances.
  • Uses mutual-exclusion relationships to provide negative examples for filtering out overly general lists and tables.
  • Issues 5 queries per category and 10 queries per relation, and fetches 50 web pages per query.
  • Ranks candidates by probabilities assigned as in CPL.
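The filtering and ranking steps can be sketched as follows. This is a simplified illustration, not CSEAL's implementation: each wrapper (a mined list or table) is represented by a hypothetical name mapped to the set of strings it extracts, negative examples are assumed to be beliefs of mutually exclusive categories, and the score 1 − 0.5^c (c = number of surviving wrappers that extract the candidate) is an assumption about how the "probabilities assigned as in CPL" are computed.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def filter_wrappers(extractions: Dict[str, Set[str]],
                    negatives: Set[str]) -> Dict[str, Set[str]]:
    """Drop every wrapper (list/table) that extracts a negative example,
    treating it as overly general."""
    return {w: items for w, items in extractions.items()
            if not (items & negatives)}


def rank_candidates(extractions: Dict[str, Set[str]], negatives: Set[str],
                    known: Set[str]) -> List[Tuple[str, float]]:
    """Score new instances by 1 - 0.5**c, where c is the number of kept
    wrappers extracting them (ASSUMED CPL-style score); highest first."""
    kept = filter_wrappers(extractions, negatives)
    counts = Counter(item for items in kept.values() for item in items)
    ranked = [(item, 1 - 0.5 ** c) for item, c in counts.items()
              if item not in known]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


if __name__ == "__main__":
    # Hypothetical extractions for the category "city"; "sông Hồng" is a
    # belief of the mutually exclusive category "river".
    extractions = {
        "list_1": {"Hà Nội", "Huế", "Đà Nẵng"},
        "list_2": {"Huế", "Đà Nẵng", "Cần Thơ"},
        "table_1": {"Hà Nội", "sông Hồng", "Cần Thơ"},
    }
    print(rank_candidates(extractions, {"sông Hồng"}, {"Hà Nội", "Huế"}))
```

In this toy run the overly general `table_1` is dropped because it also extracts a river, so Đà Nẵng (two supporting wrappers) outranks Cần Thơ (one).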

  5. 1. Introduction [Diagram: Beliefs, CSEAL, Internet, New candidate facts]

  6. 1. Introduction [Diagram: Knowledge Base (Beliefs, Candidate facts), Knowledge Integrator, Data Resources, Subsystem Components (CSEAL, CPL, CMC, RL)]

  7. Example

  8. 2. Concepts
  • Seed: an input element.
  • Wrapper: defined by two character strings, which specify the left context and right context necessary for an entity to be extracted from a page. These strings are chosen to satisfy two conditions:
    • they are maximally-long contexts;
    • they bracket at least one occurrence of every seed string on the page.
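A simplified single-page wrapper-induction sketch follows. It assumes a brute-force search: it first picks the longest left context that ends right before at least one occurrence of every seed, then the longest right context following those bracketed occurrences, and it caps context length with a hypothetical `max_len` parameter. The real SEAL uses a trie-based search and may return several wrappers per page, so treat this as an illustration of the two conditions rather than SEAL's algorithm.

```python
import re
from typing import List, Optional, Tuple


def occurrences(page: str, seed: str) -> List[Tuple[str, str]]:
    """(text before, text after) for every occurrence of `seed` on the page."""
    return [(page[:m.start()], page[m.end():])
            for m in re.finditer(re.escape(seed), page)]


def longest_context(contexts_per_seed: List[List[str]], from_left: bool,
                    max_len: int) -> str:
    """Longest string that ends (left context) or starts (right context)
    at least one context of every seed; "" if none exists."""
    def cut(text: str, k: int) -> str:
        return text[-k:] if from_left else text[:k]

    for k in range(max_len, 0, -1):  # try the longest contexts first
        candidates = {cut(c, k) for c in contexts_per_seed[0] if len(c) >= k}
        for cand in sorted(candidates):
            if all(any(len(c) >= k and cut(c, k) == cand for c in ctxs)
                   for ctxs in contexts_per_seed):
                return cand
    return ""


def induce_wrapper(page: str, seeds: List[str],
                   max_len: int = 50) -> Optional[Tuple[str, str]]:
    occ = [occurrences(page, s) for s in seeds]
    if any(not o for o in occ):
        return None  # every seed must appear on the page
    left = longest_context([[lt for lt, _ in o] for o in occ], True, max_len)
    if not left:
        return None
    # Search the right context only among occurrences bracketed by `left`.
    rights = [[rt for lt, rt in o if lt.endswith(left)] for o in occ]
    right = longest_context(rights, False, max_len)
    return (left, right) if right else None


def extract(page: str, wrapper: Tuple[str, str]) -> List[str]:
    left, right = wrapper
    # The lookahead leaves the right context unconsumed, so it may overlap
    # the next item's left context.
    pattern = re.escape(left) + r"(.+?)(?=" + re.escape(right) + r")"
    return re.findall(pattern, page)


if __name__ == "__main__":
    page = "<ul><li>Hà Nội</li><li>Huế</li><li>Đà Nẵng</li><li>Cần Thơ</li></ul>"
    wrapper = induce_wrapper(page, ["Hà Nội", "Cần Thơ"])
    print(wrapper)                 # ('><li>', '</li><')
    print(extract(page, wrapper))  # ['Hà Nội', 'Huế', 'Đà Nẵng', 'Cần Thơ']
```

With the seeds Hà Nội and Cần Thơ, the two maximal contexts bracket both seeds and also pick up the non-seed items Huế and Đà Nẵng, which is exactly how set expansion finds new elements of S.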

  9. Example

  10. 3. How it works

  11. 3. How it works

  12. 3. How it works

  13. References
  • Toward an Architecture for Never-Ending Language Learning (http://www.cs.cmu.edu/~acarlson/papers/carlson-aaai10.pdf)
  • Language-Independent Set Expansion of Named Entities using the Web (http://www.cs.cmu.edu/~wcohen/postscript/icdm-2007.pdf)
  • Coupled Semi-Supervised Learning for Information Extraction (http://www.cs.cmu.edu/~rcwang/papers/wsdm-2010.pdf)
  • Character-level Analysis of Semi-Structured Documents for Set Expansion (https://www.cs.cmu.edu/~rcwang/papers/emnlp-2009.pdf)
