
Kyoungryol Kim

Extracting Schedule Information from Korean Email


  1. Extracting Schedule Information from Korean Email. Kyoungryol Kim

  2. Table of Contents
     • Purpose of Utilization
     • Annotated Data Analysis
     • Reference for NER Tagging
     • Baseline System

  3. 1. Purpose of Utilization

  4. Purpose of Utilization
     • Extract accurate schedule information, including "Speaker" and "Meeting Location", from Korean email and register it in an online calendar.
     • Find the semantics of the extracted information:
       • Meeting Location: geographical location recognition
       • Speaker: person recognition (contacts of the email)

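  5. [Figure: processing pipeline with a worked example]
     • Stage labels in the figure: INPUT TEXT, Tokenization, Named Entity Recognition, Information Type Classification, Semantics Recognition, Template Generation, OUTPUT
     • Input text (Korean email): 안녕하세요, 금주 수요일 오후 2시~4시에, 카이스트 전산동 1층 세미나실에서 세미나를 진행합니다. CI LAB과 TC LAB 이 공동으로 주관하는 세미나이며, 지도교수님께서 참석하실 예정입니다. 석사과정학생들은 꼭 참석바랍니다. 발표자는 김 아나톨리, 박광희 학생이니 준비해주십시오. 문의사항은 박상원 학생에게 문의바랍니다. 감사합니다.
     • English translation: Hello. This Wednesday from 2 to 4 p.m., a seminar will be held in the first-floor seminar room of the KAIST Computer Science building. The seminar is jointly organized by CI LAB and TC LAB, and the advising professor plans to attend. Master's students must attend. The presenters are Kim Anatoly (김 아나톨리) and Park Kwanghee (박광희), so please prepare. Please direct any questions to Park Sangwon (박상원). Thank you.
     • Tokenized text: ... 4 시 에 , 카이스트 전산동 1층 세미나실 에서 세미나 를 진행 합니다 ... 발표자 는 김 아나톨리 , 박광희 학생 ...
     • NER output (token/tag): ... 4/O 시/O 에/O ,/O 카이스트/B-Location 전산동/I-Location 1층/I-Location 세미나실/I-Location 에서/O 세미나/O 를/O 진행/O 합니다/O ... 발표자/O 는/O 김/B-Person 아나톨리/I-Person ,/O 박광희/B-Person 학생/O
     • Semantics recognition:
       • isHeldAt: geographical coordinates 35.1958694, 129.294384959595
       • hasReference: 김아나톨리
       • hasReference: 박광희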
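
     The B-/I-/O tags in the example can be grouped back into entity spans with a few lines of code. The following is a minimal sketch, not the presenter's implementation; the function name `bio_to_spans` is hypothetical, and the token and tag lists are copied from the slide's example (shortened).

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (entity_type, text) pairs."""
    spans: List[Tuple[str, str]] = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent I- tag) closes the open entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Tokens and tags taken from the slide's example.
tokens = ["카이스트", "전산동", "1층", "세미나실", "에서",
          "발표자", "는", "김", "아나톨리", ",", "박광희", "학생"]
tags = ["B-Location", "I-Location", "I-Location", "I-Location", "O",
        "O", "O", "B-Person", "I-Person", "O", "B-Person", "O"]
spans = bio_to_spans(tokens, tags)
print(spans)
```

     Joining tokens with a space is a simplification for display; recovering the original surface string would need the token offsets.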

  6. 2. Annotated Data Analysis

  7. Annotated Data • Contents included in Word file.

  8. 3. Reference for NER Tagging

  9. Reference for NER tagging
     • [Lee et al. 2010] Named Entity Recognition with Structural SVMs and Pegasos algorithm
     • State-of-the-art Korean NER
     • Performance (F-measure): CRFs 84.99%, structural SVMs 85.14%, modified Pegasos 85.43%
     • Boundary tags: IOB2 model (B-I-O)
     • Domain of corpus: TV (2900:100 docs), Sports (3500:100 docs)
     • Features:
       • Morpheme at positions -2, -1, 0, 1, 2
       • Suffix at positions -2, -1, 0, 1, 2
       • POS tag at positions -2, -1, 0, 1, 2
       • POS tag + length
       • Position of the morpheme in the eojeol (Start / Center / End)
       • NE dictionary (true or false) + length
       • NE dictionary feature (index) + length
       • 15 regular expressions: [A-Z]*, [0-9]*, [0-9][0-9], [0-9][0-9][0-9][0-9], [A-Za-z0-9]*, ...
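
     To make the feature list concrete, here is a minimal sketch of windowed feature extraction for one morpheme. It is an illustration under assumptions, not the feature extractor of Lee et al.: the function name `window_features`, the feature key names, the `<PAD>` marker, and the POS tags in the example are all hypothetical, the suffix and eojeol-position features are omitted, and only five of the 15 regular expressions (those listed on the slide) are included. The patterns use `+` rather than `*` so that an empty string does not match.

```python
import re
from typing import Dict, List, Set

# The five patterns named on the slide (the remaining ten are not listed there).
PATTERNS = {
    "all_upper": re.compile(r"[A-Z]+"),
    "all_digit": re.compile(r"[0-9]+"),
    "two_digit": re.compile(r"[0-9]{2}"),
    "four_digit": re.compile(r"[0-9]{4}"),
    "alnum": re.compile(r"[A-Za-z0-9]+"),
}


def window_features(morphemes: List[str], pos_tags: List[str],
                    i: int, ne_dict: Set[str]) -> Dict[str, str]:
    """Build string-valued features for the morpheme at index i."""
    feats: Dict[str, str] = {}
    # Morpheme and POS tag at offsets -2 .. +2, padded at sentence edges.
    for offset in range(-2, 3):
        j = i + offset
        inside = 0 <= j < len(morphemes)
        feats[f"morph[{offset}]"] = morphemes[j] if inside else "<PAD>"
        feats[f"pos[{offset}]"] = pos_tags[j] if inside else "<PAD>"
    current = morphemes[i]
    # Conjunction features with the morpheme length.
    feats["pos+len"] = f"{pos_tags[i]}_{len(current)}"
    feats["in_dict+len"] = f"{current in ne_dict}_{len(current)}"
    # One boolean feature per regular expression (whole-string match).
    for name, pattern in PATTERNS.items():
        feats[f"re:{name}"] = str(bool(pattern.fullmatch(current)))
    return feats


# Illustrative input; the POS tags are assumed, not taken from the slides.
morphemes = ["카이스트", "전산동", "1층", "세미나실", "에서"]
pos_tags = ["NNP", "NNG", "NNG", "NNG", "JKB"]
feats = window_features(morphemes, pos_tags, 0, {"카이스트"})
print(feats)
```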

  10. Reference for NER tagging
     • [Kim et al. 2008] Korean Named Entity Recognition Using Two-level Maximum Entropy Model
     • POS tagging
     • Noun-sequence extraction
     • NE boundary recognition
     • NE candidate selection (recognition)
     • Boundary tags: S (Start), M (Middle), E (End), U (Uniterm), NONE
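
     The S/M/E/U boundary tags carry the same information as B-I-O tags, so one scheme can be converted into the other. The sketch below shows one such conversion; it assumes well-formed B-I-O input and assumes that the entity type is appended to each boundary tag (for example `S-Location`), which the slide does not specify. The function name `bio_to_smeu` is hypothetical.

```python
from typing import List


def bio_to_smeu(tags: List[str]) -> List[str]:
    """Convert well-formed B-I-O tags to Start/Middle/End/Uniterm/NONE tags."""
    out: List[str] = []
    for i, tag in enumerate(tags):
        if tag == "O":
            out.append("NONE")
            continue
        prefix, etype = tag.split("-", 1)
        # The entity continues only if the next tag is I- of the same type.
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == f"I-{etype}"
        if prefix == "B":
            out.append(f"S-{etype}" if continues else f"U-{etype}")
        else:
            out.append(f"M-{etype}" if continues else f"E-{etype}")
    return out


# Tags from the tagged example on slide 5 (shortened).
bio = ["B-Location", "I-Location", "I-Location", "I-Location", "O",
       "B-Person", "I-Person", "O", "B-Person", "O"]
smeu = bio_to_smeu(bio)
print(smeu)
```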

  11. Reference for NER tagging
     • [Seon et al. 2001] Korean Named Entity Recognition Using Machine Learning Methods and Pattern-Selection Rules
     • Selects target words using POS tags and a clue word dictionary
     • Searches for target words in the NE dictionary
     • Handles unknown words using the MEM method with lexical sub-pattern information and a clue word dictionary
     • Resolves ambiguity using an NN
     • Converts adjacent words into NE tags using pattern-selection rules

  12. 4. Baseline System

  13. Baseline system
     • [Min et al. 2005] Information Extraction Using Context and Position
     • Corpus: 245 meeting announcement emails
     • Target: Attendee, Meeting Location, Time, Date
     • Performance (F-measure): Attendee 36%, Meeting Location 57%, Time 92.5%, Date 91%
     • Method:
       • Sentence to LSP
       • NE recognition
       • ME, NN, pattern-selection
       • Instance disambiguation
       • ML: Naive Bayes
       • Score calculation
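
     For reference, the F-measure figures quoted in these slides combine precision and recall. The sketch below computes the general F-beta score from raw counts; with `beta=1.0` it is the usual balanced F-measure. Whether the cited papers use the balanced form is an assumption, the function name `f_measure` is hypothetical, and the counts in the example are made up for illustration, not taken from any cited paper.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives and false negatives."""
    if tp == 0:
        # No correct extractions: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts: 8 correct extractions, 2 spurious, 2 missed.
score = f_measure(tp=8, fp=2, fn=2)
print(f"{score:.3f}")
```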
