
Kyoungryol Kim

Master’s Thesis: An Approach for Mapping of the Location Text in the Meeting Announcement to the Geographical Location. Kyoungryol Kim.


Presentation Transcript


  1. Master’s Thesis: An Approach for Mapping of the Location Text in the Meeting Announcement to the Geographical Location. Kyoungryol Kim

  2. Table of Contents • Introduction • The Proposed Method • Overall Architecture • Ontological Location Model • Module 1: Location Extraction • Module 2: Geocoding • Module 3: Disambiguation • Experimentation • Discussion

  3. Introduction

  4. Information Recognition on Smartphones
     • Smartphones have begun to support information recognition, focusing on dates, times, and phone numbers.
     • The iPhone supports address recognition, but it recognizes only fully formatted addresses, not arbitrary location text.
     • [Figure: feature comparison. Platforms: MS Windows Phone, RIM Blackberry, Google Android, Apple iPhone. Features: address recognition, phone number recognition, time (text) recognition, adding an event from a recognized time, location (text) recognition. Other labels: "May 21, 2011"; "Captured from Apple iPhone".]
     • People are starting to pay attention to the ‘Location Extraction’ technique.

  5. Information Extraction in the Research Area
     • Prior research has addressed location models and interoperation with other GIS, location extraction from text, and geocoding of location text including addresses. However, no prior work has combined all of these into a single system.
     • [Figure: coverage of this research across four areas: location model and interoperation with other GIS (GIS 1, GIS 2, ..., GIS n), location extraction from text, geocoding address text, and geocoding location text. Citations shown: [Ben-Akiva 1998] [Bishr 1998] [Flury 2004] [Haklay 2010] [Freitag 1999] [Ciravegna 1999] [Soderland 2000] [Pouliquen 2006] [Goldberg 2007].]
     • Example text: 석사과정 학생 여러분, 이따 퇴근하고 오후 7시에 어은동GS25 앞에서 봅시다. (Master's students, let's meet after work today at 7 PM in front of the GS25 in Eoeun-dong.)
     • Extracted location text: 어은동GS25 앞 (in front of the GS25 in Eoeun-dong)
     • Address text: 어은동112-1 (112-1 Eoeun-dong)
     • Geocoded result: 36.36393, 127.358722

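  6. Goal of this Research
     • Map the meeting location text to its geographical location and update the online calendar with it.
     • [Figure: Meeting Announcement → Extract Meeting Location and Extract Time → Update Calendar]
     • Example meeting announcement (translated): "As the hot weather begins in earnest, we are holding the regular team leaders' meeting to review Univcast's first half and plan its operation for the second half. Date: July 19 (Sat), 2 PM. Place: 명동 민들레영토 (Mindeulle Yeongto, Myeong-dong). Directions to Mindeulle Yeongto: as shown on the map, come out of Exit 8 of Myeong-dong Station and walk straight along the shops; it is on the 1st floor of the YMCA building."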

  7. Problem Specification
     Target complexity of the problem: “1 Meeting Location” with “1 Event”
     * An ‘Event’ is a specifically stated time or date for an event in the meeting announcement.
     • Extracting the meeting location term from the meeting announcement
       회의는 오후 5시 학생회관 101호에서 열립니다. (The meeting will be held at 5 PM in Room 101, Student Union.)
     • Geocoding the extracted meeting location term
       The geographical location of ‘학생회관 101호 (Room 101, Student Union)’ is ambiguous, because almost every university has a ‘학생회관 (Student Union)’.
     • Note: The treatment of ‘Event’ is too brief; explain it in detail. (For example, a conference contains many events; classify such cases as well and compare them with the current problem.)
     • Disambiguation

  8. Additional Problem: Location with Supplement
     • “Supplement” locations:
       • Locations that have different representations but point to the same place.
       • They become meaningful when all of the supplements are merged together.
     • 점프볼 클럽리그 대표자 회의는 3월18일 (목) 오후 8시에 점프볼 사무실에서 열립니다. (The representatives' meeting of the Jump-ball club league will be held on Thursday, March 18, at 8 PM at the Jump-ball office.)
     • ………
     • 점프볼 주소 : 서울시 송파구 가락동 49-4 화영빌딩3층 (Jump-ball address: 3rd floor, Hwayung B/D, 49-4, Garak-dong, Songpa-gu, Seoul)
     • Supplement relationship → merge: 서울시 송파구 가락동 49-4 화영빌딩3층 점프볼 사무실

  9. Coverage of this Research
     • Corpus: 1,011 meeting announcements
     • Coverage of this research: 88.23% of documents (95.6% when documents with no meeting location are excluded)
     • Note (p. 9): The documents were taken from web pages, so that must be mentioned (the discussion of the source is missing), and it should be stated that they are fairly formal. This describes only our corpus and may not hold for the real world in general, so that should be mentioned.
     • Note: When discussing complexity, incorporate the points from p. 13 into the explanation.

  10. The Proposed Method 1) Overall Architecture 2) Ontological Location Model 3) Location Extraction 4) Geocoding 5) Disambiguation

  11. Overall Architecture
      • [Figure: overall architecture with a Training System and a Testing System. Components shown: Input Document, Location Extraction, Geocoding, Disambiguation, OUTPUT, Meeting Location Ontology & Instances, Personal Information, OpenAPI Map Services, Geocode DB, Address Boundary DB, Training Corpus, Document Annotation, Adding Document to Corpus, Corpus Expansion, Extract Rules, Extract Instances.]

  12. The Proposed Method 1) Overall Architecture 2) Ontological Location Model 3) Location Extraction 4) Geocoding 5) Disambiguation

  13. Ontological Location Model
      • Necessity of ontological location modeling
      • Granularity variance (Niu and Kay 2008)
        • Named entities (NEs) differ in granularity level, but many NER systems do not reflect granularity.
        • To point to a location on the map, NEs must be recognized with specifically classified granularities.
        • Country > ... > City > ... > Building > ... > Room
      • Relations between locations
        • To determine whether locations in a document are the same, the concept of relation must be reflected.
        • Embedded: ‘서강대학교 마테오관 1층 대강의실’ consists of ‘서강대학교’ (Organization; Sogang University), ‘마테오관’ (Building; Mateo Hall), ‘1층’ (Floor; 1st floor), ‘대강의실’ (Room; main lecture room)
        • Equivalent: 화영빌딩은 화정역 3번출구 1층에 드림오피스라는 문구점이 있는 건물입니다 (Hwayung Building is the building at Exit 3 of Hwajeong Station with a stationery store called Dream Office on the 1st floor.)
        • Supplement: 울산광역시 울주군 상북면 등억리 27번지.....먹고 쉬었다가(052-263-1206) (27 Deungeok-ri, Sangbuk-myeon, Ulju-gun, Ulsan ..... Meokgo Swieotdaga (052-263-1206))
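The granularity ordering and the "Embedded" relation on this slide can be sketched as a small data structure. This is only an illustration, not the thesis's ontology: the `Granularity` enum, the `LocationToken` class, and `is_embedded_chain` are hypothetical names, and the positions of Organization and Floor in the ordering are assumptions, since the slide names only Country, City, Building, and Room.

```python
from dataclasses import dataclass
from enum import IntEnum


class Granularity(IntEnum):
    """Coarse-to-fine levels. Where ORGANIZATION and FLOOR sit is an assumption."""
    COUNTRY = 1
    CITY = 2
    ORGANIZATION = 3
    BUILDING = 4
    FLOOR = 5
    ROOM = 6


@dataclass(frozen=True)
class LocationToken:
    text: str
    level: Granularity


def is_embedded_chain(tokens):
    """Return True if every token is strictly finer-grained than the one before it."""
    return all(a.level < b.level for a, b in zip(tokens, tokens[1:]))


# The "Embedded" example from the slide, typed as on the slide.
venue = [
    LocationToken("서강대학교", Granularity.ORGANIZATION),
    LocationToken("마테오관", Granularity.BUILDING),
    LocationToken("1층", Granularity.FLOOR),
    LocationToken("대강의실", Granularity.ROOM),
]
print(is_embedded_chain(venue))
```

Using an ordered enum keeps the "coarser contains finer" check to a single comparison per adjacent pair of tokens.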

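  14. Ontological Location Model
      • Advantages of the ontological location model
      • Note: Show here how the points explained on the previous page are reflected when an ontological model is used.
      • Note: Present the ontology on the following page according to the same causal relationship.
      • Note: p. 13 and p. 14 do not appear to be causally connected. Use the causal relationship from p. 13 to explain the examples in an engaging way. In an introduction, development, turn, conclusion structure, this seems to be the step from introduction to development, but that transition is not working yet. The ontology on p. 15 simply appears without any causal link; that link is missing.
      • Note: At about this point, before the theory appears, roughly three examples should be given to show what the ontology accomplishes.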

  15. Meeting Location Ontology

  16. The Proposed Method 1) Overall Architecture 2) Ontological Location Model 3) Location Extraction 4) Geocoding 5) Disambiguation

  17. Architecture of Location Extraction
      • [Figure: two parts, "Rule Generation by Ontology" and "Rule-based Location Extraction (actual use and evaluation)".]
      • Input: meeting announcement text
      • Components shown: Uploading Announcement Email, Training Corpus, Manual Annotation (Token / Relation), Annotation Database (Token, Relation), Token / Relation Extraction by Rule, Convert Token to Instance, Meeting Location Ontology & Instances, Extraction Rule Generation, Extraction Rule, Manually Predefined NE Pattern & Rule, Lexical Analysis (Token Boundary Detection, Token Type Matching), Syntax Analysis (External Context Analysis, External Relation Analysis, Internal Structure Analysis)
      • Output: token list, relation list

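  18. NER System (1/2): Lexical Analysis
      • Using extraction rules derived from the ontology instances, the system detects token boundaries and matches token types in a single step, generating the symbol table.
      • Input: meeting announcement text (translated):
        Notice: Town meeting, a conversation with young people preparing to start a business
        1. Reference: Industrial Bank of Korea, Corporate Customer Department (2009.2.19.)
        2. In connection with the above, we announce the following town meeting, held to provide a forum for conversation with young people preparing to start a business. We look forward to your interest and applications.
        A. Purpose: improving banking practices and making proposals to the government regarding difficulties in starting a business
        B. Date and time: 2009. 3. 5 (Thu) 15:00 ~ 16:30 (1.5H)
        C. Venue: 서강대학교 마테오관1층 대강의실 (Main Lecture Room, 1st floor, Mateo Hall, Sogang University)
        D. Participants: about 150 people, including current students and graduates in the Seoul area from the federation of start-up clubs, students interested in starting a business, and heads of start-ups housed at universities
        E. Attendees: the bank president, the vice president for corporate customers, the head of the Start-up Promotion Division of the Small and Medium Business Administration, and the CEO of A&B Soft Co., Ltd.
      • [Figure: the location extraction architecture with the Lexical Analysis stage (Token Boundary Detection, Token Type Matching) highlighted. Tables shown: Symbol Table, External Context Table, External Relation Table, Internal Structure Table. Highlighted output: Token List.]
      • Output: token list, relation list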
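A minimal sketch of lexical analysis as described on this slide: boundary detection and type matching happen in one pass by looking up known instances in the text. The slide does not give the actual extraction rules, so a longest-match dictionary lookup is assumed here; `lexical_analysis` and the `INSTANCES` table are hypothetical.

```python
def lexical_analysis(text, instances):
    """Return a symbol table of (start, end, surface, type) rows via longest match."""
    # Try longer surface forms first so that the longest instance wins.
    surfaces = sorted((s for s in instances if s), key=len, reverse=True)
    table, pos = [], 0
    while pos < len(text):
        for surface in surfaces:
            if text.startswith(surface, pos):
                table.append((pos, pos + len(surface), surface, instances[surface]))
                pos += len(surface)
                break
        else:
            pos += 1  # no known instance starts here; skip one character
    return table


# Hypothetical ontology instances; the types follow the slide 13 example.
INSTANCES = {
    "서강대학교": "Organization",
    "마테오관": "Building",
    "1층": "Floor",
    "대강의실": "Room",
}

line = "개최장소 : 서강대학교 마테오관1층 대강의실"
symbol_table = lexical_analysis(line, INSTANCES)
for row in symbol_table:
    print(row)
```

Because the lookup does not depend on whitespace, "마테오관1층" is still split into a Building token and a Floor token even though the announcement writes them without a space.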

  19. NER System (2/2): Syntax Analysis
      • Using extraction rules for external context, external relation, and internal structure, together with predefined NE patterns and rules, the system links the tokens that match those rules. It then generates the external context table, the external relation table, and the internal structure table.
      • Input: meeting announcement text (the same town meeting announcement as on slide 18, with the venue 서강대학교 마테오관1층 대강의실) and the token list
      • [Figure: the location extraction architecture with the Syntax Analysis stage (External Context Analysis, External Relation Analysis, Internal Structure Analysis) highlighted. Tables shown: Symbol Table, External Context Table, External Relation Table, Internal Structure Table. Highlighted output: Relation List.]
      • Output: token list, relation list

  20. The Proposed Method 1) Overall Architecture 2) Ontological Location Model 3) Location Extraction 4) Geocoding 5) Disambiguation

  21. Geocoding Architecture

  22. Merge Equivalent Location
      • 점프볼 클럽리그 대표자 회의는 3월18일 (목) 오후 8시에 점프볼 사무실에서 열립니다. (The representatives' meeting of the Jump-ball club league will be held on Thursday, March 18, at 8 PM at the Jump-ball office.)
      • ………
      • 점프볼 주소 : 서울시 송파구 가락동 49-4 화영빌딩3층 (Jump-ball address: 3rd floor, Hwayung B/D, 49-4, Garak-dong, Songpa-gu, Seoul)
      • Supplement relationship → merged location: 서울시 송파구 가락동 49-4 화영빌딩3층 점프볼 사무실
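The merge step in this example can be sketched as grouping mentions that are connected by a supplement link and joining each group into one location string. This is an illustrative sketch only: `merge_supplements` is a hypothetical helper, and placing the longer mention (typically the full address) first is an assumption chosen to reproduce the merged string on the slide.

```python
def merge_supplements(mentions, links):
    """Group mentions connected by supplement links and join each group into one string."""
    parent = {m: m for m in mentions}

    def find(m):
        # Follow parent pointers to the group representative (with path halving).
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for m in mentions:
        groups.setdefault(find(m), []).append(m)
    # Assumption: longer mentions (usually full addresses) are placed first.
    return [" ".join(sorted(g, key=len, reverse=True)) for g in groups.values()]


office = "점프볼 사무실"
address = "서울시 송파구 가락동 49-4 화영빌딩3층"
merged = merge_supplements([office, address], [(office, address)])
print(merged)
```

A union-find structure is used so that chains of links (A supplements B, B supplements C) collapse into a single location.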

  23. Merge Equivalent Location

  24. Expansion of Administrative Address
      • 명동 유네스코회관 2층 미지센터 (Miji Center, 2nd floor, UNESCO Hall, Myeong-dong)

  25. Extract Address Information
      • If an organization name includes an address component, extract it.
      • 서울교육문화회관 (Seoul Education and Culture Center), 서울밀레니엄힐튼 (Seoul Millennium Hilton), 예산농업진흥센터 (Yesan Agricultural Promotion Center)
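One simple way to picture this step is a prefix lookup against a list of administrative names. The slide does not say how the address component is located, so the sketch assumes it appears at the start of the organization name; `REGIONS` and `split_region_prefix` are hypothetical.

```python
# Hypothetical gazetteer of administrative names; a real one would be far larger.
REGIONS = ("서울", "예산")


def split_region_prefix(name, regions=REGIONS):
    """Return (region, remainder) if the name starts with a known region, else (None, name)."""
    # Check longer region names first so that the most specific prefix wins.
    for region in sorted(regions, key=len, reverse=True):
        if name.startswith(region) and len(name) > len(region):
            return region, name[len(region):].strip()
    return None, name


for org in ("서울교육문화회관", "서울밀레니엄힐튼", "예산농업진흥센터"):
    print(org, split_region_prefix(org))
```

The extracted region can then narrow the geocoding search, while the remainder serves as the place name.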

  26. Search Candidate Location
      • Location: 서울 밀레니엄 힐튼 (Seoul Millennium Hilton)
      • Queries:
        - Millennium Hilton
        - Millennium
        - Hilton
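The query list on this slide looks like the location name without its region word, followed by each remaining word on its own. A minimal sketch under that reading; `candidate_queries` and its `regions` parameter are hypothetical, and dropping the region word is an inference from the slide's example rather than a stated rule.

```python
def candidate_queries(location, regions=("서울",)):
    """Return the name without region words, followed by each remaining word."""
    words = [w for w in location.split() if w not in regions]
    queries = []
    if len(words) > 1:
        queries.append(" ".join(words))
    queries.extend(words)
    # Drop duplicates while keeping the first occurrence of each query.
    return list(dict.fromkeys(queries))


print(candidate_queries("서울 밀레니엄 힐튼"))
```

Issuing the most specific query first and then the broader single-word queries yields a larger candidate set for the disambiguation step.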

  27. The Proposed Method 1) Ontological Location Model 2) Location NER 3) Geocoding 4) Disambiguation

  28. Disambiguation
      • Original query: 밀레니엄 힐튼 서울
      • Candidate locations:
        Title | Query | Address
        동강밀레니엄래프팅 | 밀레니엄 | 대한민국 강원도 영월군 영월읍 거운리 547-1
        밀레니엄피시방서현점 | 밀레니엄 | 대한민국 경기도 성남시 분당구 서현동 307
        밀레니엄모텔 | 밀레니엄 | 대한민국 광주광역시 북구 오룡동 1114-1
        서울힐튼호텔 | 밀레니엄 힐튼 서울 | 대한민국 서울특별시 중구 남대문로5가 395
      • Disambiguation
        • Number of matched characters between query and title, query and original query, and query and address
        • (Can also be used) semantic type / personal annotation DB / distance between the location and a landmark
        • Personal address book / search history / GPS log
      • Result: 서울힐튼호텔: 대한민국 서울특별시 중구 남대문로5가 395 (36.3414225, 127.3914705) (Hotel)
      • [Figure: Input Document → Finding Target Locations → Location NER → Relation-type Classification → Normalization → Disambiguation → OUTPUT. Resources shown: OpenAPI Map Services, Personal Information, Gazetteer, Trained Models (CRFs, SVMs).]
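The character-matching criterion can be sketched as a score per candidate. The slide does not define how characters are matched or how the three counts are combined, so this sketch assumes matching blocks from Python's `difflib.SequenceMatcher` and a plain sum; `matched_chars`, `score`, and `best_candidate` are hypothetical names.

```python
from difflib import SequenceMatcher


def matched_chars(a, b):
    """Count characters in the matching blocks that SequenceMatcher finds between a and b."""
    return sum(block.size for block in SequenceMatcher(None, a, b).get_matching_blocks())


def score(candidate, original_query):
    """Sum the query-title, query-original query, and query-address match counts."""
    title, query, address = candidate
    return (matched_chars(query, title)
            + matched_chars(query, original_query)
            + matched_chars(query, address))


def best_candidate(candidates, original_query):
    """Return the candidate with the highest combined match count."""
    return max(candidates, key=lambda c: score(c, original_query))


ORIGINAL_QUERY = "밀레니엄 힐튼 서울"
CANDIDATES = [  # (title, query, address) rows from the slide
    ("동강밀레니엄래프팅", "밀레니엄", "대한민국 강원도 영월군 영월읍 거운리 547-1"),
    ("밀레니엄피시방서현점", "밀레니엄", "대한민국 경기도 성남시 분당구 서현동 307"),
    ("밀레니엄모텔", "밀레니엄", "대한민국 광주광역시 북구 오룡동 1114-1"),
    ("서울힐튼호텔", "밀레니엄 힐튼 서울", "대한민국 서울특별시 중구 남대문로5가 395"),
]
print(best_candidate(CANDIDATES, ORIGINAL_QUERY)[0])
```

Under these assumptions the candidate found by the full original query scores highest, which agrees with the result shown on the slide; the other signals listed there (semantic type, personal data, landmark distance) are not modeled.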

  29. Experimentation • Data Analysis • Experimental Result 1 : NER • Experimental Result 2 : Geocoding

  30. Discussion

  31. Limitations • TBD
