Advancements in Case Frame Construction: Weekly Report from Semantic Web Research Center
This weekly report outlines ongoing projects and to-do lists at the Semantic Web Research Center as of February 17, 2010. It details recent works, including scope development for case frames and verb analysis in CoreNet. The report highlights progress in extracting usages from a large POS-tagged corpus and constructing models for automatic case frame extraction. Key issues such as the challenges of handling complex predicates and the time-consuming nature of manual construction are discussed. Future plans include organizing a database for usage and developing tools for efficient case frame construction.
Presentation Transcript
Weekly Report 2010. 2. 17 Duhyeon Jin Semantic Web Research Center
Contents • Last issues • To-do-list • Works
Last Issues • The scope of words to be given case frames • All verbs in CoreNet that do not have case frames • Predicate nominals (서술성명사) • Running the experiment • Building a model for constructing case frames
To-do-list • Selecting word entries • Experiment • Extracting usages • Doing the construction • Selecting sample words • Assigning an appropriate word sense • Assigning an appropriate concept to arguments • Measuring the time taken and building a model • Writing the problem specification • Writing instructions for case frame construction
Works: Selecting word entries • Selected 2,308 words in total • CoreNet verbs (no case frame): 1,593 • Predicate nominals: 675 • CoreNet adjectives (no case frame): 40 • Korean verbs: 3,200 word senses from '현대 국어 사용 빈도 조사 (2002), 국립국어원' (Survey of Usage Frequency in Modern Korean, National Institute of Korean Language) • Headwords: 2,014 • Entries in CoreNet: 1,021
Works: Extracting Usages • Extracting POS-tagged sentences • From the 'Sejong POS-tagged corpus' (1,006,777 sentences, 969 MB) • For the selected words (2,308 words) • Using a Java program on a local computer
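The extraction step above can be sketched roughly as follows. This is a minimal illustration in Python, not the Java program mentioned on the slide. It assumes a simplified input in which each sentence is one line of space-separated word units written as `morph/TAG+morph/TAG`, and it uses Sejong-style tags such as `NNG` and `VV`; the function names `parse_sentence` and `extract_usages` are hypothetical, and the real corpus files may be laid out differently.

```python
from collections import defaultdict


def parse_sentence(line):
    """Split 'morph/TAG+morph/TAG morph/TAG' into a list of (morph, tag) pairs.

    Assumed simplified format; pieces without a morpheme are skipped.
    """
    morphs = []
    for word_unit in line.split():
        for piece in word_unit.split("+"):
            morph, _, tag = piece.rpartition("/")
            if morph:
                morphs.append((morph, tag))
    return morphs


def extract_usages(lines, targets):
    """Group sentences by the target (morph, tag) pairs they contain.

    `targets` is a set of (morph, tag) pairs, e.g. {("통과", "NNG")}.
    Each sentence is recorded at most once per target.
    """
    usages = defaultdict(list)
    for line in lines:
        found = set()
        for pair in parse_sentence(line):
            if pair in targets and pair not in found:
                found.add(pair)
                usages[pair].append(line)
    return usages
```

Called with an iterable of corpus lines and the set of selected words, `extract_usages` returns a mapping from each selected word to the sentences in which it occurs. Reading the corpus line by line keeps memory use low for a file of roughly 1 GB.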
Works: Extracting Usages • Problems • Consider the cases of 'predicate nominal' + '되' and 'predicate nominal' + '시키' • Can we handle 500 usages per word? • Must filter out trivial usages or set a limit
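One possible way to handle the two problems above is sketched below. It assumes sentences are already parsed into `(morph, tag)` pairs and that '하', '되', and '시키' are tagged `XSV` directly after a noun tagged `NNG`. Both are assumptions about the tagging, not something the slides state. The helper names and the random-sampling strategy are hypothetical; sampling only enforces a limit and removes exact duplicates, so it does not by itself decide which usages are "trivial".

```python
import random

# Suffixes that turn a predicate nominal into a verb (assumed tagged XSV).
LIGHT_VERB_SUFFIXES = {"하", "되", "시키"}


def light_verb_after(morphs, noun):
    """Return the suffix (하/되/시키) that directly follows `noun`, or None."""
    for (morph, tag), (next_morph, next_tag) in zip(morphs, morphs[1:]):
        if (morph == noun and tag == "NNG"
                and next_tag == "XSV" and next_morph in LIGHT_VERB_SUFFIXES):
            return next_morph
    return None


def cap_usages(usages, limit=500, seed=0):
    """Drop exact duplicates, then keep at most `limit` usages.

    A fixed seed makes the sample reproducible between runs.
    """
    unique = list(dict.fromkeys(usages))  # preserves first-seen order
    if len(unique) <= limit:
        return unique
    return random.Random(seed).sample(unique, limit)
```

Grouping usages by the value returned from `light_verb_after` would let the '되' and '시키' forms of a predicate nominal be reviewed separately from the '하' form.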
Works: Doing the construction • Model 1 (tested) • Manual construction • Using a text editor + spreadsheet + CoreNet Browser • Time taken: 25 usages in 30 min. • Extremely time-consuming • Model 2 (assumed) • Manual construction with tools • Database + tool + CoreNet library • Time taken (assumed): 180 usages in 30 min. • Suppose 100 usages per word: • Model 3 • Automatic case frame extraction • Need to survey the literature or get help from someone
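The throughput figures above can be turned into a rough effort estimate. The slide leaves the "100 usages per word" case open, so the sketch below is only one way to complete the arithmetic: it combines the rates from this slide with the 2,308 selected words reported earlier, and the resulting totals are derived here, not figures from the report.

```python
WORDS = 2308            # selected words, from the word-selection slide
USAGES_PER_WORD = 100   # the supposition stated on this slide


def hours_needed(total_usages, usages_per_half_hour):
    """Convert a 'usages per 30 minutes' rate into total working hours."""
    return total_usages / usages_per_half_hour * 0.5


total_usages = WORDS * USAGES_PER_WORD          # 230,800 usages
model1_hours = hours_needed(total_usages, 25)   # manual: 4616.0 hours
model2_hours = hours_needed(total_usages, 180)  # with tools: about 641.1 hours
speedup = model1_hours / model2_hours           # 180 / 25 = 7.2
```

Under these assumptions the tool-assisted model would be about 7.2 times faster than fully manual construction, though it would still take several hundred working hours.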
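Works: Instructions for case frame construction • Writing the instructions on the web • http://sysx2.kaist.ac.kr/wiki/index.php/격틀구축지침 • Issues: • Modifying clauses: 개혁을 강조한 사람 ("a person who emphasized reform") • Syntactic differences among '하다', '되다', and '시키다'; example: the case of '통과하다' ("to pass") • 기업(NOM)이 심사(ACC)를 통과하다 ("the company passes the review") • 기업(NOM)이 심사(DAT)에 통과되다 ("the company is passed in the review") • 기업(ACC)을 심사(DAT)에 통과시키다 ("[someone] gets the company passed in the review")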
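The three example sentences above can be written down as data to make the contrast explicit. The sketch below is a hypothetical representation, not the CoreNet format: the class and field names are invented for illustration, and the `filler` field holds the example noun from the slide where a real case frame would presumably hold a CoreNet concept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    case: str      # case label as written on the slide: NOM, ACC, DAT
    particle: str  # the Korean case particle attached to the noun
    filler: str    # example noun; a real frame would store a concept


@dataclass(frozen=True)
class CaseFrame:
    predicate: str
    arguments: tuple


FRAMES = [
    CaseFrame("통과하다", (Argument("NOM", "이", "기업"),
                         Argument("ACC", "를", "심사"))),
    CaseFrame("통과되다", (Argument("NOM", "이", "기업"),
                         Argument("DAT", "에", "심사"))),
    CaseFrame("통과시키다", (Argument("ACC", "을", "기업"),
                          Argument("DAT", "에", "심사"))),
]


def case_pattern(frame):
    """Return only the sequence of case labels, e.g. ('NOM', 'ACC')."""
    return tuple(arg.case for arg in frame.arguments)
```

Comparing `case_pattern` across the three frames shows that the same two nouns take a different pair of cases with each of '하다', '되다', and '시키다', which is why the instructions need to treat the three forms separately.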
Plan • Finish organizing the database for usages • Build a construction tool that uses the database and the CoreNet library
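As a rough idea of what a usage database for such a tool might look like, here is a minimal sketch using Python's built-in `sqlite3` module. The report does not describe the actual schema or database engine, so the table names, column names, and helper functions below are all hypothetical.

```python
import sqlite3


def create_usage_db(path=":memory:"):
    """Create the (hypothetical) tables and return an open connection."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS word (
            id    INTEGER PRIMARY KEY,
            lemma TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS usage_sentence (
            id       INTEGER PRIMARY KEY,
            word_id  INTEGER NOT NULL REFERENCES word(id),
            sentence TEXT NOT NULL,
            done     INTEGER NOT NULL DEFAULT 0  -- 1 once a frame is assigned
        );
    """)
    return conn


def add_usage(conn, lemma, sentence):
    """Store one usage sentence, creating the word row if needed."""
    conn.execute("INSERT OR IGNORE INTO word (lemma) VALUES (?)", (lemma,))
    (word_id,) = conn.execute(
        "SELECT id FROM word WHERE lemma = ?", (lemma,)).fetchone()
    conn.execute(
        "INSERT INTO usage_sentence (word_id, sentence) VALUES (?, ?)",
        (word_id, sentence))


def pending_usages(conn, lemma):
    """Return the sentences for `lemma` that still need a case frame."""
    rows = conn.execute(
        "SELECT u.sentence FROM usage_sentence u "
        "JOIN word w ON w.id = u.word_id "
        "WHERE w.lemma = ? AND u.done = 0 ORDER BY u.id", (lemma,))
    return [row[0] for row in rows]
```

A construction tool could call `pending_usages` to show an annotator the next sentences for a word and set `done` to 1 once a case frame has been assigned.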