This report outlines the progress on Semantic Role Labeling (SRL) using IOB2 format to identify argument structures. It discusses the CoNLL 2004 shared tasks, detailing the data set, its formatting, and annotated examples. The IOB2 format distinguishes chunk boundaries, classifying words as outside (O), beginning (B-$k), or inside (I-$k) a chunk. The report also explores potential argument definitions, classifiers, and features to utilize, particularly focusing on support vector machines. Finally, it analyzes metrics using the CoNLL 2004 dataset and sets hypotheses for improving system performance.
Progress report on SRL Abdul-Lateef Yussif 11-03-2011
Agenda • CoNLL 2004 Shared Tasks • Data Set • Format of Data Set
IOB2 Format • The IOB2 format represents chunks that neither overlap nor embed in one another. • Words outside any chunk receive the tag O. • For words inside a chunk of type $k, the first word receives the tag “B-$k” (Begin), and the remaining words receive the tag “I-$k” (Inside).
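The tagging rule above can be sketched in a few lines of Python. This is a minimal illustration, not the shared-task tooling: the function names `chunks_to_iob2` and `iob2_to_chunks`, the `(start, end, type)` chunk tuples with inclusive ends, and the example sentence are all assumptions made for this sketch.

```python
from typing import List, Tuple

# Hypothetical chunk representation: (start index, inclusive end index, chunk type).
Chunk = Tuple[int, int, str]


def chunks_to_iob2(n_words: int, chunks: List[Chunk]) -> List[str]:
    """Tag each word O, B-<type>, or I-<type>; chunks must not overlap."""
    tags = ["O"] * n_words
    for start, end, kind in chunks:
        tags[start] = f"B-{kind}"            # first word of the chunk
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{kind}"            # remaining words of the chunk
    return tags


def iob2_to_chunks(tags: List[str]) -> List[Chunk]:
    """Recover chunks from IOB2 tags; a stray I- tag with no open chunk is ignored."""
    chunks: List[Chunk] = []
    start, kind = None, None
    for i, tag in enumerate(tags):
        # Close the open chunk when the current tag does not continue it.
        if start is not None and tag != f"I-{kind}":
            chunks.append((start, i - 1, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    if start is not None:
        chunks.append((start, len(tags) - 1, kind))
    return chunks


# Illustrative sentence (made up for this sketch).
words = ["He", "reckons", "the", "current", "account", "deficit"]
example_chunks = [(0, 0, "NP"), (1, 1, "VP"), (2, 5, "NP")]
example_tags = chunks_to_iob2(len(words), example_chunks)
```

Because every chunk opens with its own B- tag, two adjacent chunks of the same type stay distinguishable, which is why the conversion back to chunks is unambiguous.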
Find potential Arguments • An argument can be any sequence of consecutive words • Restrict the set of potential arguments • BEGIN(word) = 1 if the word begins an argument • END(word) = 1 if the word ends an argument • Argument • (wi … wj) is a potential argument iff BEGIN(wi) = 1 and END(wj) = 1
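The BEGIN/END restriction can be illustrated with a short Python sketch. It assumes the two indicators are already available as 0/1 lists, one value per word (in practice they would come from some upstream prediction step), and that a span must satisfy i ≤ j; the function name `potential_arguments` is hypothetical.

```python
from typing import List, Tuple


def potential_arguments(begin: List[int], end: List[int]) -> List[Tuple[int, int]]:
    """Return every span (i, j), i <= j, with begin[i] == 1 and end[j] == 1."""
    return [
        (i, j)
        for i, b in enumerate(begin)
        if b == 1
        for j in range(i, len(end))
        if end[j] == 1
    ]


# Illustrative indicator values for a four-word sentence (made up for this sketch).
begin_flags = [1, 0, 1, 0]
end_flags = [0, 1, 0, 1]
candidates = potential_arguments(begin_flags, end_flags)
# candidates == [(0, 1), (0, 3), (2, 3)]
```

Without the restriction a sentence of n words has n(n+1)/2 candidate spans; here four words yield three candidates instead of ten.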
Classifiers & Features • I intend to use a support vector machine (SVM) with the following features • Words • Predicate lemmas • POS • Token position • Path • Headword • Length
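As a rough sketch of how some of these features might be collected for one candidate argument, the Python function below builds a feature dictionary from the surface-level items only. Everything here is an assumption for illustration: the name `argument_features`, the reading of "token position" as the span's position relative to the predicate, and the example sentence. Path and headword are left out because they require a syntactic analysis that this sketch does not model.

```python
from typing import Dict, List, Tuple, Union


def argument_features(
    words: List[str],
    pos_tags: List[str],
    span: Tuple[int, int],
    predicate_index: int,
    predicate_lemma: str,
) -> Dict[str, Union[str, int]]:
    """Surface features for the candidate span (start, end), end inclusive."""
    start, end = span
    # Assumed definition: where the span sits relative to the predicate.
    if end < predicate_index:
        position = "before"
    elif start > predicate_index:
        position = "after"
    else:
        position = "overlap"
    return {
        "words": " ".join(words[start:end + 1]),
        "pos": " ".join(pos_tags[start:end + 1]),
        "predicate_lemma": predicate_lemma,
        "position": position,
        "length": end - start + 1,
    }


# Illustrative input (made up for this sketch).
sentence = ["The", "cat", "chased", "a", "mouse"]
tags = ["DT", "NN", "VBD", "DT", "NN"]
features = argument_features(sentence, tags, (0, 1), 2, "chase")
```

A dictionary of this shape would still need to be turned into a numeric vector before an SVM could be trained on it.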
Data and Evaluation Metrics • CoNLL 2004 dataset • Part of the PropBank corpus • Drawn from the Wall Street Journal portion of the Penn Treebank • Training (Sections 15-18) • Development (Section 20) • Testing (Section 21)
Hypothesis • The target is to replicate and improve on the best system's performance
Questions Thank you