
Semantic Role Labeling with support vector machines






  1. Semantic Role Labeling with support vector machines Yongjia Wang

  2. An Intuitive Example

  3. What the data looks like

  4. General Ideas of SVM SRL • Model-free classification • Off-line machine learning for information retrieval • Uses linguistic information readily available from many standard tools: parsers, chunkers, … • Still needs additional semantics-related linguistic knowledge to generate the final prediction • This doesn't come for free, of course • Needs manually/semi-automatically labeled training/testing data • Needs other resources to compile the training data: WordNet, VerbNet, … to provide pre-defined frames • Goes one step beyond syntactic structure • Still about shallow semantics • Also called semantic parsing • Types • Constituent-by-constituent (syntactic constituent): I took this approach • Relation-by-relation (dependency relation) • Word-by-word (finer grained) • Hybrid: combinations of multiple variants within the same type or across types; the final result is selected from alternatives with different confidence scores, which requires global optimization. There are examples in the literature, but no standard approach.
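The constituent-by-constituent approach treats every node of the parse tree as a candidate argument. A minimal sketch, using a toy nested-tuple tree in place of real parser output (the tree format and function names are illustrative, not from the slides):

```python
# Constituent-by-constituent SRL enumerates every syntactic
# constituent as a candidate argument for classification.
# Trees are toy nested tuples: (label, child1, child2, ...),
# with bare strings as leaf words.

def collect_constituents(tree, out):
    """Post-order walk; append (label, covered words) for each node."""
    label, *children = tree
    words = []
    for child in children:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(collect_constituents(child, out))
    out.append((label, " ".join(words)))
    return words

def candidates(tree):
    out = []
    collect_constituents(tree, out)
    return out
```

Each (label, span) pair would then go to the argument-identification classifier.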

  5. SRL General Procedure • Training Data Pruning • Get rid of parsing errors • Get unbiased training data – positive/negative examples for binary classifiers • Argument Identification • Binary classifier • Can be tuned independently of classification • Argument Classification • For n classes, train n binary classifiers instead of a single n-class classifier • Each class can be trained and tuned independently • Reduces the amount of data required • Finer-grained information for post-processing • Post-Processing • Resolve conflicts using knowledge, since the previous classifications are purely local • Global optimization; can be formalized more mathematically • Evaluation • Precision & recall
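The "n binary classifiers" decomposition above can be sketched as one-vs-rest classification; a toy centroid scorer stands in for the actual libSVM binary classifiers, purely for illustration:

```python
# One-vs-rest decomposition: for n argument labels, train n binary
# classifiers, each separating one label from all the others.
# A toy centroid-distance scorer stands in for an SVM here.

def train_binary(X, y, target):
    """Return a scoring function for label `target` vs. the rest."""
    pos = [x for x, lab in zip(X, y) if lab == target]
    centroid = [sum(col) / len(pos) for col in zip(*pos)]
    def score(x):
        # Higher score = closer to the positive-class centroid.
        return -sum((a - b) ** 2 for a, b in zip(x, centroid))
    return score

def train_one_vs_rest(X, y):
    """One independent binary scorer per label."""
    return {lab: train_binary(X, y, lab) for lab in sorted(set(y))}

def classify(models, x):
    # Pick the label whose binary scorer is most confident.
    return max(models, key=lambda lab: models[lab](x))
```

Because each scorer is trained independently, each label's classifier can be tuned (and its training data pruned) on its own, as the slide notes.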

  6. The Project I Did • Pre-processing • Parsing and other file processing • Naïve data pruning: pick enough positive and negative examples for each label's classifier • Argument identification with libSVM (ignored) • Simple binary classification • Argument classification with libSVM (the main part about SVM) • Local-feature-based classification using libSVM • Compared the tradeoff between performance and information gain • Post-processing (simplified) • Just take the class whose classifier gives the highest probability, adjusted by the label's background probability • No conflict resolution or global optimization thereafter
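The simplified post-processing step might look like the sketch below. The slide does not give the exact adjustment formula, so dividing the classifier probability by the label's background (prior) probability is an assumption:

```python
def pick_label(probs, priors):
    """Select the argument label whose classifier probability is
    highest after adjusting for the label's background frequency.
    The adjustment here divides by the prior, so that frequent
    labels are not favored merely for being frequent (one plausible
    reading of the slide; the exact formula is not specified)."""
    return max(probs, key=lambda lab: probs[lab] / priors[lab])
```

With conflict resolution skipped, this is a purely local decision per constituent.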

  7. Issues: Huge Feature Space • Prediction feature representation • Option 1: color {red, green, blue} as {0, 1, 2} • Option 2: as {(1,0,0), (0,1,0), (0,0,1)} • Categorical features have no inherent ordering (is red closer to green than to blue?); any ordering is just an artifact of the encoding. With a single numerical value, the classifier will misuse that spurious numerical information, effectively losing information because real structure is overwhelmed by the arbitrary encoding • Bit-vector encoding makes all values orthogonal, but on the other hand greatly increases the feature space • Feature selection • Cannot be done gradually: e.g., with 3127 verbs, one must decide whether to take on 3127 more features or none
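The two encoding options can be sketched as follows (function names are illustrative):

```python
def integer_encode(values, vocab):
    # Option 1: a single integer per categorical value.
    # Compact, but imposes a spurious ordering on the categories.
    index = {v: i for i, v in enumerate(vocab)}
    return [index[v] for v in values]

def one_hot_encode(values, vocab):
    # Option 2: a bit vector per value; all categories become
    # mutually orthogonal, at the cost of len(vocab) features
    # per feature slot.
    index = {v: i for i, v in enumerate(vocab)}
    return [[1 if index[v] == i else 0 for i in range(len(vocab))]
            for v in values]
```

This makes the feature-selection dilemma concrete: one-hot encoding the 3127 verbs adds 3127 dimensions at once.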

  8. Issues: Data Pruning • Data errors • Parser errors • Labeling errors • Performance issues • Previous studies showed that good data pruning improves performance • Computational issues • Cannot afford to train each classifier using all the data • Pick a subset of the data containing enough positive and negative examples
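The "pick a subset with enough positive and negative examples" step can be sketched as below (the function name and fixed per-class cap are illustrative):

```python
import random

def balanced_subset(examples, labels, n_per_class, seed=0):
    """Keep up to n_per_class positive and negative examples so a
    binary classifier is neither trained on heavily skewed data nor
    on more data than is computationally affordable."""
    rng = random.Random(seed)
    pos = [e for e, lab in zip(examples, labels) if lab == 1]
    neg = [e for e, lab in zip(examples, labels) if lab == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    keep = pos[:n_per_class] + neg[:n_per_class]
    y = [1] * min(len(pos), n_per_class) + [0] * min(len(neg), n_per_class)
    return keep, y
```

In the one-vs-rest setting, this sampling is done once per label's binary classifier.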

  9. Room for Further Improvement: Feature Reduction • Grouping feature values • Grouping verbs with similar semantics • Verb clustering is a separate issue that has been studied on its own • Factorizing features • For a feature like 'Path', it is possible to factorize it rather than treating every path instance as an orthogonal value
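Factorizing a 'Path' feature might look like the sketch below; the '>' (up) / '<' (down) direction notation and the chosen sub-features are assumptions for illustration:

```python
import re

def factorize_path(path):
    """Split a parse-tree path feature such as 'VBD>VP>S<NP' into
    smaller sub-features (node labels, direction counts, length)
    instead of treating each full path string as one orthogonal
    value. Paths sharing nodes then share features."""
    nodes = re.split(r"[><]", path)
    directions = re.findall(r"[><]", path)
    return {
        "nodes": nodes,
        "ups": directions.count(">"),
        "downs": directions.count("<"),
        "length": len(directions),
    }
```

Two paths that differ in one node no longer count as entirely unrelated feature values, which shrinks the effective feature space.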
