Movie Review Data Annotation
Outline
• Introduction
• Existing annotation
• New annotation format
• Annotation tool
Introduction
• Goal: Review Summarization with Tagging
• Annotate the feature words (feature) and opinion words (opinion expression) in each review, and mark which feature word goes with which opinion word (feature-opinion pair)
<Sentence>She is played by Olivia de Havilland in a great performance.<FO Fword="performance" Ftype="PAC" Oword="great" Otype="PRO" /></Sentence>
<Sentence>Although the movie is long it is never boring.<FO Fword="movie" Ftype="OA" Oword="long" Otype="CON" /><FO Fword="movie" Ftype="OA" Oword="never boring" Otype="PRO" /></Sentence>
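
The annotated sentences above are small XML fragments, so they can be read with a standard XML parser. The following is a minimal sketch, assuming each `<Sentence>` element is well-formed and holds the review text followed by its `<FO />` children; the helper name `parse_sentence` is hypothetical and is not part of the annotation tool.

```python
import xml.etree.ElementTree as ET


def parse_sentence(xml_text):
    """Return (sentence_text, pairs) for one <Sentence> element.

    pairs is a list of dicts holding the FO attributes
    (Fword, Ftype, Oword, Otype) in document order.
    """
    root = ET.fromstring(xml_text)
    # The review text is the text node that precedes the first <FO /> child.
    text = (root.text or "").strip()
    pairs = [dict(fo.attrib) for fo in root.iter("FO")]
    return text, pairs


example = (
    "<Sentence>Although the movie is long it is never boring."
    '<FO Fword="movie" Ftype="OA" Oword="long" Otype="CON" />'
    '<FO Fword="movie" Ftype="OA" Oword="never boring" Otype="PRO" />'
    "</Sentence>"
)

text, pairs = parse_sentence(example)
for pair in pairs:
    print(pair["Fword"], pair["Ftype"], pair["Oword"], pair["Otype"])
```

Review text containing a raw `&` or `<` would have to be escaped before this parser accepts it.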
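Existing work
• Zhuang Li's (庄丽) work on Movie Review Summarization
• Features obtained by manual annotation
• Opinions obtained by manual annotation
• Feature-opinion pairs identified with semantic templates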
Existing annotation format
<Sentence>I knew that it was a classic, not only the book, but the movie as well.<FO Fword="book" Ftype="ST" Oword="classic" Otype="PRO" /><FO Fword="movie" Ftype="OA" Oword="classic" Otype="PRO" /></Sentence>
<Sentence>It tells the story of Scarlett O'Hara, played the very beautiful Vivien Leigh.<FO Fword="PAC1" Ftype="PAC" Oword="beautiful" Otype="PRO" /></Sentence>
<Sentence>The music by Max Steiner, the great scenes such as Scarlett O'Hara's descent into chaos as she seeks a doctor during the battle for Atlanta, the ball, the party at Twelve Oaks, the burning of Atlanta fight scene, and others have never been surpassed.<FO Fword="music" Ftype="MS" Oword="never surpassed" Otype="CON" /><FO Fword="PMS0" Ftype="PMS" Oword="great" Otype="PRO" /><FO Fword="scenes" Ftype="VP" Oword="great" Otype="PRO" /></Sentence>
Common feature words
• OA: movie, love-film, MOVNAME, animation, ending, movies, scenes, films
• ST: story, plot, script, dialogue, storyline, ending, screenplay, lines, scenes, scene, storytelling, idea, book, line, narrative, stories, story-telling, plotline, tale, drama
• CH: characters, PEPNAME, character, characterization, roles, charcters (sic)
Common feature words (continued)
• VP: visually, color, colours, visuals, images, animation, battle, fighting, fight-sequences, image, pictures, shots, vision, backgrounds, battle-scenes, colors, colour, fight, fighting-scenes, fights, flying, imagery, technicolor, battle-scene, fight-scene, graphics, lighting, sequence, sequences
• MS: music, score, song, sound, songs, soundtrack, theme
• SE: special-effects, effects, flashbacks
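
The word lists above can serve as a lookup table that suggests an Ftype for a candidate feature word. The sketch below uses only a small subset of the listed words; the names `FEATURE_WORDS` and `lookup` are hypothetical. Because some words appear under more than one class (for example "ending" under both OA and ST), the lookup returns every matching class and leaves the final choice to the annotator.

```python
from collections import defaultdict

# A subset of the feature words from the slides, keyed by Ftype.
FEATURE_WORDS = {
    "OA": ["movie", "movies", "films", "animation", "ending", "scenes"],
    "ST": ["story", "plot", "script", "ending", "scenes"],
    "CH": ["characters", "character", "roles"],
    "VP": ["visuals", "animation", "lighting"],
    "MS": ["music", "score", "soundtrack"],
    "SE": ["special-effects", "effects", "flashbacks"],
}

# Invert the table: word -> set of Ftypes that list it.
WORD_TO_TYPES = defaultdict(set)
for ftype, words in FEATURE_WORDS.items():
    for word in words:
        WORD_TO_TYPES[word].add(ftype)


def lookup(word):
    """Return the sorted list of Ftypes that list this word (may be empty)."""
    return sorted(WORD_TO_TYPES.get(word.lower(), set()))


print(lookup("music"))
print(lookup("ending"))
```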
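Annotation format
• Proper nouns are marked with upper-case placeholders: MOVNAME, CH0 for an actor's name, and so on
• The annotated Fword and Oword must match the words in the sentence exactly
• If one feature has several opinions, separate them with commas
<FO Fword="MOVNAME" Ftype="OA" Oword="appreciated, most well-known" Otype="PRO" />
• If a sentence carries no sentiment, or contains no movie-related feature word, it is not annotated, because no feature-opinion pair occurs in it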
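
The exact-match rule and the comma convention above lend themselves to an automatic check. The sketch below is one possible reading of the rule, not the tool's actual behaviour: it assumes that upper-case placeholders such as MOVNAME or PAC0 are exempt (they replace the proper noun in the sentence), and that a multi-word item passes when each of its words occurs somewhere in the sentence. The names `PLACEHOLDER`, `split_items` and `missing_words` are hypothetical.

```python
import re

# Assumed placeholder shape: upper-case letters plus an optional index.
PLACEHOLDER = re.compile(r"^[A-Z]+\d*$")


def split_items(value):
    """Split a comma-separated Fword/Oword attribute into its items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def missing_words(sentence, value):
    """Return the annotated words that do not occur in the sentence."""
    tokens = set(re.findall(r"[A-Za-z0-9'-]+", sentence.lower()))
    missing = []
    for item in split_items(value):
        if PLACEHOLDER.match(item):
            continue  # placeholder for a proper noun, not checked
        for word in item.lower().split():
            if word not in tokens:
                missing.append(word)
    return missing


sentence = "She is played by Olivia de Havilland in a great performance."
print(missing_words(sentence, "performance"))
print(missing_words(sentence, "great, superb"))
```

An empty result means the attribute is consistent with the sentence; any returned word points at a likely typo in the annotation.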
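New annotation format
• In the sentences that are already annotated, mark every feature and every opinion
• Previously the annotation focused mainly on pairs
• Annotate the complete opinion expression, not just the opinion word, e.g. "One of the greatest", "very beautiful", "most overrated"
• Correct some annotation errors: obvious opinion/feature mistakes
• Add unannotated sentences that clearly contain both a feature and an opinion
The Movie is great. ✓ (feature and opinion both present)
It is great. ✗ (only a pronoun, no explicit feature)
The movie describes ABC. ✗ (no opinion)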
Points to note
• When some features in a sentence have an opinion and others do not, the features without an opinion must be annotated as well, with the opinion word and type set to NIL (the examples below leave both attributes blank)
<Sentence>All that aside, whether you like the characters and settings or not, it's filled with good examples of how to tell a story on film.<FO Fword="story" Ftype="ST" Oword="good" Otype="PRO" /><FO Fword="film" Ftype="OA" Oword="good" Otype="PRO" /><FO Fword="characters, settings" Ftype="PAC" Oword=" " Otype=" " /></Sentence>
<Sentence>But this is to prove that a good movie is not about special effects and computer animated beings, it's about a wonderful story and great characters.<FO Fword="movie" Ftype="OA" Oword="good" Otype="PRO" /><FO Fword="story" Ftype="ST" Oword="wonderful, great" Otype="PRO" /><FO Fword="characters" Ftype="PAC" Oword="wonderful, great" Otype="PRO" /><FO Fword="special effects, computer animated beings" Ftype="PAC" Oword=" " Otype=" " /></Sentence>
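
Features without an opinion need to be skipped when the annotations are aggregated into a summary. The sketch below counts PRO and CON opinions per Ftype; since the rule names NIL while the examples use a blank attribute, it assumes both spellings mean "no opinion". The names `NO_OPINION`, `has_opinion` and `summarize` are hypothetical.

```python
from collections import Counter

# Assumption: a blank attribute and the literal "NIL" both mean "no opinion".
NO_OPINION = {"", "NIL"}


def has_opinion(pair):
    """True when the FO record carries both an opinion word and a type."""
    return (pair["Oword"].strip() not in NO_OPINION
            and pair["Otype"].strip() not in NO_OPINION)


def summarize(pairs):
    """Count annotated opinions per (Ftype, Otype), skipping NIL records."""
    counts = Counter()
    for pair in pairs:
        if has_opinion(pair):
            counts[(pair["Ftype"], pair["Otype"])] += 1
    return counts


# FO records of the first example sentence above.
pairs = [
    {"Fword": "story", "Ftype": "ST", "Oword": "good", "Otype": "PRO"},
    {"Fword": "film", "Ftype": "OA", "Oword": "good", "Otype": "PRO"},
    {"Fword": "characters, settings", "Ftype": "PAC", "Oword": " ", "Otype": " "},
]
print(summarize(pairs))
```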
New annotation format
• Handling negation: annotate the negation word together with the word it modifies as a single opinion
<Sentence>The film is altogether one lavish masterpiece that will never be forgotten and is undoubtedly the greatest piece of cinema ever!<FO Fword="film" Ftype="OA" Oword="lavish, masterpiece, greatest, never forgotten" Otype="PRO" /></Sentence>
<Sentence>Winner of an astounding, 10 Academy Awards, Gone With the Wind is truly a brilliant and lavish film that just can not be criticised to a low standard!<FO Fword="MOVNAME" Ftype="OA" Oword="truly, brilliant, lavish, not criticised" Otype="PRO" /></Sentence>
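
The negation rule can be approximated with a simple heuristic when pre-filling annotations. The sketch below is an assumption-laden illustration, not the procedure used for the corpus: it looks for a negation word within a few tokens before the opinion word and joins the two, as in "never forgotten". The negation list, the window size and the name `opinion_with_negation` are all hypothetical.

```python
import re

# Assumed minimal negation list; a real list would be longer.
NEGATIONS = {"not", "never", "no"}


def opinion_with_negation(sentence, opinion_word, window=3):
    """Prefix opinion_word with a negation found up to `window` tokens before it."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    target = opinion_word.lower()
    if target not in tokens:
        return opinion_word
    idx = tokens.index(target)
    # Scan backwards from the token just before the opinion word.
    for j in range(idx - 1, max(idx - window, 0) - 1, -1):
        if tokens[j] in NEGATIONS:
            return f"{tokens[j]} {opinion_word}"
    return opinion_word


sentence = ("The film is altogether one lavish masterpiece that will never "
            "be forgotten and is undoubtedly the greatest piece of cinema ever!")
print(opinion_with_negation(sentence, "forgotten"))
print(opinion_with_negation(sentence, "lavish"))
```

A fixed window can attach a negation to the wrong word, so the result is only a suggestion for the annotator to confirm.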
New annotation format
• Feature annotation does not need to consider pronouns; wherever possible, attach the opinion to the feature word rather than to a proper noun
<Sentence>Although the movie is long it is never boring.<FO Fword="movie" Ftype="OA" Oword="long" Otype="CON" /><FO Fword="movie" Ftype="OA" Oword="never boring" Otype="PRO" /></Sentence>
<Sentence>"The Godfather" is a huge piece of film entertaining, involving sentiment, nostalgia, filial affection, pride, integrity, loyalty, corruption, honor, betrayal and crime.<FO Fword="MOVNAME" Ftype="OA" Oword="huge, entertaining, involving" Otype="PRO" /></Sentence>
• For performance:
<Sentence>Vivien's amazing debut performance won her an Oscar.<FO Fword="performance" Ftype="PAC" Oword="amazing, won" Otype="PRO" /></Sentence>
<Sentence>Scarlett then meets a man, Rhett Butler (Clark Gable, in a superlative performance) who falls in love with her.<FO Fword="PAC0" Ftype="PAC" Oword="superlative" Otype="PRO" /></Sentence>
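Annotation tool
• First create a set.txt file in the directory the program runs from
• The file contains 3 lines
• Line 1: the directory where the data is stored; the path must not contain Chinese characters and must end with "\"
• Line 2: the document index, starting from 0
• Line 3: the sentence index, starting from 1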
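
The three-line set.txt file described above can be generated and checked with a short script. This is a sketch under stated assumptions: the file is written as ASCII text, the "no Chinese characters" rule is enforced with a stricter ASCII-only check, and the directory `D:\reviews\` is a made-up example. The function names `write_settings` and `read_settings` are hypothetical and are not part of the annotation tool.

```python
from pathlib import Path


def write_settings(data_dir, doc_index=0, sentence_index=1, path="set.txt"):
    """Write the three-line settings file after validating each value."""
    if not data_dir.isascii():
        raise ValueError("directory must not contain Chinese characters")
    if not data_dir.endswith("\\"):
        raise ValueError('directory must end with "\\"')
    if doc_index < 0:
        raise ValueError("document index starts from 0")
    if sentence_index < 1:
        raise ValueError("sentence index starts from 1")
    Path(path).write_text(
        f"{data_dir}\n{doc_index}\n{sentence_index}\n", encoding="ascii"
    )


def read_settings(path="set.txt"):
    """Read back (data_dir, doc_index, sentence_index) from the file."""
    lines = Path(path).read_text(encoding="ascii").splitlines()
    return lines[0], int(lines[1]), int(lines[2])


if __name__ == "__main__":
    # Hypothetical data directory; start at the first document and sentence.
    write_settings("D:\\reviews\\", doc_index=0, sentence_index=1)
    print(read_settings())
```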