iPlayr User Study Group 2008.01.23 Daniel Wu Gordon Chang
Task assigned
• Design a user study to compare the effects of
  • Melody (M)
  • Lyrics (L)
  • Melody + Lyrics (M+L)
• Design the user study interface
  • Example: how to efficiently select 5 out of 70
User Study
• Purpose
  • Verify that, [for users], judging a song's emotion with lyrics information in addition to traditional signal processing comes closer to the ground truth.
• Implication for iPlayr
  • Form the basis for adding lyrics information (semantics) to the music recommendation system.
  • ???
User Study
• Possible methods
  • Select a ground truth for each piece of music.
  • Compare M, L, and M+L performance with that ground truth.
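One way the comparison against the ground truth could be scored is sketched below. This is only an illustration: the slides do not fix a metric, so mean absolute error is an assumption, and the function names and all numbers are hypothetical.

```python
from statistics import mean

def mean_abs_error(ratings, truth):
    """Mean absolute difference between one rating vector and the ground truth."""
    if len(ratings) != len(truth):
        raise ValueError("rating and ground-truth vectors must have equal length")
    return mean(abs(r - t) for r, t in zip(ratings, truth))

def compare_conditions(ratings_by_condition, truth):
    """Return the error of each condition (M, L, M+L) against the ground truth."""
    return {cond: mean_abs_error(r, truth)
            for cond, r in ratings_by_condition.items()}

# Made-up ratings for three emotional features of one song (illustration only).
truth = [4, 2, 5]
ratings = {"M": [3, 2, 3], "L": [5, 4, 5], "M+L": [4, 3, 5]}

errors = compare_conditions(ratings, truth)
closest = min(errors, key=errors.get)  # condition with the smallest error
print(errors, closest)
```

A smaller error means the condition sits closer to the chosen ground truth. Any other distance measure could be swapped in without changing the structure.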
User Study
• Details
  • Select users (TBD)
  • Select songs to be tested (TBD)
  • Select features to be rated (TBD, probably only emotional features)
  • Select the framework for rating features (TBD, PA/PAD/Gordon-walking-pad/6-emotion/comparative*)
  • Select the ground truth to compare against (TBD, see On Ground Truth slide)
  • Each user study consists of three sessions and a pre-session
    • Pre-session: introduce iPlayr and the experiment
    • M session: melody-only session, probably consisting of 3 songs
    • L session: lyrics-only session, consisting of the same songs as the M session, presented in a different/random order
    • M+L session: melody-and-lyrics session, presented to the subject in a different order
  • In each session, the user listens to the music or reads the lyrics, and rates the selected features
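The session layout above could be generated per subject roughly as follows. This is a minimal sketch: the function name, the song identifiers, and the use of a seed are assumptions, and independent shuffling makes a different order likely but does not guarantee it.

```python
import random

def build_session_plan(songs, seed=None):
    """Return [(session_name, song_order), ...] for the M, L and M+L sessions.

    Each session uses the same songs, shuffled independently.
    """
    rng = random.Random(seed)  # seeded so a subject's plan can be reproduced
    plan = []
    for session in ("M", "L", "M+L"):
        order = list(songs)    # copy, so the caller's list is left untouched
        rng.shuffle(order)
        plan.append((session, order))
    return plan

# Hypothetical song identifiers.
songs = ["song_a", "song_b", "song_c"]
plan = build_session_plan(songs, seed=0)
for session, order in plan:
    print(session, order)
```

The session sequence is fixed here as M, L, M+L, as on the slide. Randomizing the session sequence itself would be a separate design decision.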
On Ground Truth
• Possible sources of ground truth
  • CAL500
  • User-dependent (use each user's own M+L rating as his/her ground truth)
  • Comparative* (use the Hotter or Notter method)
• Why use the CAL500 ground truth?
  • An established framework
  • A good benchmark to see the effect of our work
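Hotter or Notter
• http://hotter.csie.org/about/
• Removes the bias of comparing absolute scores, where every user applies a different rating standard (unlike Pandora, no experts are needed to assign absolute scores)
• Large-scale ranking by Sparse Paired Comparisons (avg. 3 votes for 1-object-1-feature)
• Comparison pairs selected by computer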
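To make the idea of ranking from paired votes concrete, here is a small sketch. How Hotter or Notter actually aggregates its votes is not described on this slide, so ranking by the fraction of comparisons won is a stand-in technique, and the function name and votes are hypothetical.

```python
from collections import defaultdict

def rank_by_win_rate(votes):
    """Rank items from (winner, loser) votes on a single feature.

    Items are ordered by the fraction of their comparisons they won,
    with ties broken alphabetically.
    """
    wins = defaultdict(int)
    total = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    return sorted(total, key=lambda item: (-wins[item] / total[item], item))

# Made-up votes for one feature, e.g. "which song sounds happier?"
votes = [("song_a", "song_b"), ("song_a", "song_c"), ("song_c", "song_b")]
ranking = rank_by_win_rate(votes)
print(ranking)
```

No subject has to give an absolute score, which is the property the slide highlights. With only a few votes per item, a win-rate ranking is noisy, so a real system would likely need a more careful model.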
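Possible Challenges / Questions
• User Study Purpose / Impact
  • The user study aims to verify the role lyrics play in a song [for users], whereas iPlayr works [for machines]. Can the purpose of the user study be confirmed again?
• User Study Details
  • How should the subjects of the user study be defined and recruited?
  • How many songs should be chosen, and how? The results may depend on the songs themselves
    • All CAL500
    • Clustered-pick
  • Should each song be played in full, or is a short clip enough?
    • Depends on David's results, on whether a 30-second clip is representative; example: 進退兩難
  • Is M+L always played last? (By then the subject is familiar with the song, so of course it comes closest to the ground truth)
    • Control group (listens to M+L only)
• Ground Truth
  • How was the CAL500 ground truth determined?
  • With absolute scoring, each person applies a different standard, which may introduce bias
    • Normalize
    • Hotter or Notter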
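The "Normalize" option could look like the sketch below. The slide does not say which normalization is meant, so per-user z-scoring is an assumption, and the function name and ratings are made up.

```python
from statistics import mean, pstdev

def normalize_user_ratings(ratings):
    """Z-score one user's ratings so users with different standards are comparable."""
    mu = mean(ratings)
    sigma = pstdev(ratings)
    if sigma == 0:
        # The user gave the same score everywhere; no spread to rescale.
        return [0.0 for _ in ratings]
    return [(r - mu) / sigma for r in ratings]

# A strict rater and a generous rater who agree on the ordering of three songs.
strict = normalize_user_ratings([1, 2, 3])
generous = normalize_user_ratings([3, 4, 5])
print(strict)
print(generous)
```

After normalization the two raters produce the same values, so the offset between their personal scales no longer counts as disagreement.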
Experiment
• Testers
  • 2 people, Daniel and Gordon
  • Each scores 18 emotions per song on a scale of 1 to 5
• Music pieces
  • Selected from the CAL500 database by the testers
  • 6 songs played in random order
  • Stopped when all testers finished tagging
• Constraints
  • Not able to skim through previous answers
  • Not able to fill in answers during the first 15 seconds
Small difference
• Affected by the previous song?
• Become more conservative