Foraging for Music Donald Byrd School of Informatics & Jacobs School of Music, Indiana University rev. 10 April 2008
What’s the Problem? • How much music is there? • Music holdings of Library of Congress: over 10M items • Most is notation, especially CWMN (Conventional Western Music Notation), not audio • Includes over 6M pieces of sheet music, plus tens or hundreds of thousands of scores of operas, symphonies, etc. • Today • iTunes: 6M tracks • P2P: 15B tracks • Tomorrow • “All music will be on line” • People have very diverse tastes, etc. 8 April 08
Classification: Logician General’s Warning • Classification is dangerous to your understanding • Almost everything in the real world is messy, ill-defined • Absolute correlations between characteristics are rare • Example: Ginger Baker says Cream wasn’t a rock group • Example: did Bach write piano music? • People say “an X has characteristics A, B, C…” • They usually mean “an X has A, & usually B, C…” • Leads to: • People who know better claiming absolute correlations • “Is it this or that or that?” questions that don’t have an answer • Don changing his mind • But lack of classification is dangerous to understanding, too! • Should we abandon (hierarchic) classifications? • Of course not! They’re too useful, & impossible to avoid • Just be on guard for misleading things, & consider alternatives 8 Apr. 08
Basic Representations of Music & Audio • Audio (e.g., CD, MP3): like speech • Time-stamped Events (e.g., MIDI file): like unformatted text • Music Notation: like text with complex formatting 27 Jan. 06
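To make the contrast concrete, here is a minimal Python sketch (purely illustrative; the sample values, event tuples, and ABC-style notation string are invented, not from any real file) of the same short fragment in each of the three representations:

```python
import math

# 1. Audio (like speech): a stream of amplitude samples.
#    Here, 0.1 sec of a 440 Hz sine wave at an 8000 Hz sample rate.
SAMPLE_RATE = 8000
audio = [math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
         for t in range(int(0.1 * SAMPLE_RATE))]

# 2. Time-stamped events (like unformatted text): MIDI-style
#    (start_time_sec, MIDI_pitch, duration_sec) triples.
events = [
    (0.0, 69, 0.5),  # A4
    (0.5, 71, 0.5),  # B4
]

# 3. Music notation (like text with complex formatting): pitch & rhythm
#    plus key, meter, layout, etc.; sketched as an ABC-style string.
notation = "X:1\nM:4/4\nL:1/4\nK:C\nA B |"
```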
Basic and Specific Representations vs. Encodings • [Diagram: basic & specific representations above the line; encodings below the line] rev. 15 Feb.
A Similarity Scale for Content-Based Music IR • Categories describe how similar the items to be found are expected to be to the query (from closest to most distant) • Detailed audio characteristics in common 1. Same music, arrangement, performance venue, session, performance, & recording 2. … 4. Same music, arrangement, performance venue; different session, performance, recording • No detailed audio characteristics in common 6. Same music, different arrangement; or different but closely-related music, e.g., conservative variations (Mozart, etc.), many covers, minor revisions 7. Different & less closely-related music: freer variations (Brahms, much jazz, etc.), wilder covers, extensive revisions 8. Music in same genre, style, etc. 9. Music influenced by other music 27 Mar. 07
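As a sketch of how the scale might be carried into code (my assumption, not part of the original slides; the level names are invented, and the levels elided with “…” above are omitted here as well):

```python
from enum import IntEnum

class Similarity(IntEnum):
    """The similarity scale as an enumeration. Levels 2-3 and 5 are
    elided in the slide above, so they are omitted here too."""
    SAME_RECORDING = 1          # same music, arrangement, venue, session,
                                # performance, & recording
    SAME_VENUE = 4              # same music, arrangement, venue; different
                                # session, performance, recording
    DIFFERENT_ARRANGEMENT = 6   # covers, conservative variations, minor revisions
    RELATED_MUSIC = 7           # freer variations, wilder covers, big revisions
    SAME_GENRE = 8
    INFLUENCED_BY = 9
```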
Ways of Finding Music (1) • How can you find information/music you’re interested in? • You know some of it • You know something about it • “Someone else” knows something about your interests • => Content, Metadata, and “Collaboration” • Metadata • “Data about data”: information about a thing, not the thing itself (or part of it) • Includes the standard library idea of bibliographic information, plus information about the structure of the content • Metadata is the traditional library way • Also the basis for iTunes, etc.: iTunes Music Library.xml • iTunes, Winamp, etc., use ID3 tags in MP3s (see the sketch below) • Content (as in content-based retrieval) • Cf. tasks in the Music Similarity Scale • Collaborative • “People who bought this also bought…” 6 Mar. 06
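A minimal sketch of reading those ID3 tags programmatically, using the third-party mutagen library (the library choice and the filename are my assumptions, not from the slides):

```python
# Read the ID3 metadata that players like iTunes & Winamp rely on.
# Requires the third-party "mutagen" package; the filename is a
# hypothetical placeholder.
from mutagen.easyid3 import EasyID3

tags = EasyID3("some_track.mp3")
print(tags.get("title"), tags.get("artist"), tags.get("album"))
```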
Ways of Finding Music (2) • Do you just want to find the music now, or do you want to put in a “standing order”? • => Searching and Filtering • Searching: data stays the same; information need changes • Filtering: information need stays the same; data changes • Closely related to recommender systems • Sometimes called “routing” • Collaborative approach to identifying music makes sense for filtering, but not for searching(?) 8 Mar. 06
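The searching/filtering distinction is easy to state in code; a sketch (all names and the track format here are hypothetical):

```python
# Searching: the collection is fixed; the information need arrives
# as a one-shot query. Filtering: the need is a "standing order"
# applied to each batch of new data as it arrives.

def search(collection, need):
    """One-shot query over a (relatively) static collection."""
    return [item for item in collection if need(item)]

def make_filter(need):
    """Standing order: remember the need, route matching new items."""
    def route(new_items):
        return [item for item in new_items if need(item)]
    return route

# The same predicate, used both ways.
def is_jazz(track):
    return track.get("genre") == "jazz"

found = search([{"genre": "jazz"}, {"genre": "pop"}], is_jazz)
jazz_feed = make_filter(is_jazz)            # keep; reuse as data arrives
routed = jazz_feed([{"genre": "pop"}, {"genre": "jazz"}])
```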
Ways of Finding Music (3) • Combining the two dimensions: {metadata, content, collaborative} × {searching, filtering} • Most combinations make sense & seem useful 6 Mar. 08
Searching: Metadata (the old and new way) vs. Content (in the middle) • To librarians, “searching” means searching of metadata • Has been around as long as library catalogs (c. 300 B.C.?) • To IR experts, it means searching of content • Only since the advent of IR: started with experiments in the 1950s • Ordinary people don’t distinguish • Expert estimate: 50% of real-life information needs involve both • The two approaches are slowly coming together • Metadata-creation “games” (Listen Game, etc.) should help a lot • Need ways to manage both together (see the sketch below) 22 March 07
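One simple way to manage both together, sketched under my own assumptions (the corpus layout and the content_similarity function are hypothetical stand-ins): filter candidates on metadata fields, then rank the survivors by content similarity.

```python
# Hypothetical combined search: metadata narrows, content ranks.
# Each track is {"meta": {...}, "features": ...}; content_similarity
# is any higher-is-more-similar function over feature vectors.

def combined_search(corpus, metadata_query, query_features,
                    content_similarity):
    candidates = [t for t in corpus
                  if all(t["meta"].get(k) == v
                         for k, v in metadata_query.items())]
    return sorted(candidates,
                  key=lambda t: content_similarity(t["features"],
                                                   query_features),
                  reverse=True)
```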
To the Rescue: Music Recommenders! (1) • Music Recommendation Tutorial • by Paul Lamere & Òscar Celma, at ISMIR 2007 • Introduction: Why music recommendation is important • 4-5: the Long Tail -- 6-10: different types of uses • 20 Formalization of the recommendation problem • 26-31: users & items -- 64-80: genre & other text tags • 105 Recommendation algorithms • 135 Problems with recommenders • 136-155: social recommenders -- 156-157: content-based • 158 Recommender examples • 159ff: social -- 168ff: content (Pandora) -- 180ff: hybrid • 184 Evaluation of recommenders • 188ff: metrics -- 191-192: mainstream vs. eclectic users • 246 Conclusions / Future 8 Apr. 08
To the Rescue: Music Recommenders! (2) • Tim Westergren’s approach: Pandora • “Music Genome Project” defined 400 “genes” (attributes; sketched in code below) • Every piece (song) has a value from 1 thru 10 assigned for each • ...completely manual: done by experts w/ degrees in music theory, etc. • Mostly content-based • Has major advantages, but hybrid (social & content) is probably best 8 Apr. 08
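A sketch of the genome idea (my illustration, not Pandora’s actual system; the gene values and song data are invented): each song is a vector of attribute values, and content-based recommendation means finding nearby points in that space.

```python
import math

# Per the slide: 400 genes, each valued 1 thru 10; the toy vectors
# below use only 3 genes to keep the example readable.
NUM_GENES = 400

def distance(a, b):
    """Euclidean distance between two genome vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(seed, library):
    """The song whose genome vector is nearest the seed's.
    `library` maps song title -> genome vector."""
    return min((s for s in library if s != seed),
               key=lambda s: distance(library[seed], library[s]))

library = {"Song A": (1, 9, 5), "Song B": (2, 8, 5), "Song C": (9, 1, 1)}
print(most_similar("Song A", library))  # -> Song B
```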
“I don’t want similar music, I want something completely different!” (1) • Much research & many commercial ventures are designed to help people find music similar to something they have • …but what about people who want something very different? • May not be that unusual: cf. Celma & Lamere’s “mainstream vs. eclectic users” slides • E.g., something as far as possible from Britney Spears • Don has a “Seriously Weird” playlist & a “Music as Different as Possible” project • How about Brian Whitman’s “Eigenmusic” approach? • Problem: parameters are too low-level, not perceptually significant! rev. 10 April 08
“I don’t want similar music, I want something completely different!” (2) • How practical this is for a system depends on its representation of music • Must represent perceptual features well enough • MusicStrands’ representation (every song is an attribute) doesn’t help much • …though it might be possible to infer from the network • Pandora’s “music genome” (400 attributes for all music) is ideal • Find points far away instead of nearby in the 400-D metric space • Could do “Anti-Britney Spears Radio”! (see the sketch below) 10 April 08
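Flipping the earlier Pandora sketch around (again my illustration, using the same invented genome-vector setup): pick the farthest point in the space instead of the nearest.

```python
# "Anti-X Radio": the same genome-vector setup as the earlier sketch,
# but maximizing distance instead of minimizing it. `distance` is any
# metric over genome vectors, e.g., the Euclidean one defined above.
def most_different(seed, library, distance):
    """The song whose genome vector is farthest from the seed's."""
    return max((s for s in library if s != seed),
               key=lambda s: distance(library[seed], library[s]))
```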
Good Research Is Difficult (1) • 1. Hard to evaluate reliability of info sources • Especially difficult on the Web • Ex: www.dhmo.org • 2. People see what they expect to see • Ex: use of kitchen sponges increases E. coli • 3. Almost everything in the world is complex, messy, etc. • Backus (in Musical Acoustics): why musicians’ explanations in acoustics are almost always wrong • “Classification: Logician General’s Warning” • Ex: What was the first piano? What is a trombone? 8 April 08
Good Research Is Difficult (2) • 4. Easy to overgeneralize • Ex: Blair & Maron (1985): An Evaluation of Retrieval Effectiveness for a Full-text Document-Retrieval System. CACM 28(3) • Famous paper in the text-IR research world • Well-thought-out, meticulously done large-scale study • Conclusion (essentially): full-text IR (vs. using abstracts, hand indexing) isn’t worth the trouble(!) • Faulty assumptions: • Litigation is a typical domain, so recall is critical; no statistical methods; storage is expensive; text must be entered for the IR system • Ex (fiction, but very plausible): Asimov short story: “Not Final” 8 April 08
Further Information • Music Recommendation Tutorial • by Paul Lamere & Òscar Celma, at ISMIR 2007 • http://mtg.upf.edu/~ocelma/MusicRecommendationTutorial-ISMIR2007/ • Paul Lamere’s “Duke Listens!” blog • http://blogs.sun.com/plamere/ • My “Information Sources for Music Informatics Students” • http://www.informatics.indiana.edu/donbyrd/Teach/GeneralInformationSources.html 10 April 08