Music Recommendation A Data Mining Approach
Music Recommendation: A Data Mining Approach • Daniel McEnnis, 2nd year PhD
Overview • High level overview • Toolkit Improvements • Experiments • Evaluation • Algorithms research • Data • Future work
Project Goals • Integrate social information • Make algorithms ‘culturally aware’ • Implement existing algorithms • Systematic evaluation framework
Similarity Algorithms • Create new relations based on some aspect of similarity • 6 different varieties of similarity • Each algorithm can use one of 6 distance functions
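A minimal sketch of the idea on this slide, assuming actors carry sparse numeric profiles (for example tag or play counts): a similarity pass compares every pair of actors with a pluggable distance function and emits a new relation for each pair that is close enough. The function names, the threshold parameter, and the two distance functions shown are illustrative assumptions; the slides do not say which six distance functions Graph-RAT provides.

```python
import math
from itertools import combinations

def cosine_distance(a, b):
    """1 - cosine similarity of two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0.0) * b.get(k, 0.0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty profile as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Euclidean distance of two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def similarity_links(profiles, distance, threshold):
    """Return (actor, actor, distance) for every pair within the threshold."""
    links = []
    for u, v in combinations(sorted(profiles), 2):
        d = distance(profiles[u], profiles[v])
        if d <= threshold:
            links.append((u, v, d))
    return links

# Hypothetical listener profiles keyed by tag.
profiles = {
    "alice": {"jazz": 3, "swing": 1},
    "bob": {"jazz": 6, "swing": 2},
    "carol": {"metal": 5},
}
links = similarity_links(profiles, cosine_distance, threshold=0.1)
```

Swapping `cosine_distance` for `euclidean_distance` changes which pairs are linked without touching the pairing logic, which is the point of keeping the distance function as a parameter.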
Aggregator Algorithms • Takes data from one set of actors and moves it to another • 6 different varieties • Each variety uses one of 7 aggregator functions • Basic building block of Graph-RAT applications
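A rough sketch of what an aggregator does, under the assumption that properties are numeric and that links point from source actors (songs, say) to destination actors (artists): each destination collects the values of its linked sources, and a pluggable function combines them. The names `aggregate` and `mean` are hypothetical, and the slides do not list the seven aggregator functions, so `sum`, `max`, and a mean stand in here.

```python
from collections import defaultdict

def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def aggregate(links, source_data, combine):
    """Move properties from source actors to destination actors.

    links       -- iterable of (source, destination) pairs
    source_data -- {source: {property: value}}
    combine     -- function reducing a list of values to one value
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for src, dst in links:
        for key, value in source_data.get(src, {}).items():
            grouped[dst][key].append(value)
    return {
        dst: {key: combine(values) for key, values in props.items()}
        for dst, props in grouped.items()
    }

# Hypothetical data: tag weights on songs, aggregated up to the artist.
links = [("song1", "artistA"), ("song2", "artistA")]
song_tags = {"song1": {"jazz": 2.0}, "song2": {"jazz": 4.0, "swing": 1.0}}

artist_sum = aggregate(links, song_tags, sum)
artist_mean = aggregate(links, song_tags, mean)
artist_max = aggregate(links, song_tags, max)
```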
Graph Triples Census • Probable novel algorithm • Proof of Correctness Completed • Proof of Time Complexity Completed • Literature review in progress
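The slides do not describe the algorithm itself. For orientation only, the following is a brute-force baseline of what a census over triples could count, assuming an undirected graph and classifying each set of three nodes by how many of its three possible edges are present. This is not the author's algorithm; it is a naive O(n³) reference that a faster method could be checked against.

```python
from itertools import combinations

def triple_census(nodes, edges):
    """Count node triples by number of edges present (0, 1, 2, or 3).

    Naive baseline: visits every combination of three nodes.
    """
    edge_set = {frozenset(e) for e in edges}
    census = {0: 0, 1: 0, 2: 0, 3: 0}
    for a, b, c in combinations(nodes, 3):
        present = sum(
            frozenset(pair) in edge_set
            for pair in ((a, b), (a, c), (b, c))
        )
        census[present] += 1
    return census

# Hypothetical four-node graph: a triangle a-b-c plus a pendant edge c-d.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
census = triple_census(nodes, edges)
```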
SUCCESS! • Graph-RAT programming language now functioning • Graph-RAT integrates social, cultural, personal, and audio data into algorithms • Includes most commercial algorithms • Contains primitives for existing academic systems • Evaluation is entirely automated
Evaluation Exploration • 9 types of music recommendation • Personalized versus generic • Open query versus targeted query • Dynamic versus static data • New music versus all music
Personalized Radio • Open query with personalized presentation • Static data vs dynamic data • New items prediction vs predict anything
Targeted Search • Not personalized • Similarity queries • Automatically generating targeted lists for a browsing hierarchy • New music vs all music • Static vs dynamic data
Personalized Tag Radio • Create a personalized play list matching a given query • New music vs all music • Static vs dynamic data
Excluded Types • ‘Top 40’ prediction • Rendered obsolete by other types
Existing Algorithms • Item-to-Item collaborative filtering • 7 variations • User-to-user collaborative filtering • 7 variations • Associative mining collaborative filtering • Direct machine learning playlist data • Direct machine learning audio data
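As an illustration of the first entry, here is a minimal sketch of item-to-item collaborative filtering, assuming user play counts as input and cosine similarity between items. It is one common formulation, not necessarily any of the seven variations mentioned on the slide, and all names are hypothetical.

```python
import math
from collections import defaultdict

def item_vectors(playcounts):
    """Invert {user: {item: count}} into {item: {user: count}}."""
    vectors = defaultdict(dict)
    for user, items in playcounts.items():
        for item, count in items.items():
            vectors[item][user] = count
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[u] * b[u] for u in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(playcounts, user, top_n=5):
    """Rank unheard items by similarity to the items the user has played."""
    vectors = item_vectors(playcounts)
    heard = playcounts.get(user, {})
    scores = defaultdict(float)
    for item, count in heard.items():
        for other, vec in vectors.items():
            if other in heard:
                continue  # only recommend items new to this user
            scores[other] += count * cosine(vectors[item], vec)
    # Drop items with no overlap, then sort by score (ties broken by name).
    ranked = sorted(
        (kv for kv in scores.items() if kv[1] > 0),
        key=lambda kv: (-kv[1], kv[0]),
    )
    return [item for item, _ in ranked[:top_n]]

# Hypothetical play counts.
playcounts = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 4, "B": 2, "C": 1},
    "u3": {"D": 7},
}
recs_u1 = recommend(playcounts, "u1")
recs_u3 = recommend(playcounts, "u3")
```

User-to-user filtering follows the same pattern with the roles swapped: similarity is computed between users, and items are scored by what similar users played.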
Novel Algorithms • Machine learning over profile data • Machine learning over cultural and profile data • Machine learning on different concatenations • Audio • Playlist • Profile • Cultural
Initial Data • LiveJournal • Separating music data is difficult • No tag info or audio content • Not enough musical data • LastFM by User • No audio content • Data cleaning is an issue
Current Data • 40’s Jazz Recordings • 1800 annotated recordings from 70 CDs • Covers nearly all 40’s popular music • LastFM by Song • Retrieves tag and user info by song • Data cleaning on user playcounts needed
Data Cleaning Tags • Polysemy • Synonymy • Disjoint • Hypernymy • Hyponymy • Initial algorithms developed
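The slides do not say how the initial cleaning algorithms work. As a sketch of the simplest case only (tags that are synonyms or spelling variants of each other), one option is to normalize case and whitespace and then merge counts through a synonym table. The table below is a hand-written, hypothetical example; the other problems on the slide, such as polysemy, need context and are not addressed by this approach.

```python
import re
from collections import Counter

# Hypothetical synonym table mapping variants to a canonical tag.
SYNONYMS = {
    "hiphop": "hip hop",
    "hip-hop": "hip hop",
    "rnb": "r&b",
}

def normalize(tag):
    """Lowercase, collapse whitespace, and map known variants."""
    tag = tag.strip().lower()
    tag = re.sub(r"\s+", " ", tag)
    return SYNONYMS.get(tag, tag)

def merge_tag_counts(tag_counts):
    """Merge counts of tags that normalize to the same canonical tag."""
    merged = Counter()
    for tag, count in tag_counts.items():
        merged[normalize(tag)] += count
    return dict(merged)

# Hypothetical raw tag counts for one song.
cleaned = merge_tag_counts({"Hip-Hop": 3, "hip  hop": 2, " Jazz ": 1})
```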
Future Work: Programming • Radically different programming environment • SQL • LINQ library package in C#
Future Work: Scalability • Distributed SQL database implementation • Just-in-time compilation • Event-based recalculation of algorithm results • Parallel execution of algorithms • Multi-threaded algorithms