Improving Response Prediction for Dyadic Data

Nik Tuzov, April 2008, http://www.stat.purdue.edu/~ntuzov/

Dyadic Data • Each “response value” is associated with a pair of objects (a dyad) Applications: • Social networks • Internet advertising • Recommendation systems

Unsupervised learning • Example: Collaborative filtering (MovieLens project) • Movie 1 is “similar” to Movie 5, hence Y is likely “B” • Users 1, 2, 3 are “similar” to each other, hence X is likely “C” or “D”

Co-clustering with Bregman divergences • K*L rectangular clusters: direct products of row clusters and column clusters
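The "direct product" structure can be sketched in a few lines. This is a minimal illustration, not the code from the project: it assumes the row and column cluster labels are already known and uses the block mean as the cluster summary (the natural choice under squared Euclidean loss, one member of the Bregman family). The names `block_means`, `row_labels`, `col_labels` and the toy matrix are hypothetical.

```python
import numpy as np

def block_means(Y, observed, row_labels, col_labels, K, L):
    """Mean of the observed responses in each of the K*L co-clusters."""
    # Cell (i, j) belongs to co-cluster (row_labels[i], col_labels[j]).
    I = np.broadcast_to(row_labels[:, None], Y.shape)
    J = np.broadcast_to(col_labels[None, :], Y.shape)
    sums = np.zeros((K, L))
    counts = np.zeros((K, L))
    np.add.at(sums, (I[observed], J[observed]), Y[observed])
    np.add.at(counts, (I[observed], J[observed]), 1)
    with np.errstate(invalid="ignore", divide="ignore"):
        # Empty co-clusters get NaN instead of a made-up mean.
        return np.where(counts > 0, sums / counts, np.nan)

# Toy 4x4 response matrix with one missing cell (hypothetical data).
Y = np.array([[5., 5., 1., 1.],
              [5., 4., 1., 2.],
              [1., 1., 4., 4.],
              [2., 1., 5., 5.]])
observed = np.ones_like(Y, dtype=bool)
observed[0, 0] = False
row_labels = np.array([0, 0, 1, 1])   # 2 row clusters
col_labels = np.array([0, 0, 1, 1])   # 2 column clusters

means = block_means(Y, observed, row_labels, col_labels, K=2, L=2)
# A missing cell can be predicted by the mean of its co-cluster.
prediction_for_missing = means[row_labels[0], col_labels[0]]
```

A full co-clustering algorithm would alternate between updating these block summaries and reassigning rows and columns; only the summary step is shown here.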

Co-clustering with Bregman divergences (example from http://videolectures.net/kdd07_agarwal_pdlfm/)

PDLF-GLM Model (Agarwal & Merugu ’07)
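For orientation, the hard-assignment form of the model can be written roughly as follows. This is a sketch recalled from the cited paper, not copied from this slide; the symbols are assumptions: x_ij are the covariates of dyad (i, j), β the regression coefficients, ρ(i) and γ(j) the row and column cluster of the dyad, δ the per-co-cluster offset, and f_ψ the exponential-family density of the GLM.

```latex
\theta_{ij} = \beta^{\top} x_{ij} + \delta_{\rho(i)\,\gamma(j)},
\qquad
y_{ij} \mid x_{ij} \sim f_{\psi}\bigl(y_{ij};\, \theta_{ij}\bigr)
```

The covariates drive the GLM part, while δ captures what is left in the co-cluster structure after the covariates are accounted for.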

Neural Network as alternative to GLM

Algorithm

Data: MovieLens • 20603 ratings, 346 users, 966 movies • From 1 to 198 ratings per movie, 32 to 105 ratings per user • 50 covariates for each (user, movie) pair • 5700 observations held out for validation • Performance is measured by the area under the Receiver Operating Characteristic (ROC) curve
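The area under the ROC curve equals the probability that a randomly chosen positive is scored above a randomly chosen negative (ties count as one half). A minimal sketch of that computation follows; the function name `roc_auc` and the toy labels and scores are hypothetical, and how the ratings were turned into a binary response is not stated on the slide, so the labels here are simply assumed to be 0/1.

```python
import numpy as np

def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]      # scores of positive observations
    neg = scores[~labels]     # scores of negative observations
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Toy held-out set (hypothetical values).
labels = [1, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.8, 0.7]
auc = roc_auc(labels, scores)
```

The pairwise matrix is fine for a few thousand held-out observations; for much larger sets a rank-based formula would use less memory.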

Neural Network Topology

Number of nodes? • 40 nodes appear to be enough (produce similar overfitting)
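One way to picture a network of this size is a single hidden layer that replaces the linear predictor, with the co-cluster offset added to its output. The sketch below is only an illustration under assumptions: the actual topology is not recoverable from these slides, and the tanh activation, logistic output, random weights and the name `predict_prob` are all hypothetical. Only the sizes (50 covariates, 40 hidden nodes) come from the slides.

```python
import numpy as np

def predict_prob(x, W1, b1, w2, b2, delta):
    """Forward pass: one tanh hidden layer plus a co-cluster offset."""
    hidden = np.tanh(x @ W1 + b1)          # shape: (n_obs, n_hidden)
    eta = hidden @ w2 + b2 + delta         # network output + offset
    return 1.0 / (1.0 + np.exp(-eta))      # logistic link (assumed)

rng = np.random.default_rng(0)
n_cov, n_hidden = 50, 40                   # sizes taken from the slides
W1 = rng.normal(scale=0.1, size=(n_cov, n_hidden))  # untrained weights
b1 = np.zeros(n_hidden)
w2 = rng.normal(scale=0.1, size=n_hidden)
b2 = 0.0

x = rng.normal(size=(3, n_cov))            # 3 hypothetical dyads
delta = np.array([-0.57, 0.0, 0.3])        # offset of each dyad's co-cluster
p = predict_prob(x, W1, b1, w2, b2, delta)
```

With trained weights, the same forward pass would be evaluated on the held-out dyads and scored by ROC area.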

Results

New Covariates? Sample movies from the cluster with delta = -0.57: • 756 ratings; 23 females and 55 males; no documentaries

Contribution to ROC

Is the Neural Network useful? • The gain in ROC area depends on the order: if the extra linear features (neural network) are added first, the gain from co-clustering is reduced • The opposite is also true • Hence, the information in the linear features is similar to that in the clusters • For this dataset, the neural network is not very helpful, but • For other dyadic datasets, the neural network could be a lot more useful

Related Work • What if we want to predict response on (Web page, Search query, Web user) ? • B. Long, X. Wu, Z. Zhang, and P. S. Yu. Unsupervised learning on k-partite graphs. In KDD, 2006.

Additional Info • To obtain a detailed report and Matlab code, please visit my website: http://www.stat.purdue.edu/~ntuzov/ • The project is posted in the “Software skills / Matlab” section • Questions? Contact me at ntuzov@purdue.edu
