

  1. E-Commerce

  2. Outline • Introduction • Customer Data on the Web • Automated Recommender Systems • Networks and Recommendations • Web Path Analysis for Purchase Prediction

  3. Introduction • Some Motivating Questions • Can we design algorithms to help recommend new products to visitors based on their browsing behavior? • Can we better understand factors influencing how customers make purchases on a website? • Can we predict in real time who will make purchases based on their observed navigation patterns?

  4. Customer Data on the Web • Data can be collected on the client side, on the server side, and anywhere in between • Goal: determine who is purchasing which products • Tracking customer data • Web logs, E-Commerce logs, cookies, explicit login • The data are then used to provide personalized content to site users to: • Assist customers in locating their target selections • “Encourage” customers to make certain selections

  5. Automated Recommender Systems • The problem is framed in two ways • Users ‘vote’ for pages/items (binary) • Users rate pages/items (multivalued) • Results are captured in a generally sparse users × items matrix (see the sketch below) • Complication: missing votes are not random, because users tend not to vote on items they do not like (Breese et al. 1998) • This bias is ignored by most recommender systems
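
As a concrete illustration (not from the original slides), a sparse binary users × items vote matrix built with SciPy; the user/item indices are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical binary votes: (user, item) pairs where a vote occurred.
# Missing entries are ambiguous -- they may mean "not seen" or "disliked".
users = np.array([0, 0, 1, 2, 2, 2])
items = np.array([1, 3, 0, 0, 2, 3])
votes = np.ones(len(users))

# 3 users x 4 items; most cells are empty, which is the typical case.
vote_matrix = csr_matrix((votes, (users, items)), shape=(3, 4))
print(vote_matrix.toarray())
```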

  6. Automated Recommender Systems

  7. Evaluating Recommender Systems • Cautions in data interpretation • Users may purchase items regardless of recommendations • Users may also avoid purchases they might have made based on recommendations • Approaches to recommender algorithms • Nearest-neighbor • Model-based collaborative filtering • Others?

  8. Nearest-Neighbor Collaborative Filtering • Basic principle: use the user’s vote history to predict future votes/recommendations • Find the users most similar to the target user in the training matrix and fill in the target user’s missing votes based on these “nearest neighbors” • A typical normalized prediction scheme (reconstructed below): predict the vote for item ‘j’ from other users’ votes, weighted toward users whose past votes are similar to those of target user ‘a’
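
In symbols (the standard normalized scheme from Breese et al. 1998; notation reconstructed):

$$
p_{a,j} \;=\; \bar{v}_a \;+\; \kappa \sum_{i=1}^{n} w(a,i)\,\bigl(v_{i,j} - \bar{v}_i\bigr)
$$

where $v_{i,j}$ is user $i$’s vote on item $j$, $\bar{v}_i$ is user $i$’s mean vote, $w(a,i)$ weights user $i$ by similarity to the target user $a$, and $\kappa$ is a normalizing factor so the predictions stay on the vote scale.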

  9. Nearest-Neighbor Collaborative Filtering • Another challenge: defining the weights • Which weight calculation is best? (one common choice is sketched below) • Requires fine-tuning of the weighting algorithm for the particular data set • What do we do when the target user has not voted enough to provide a reliable set of nearest neighbors? • One approach: use default votes (popular items) to populate the matrix for items neither the target user nor the nearest neighbor has voted on • A different approach: model-based prediction using Dirichlet priors to smooth the votes (see chapter 7) • Other factors include relative vote counts for all items between users, thresholding, and clustering (see Sarwar et al. 2000)
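
One common similarity weight is the Pearson correlation over co-voted items, as in Breese et al. (1998); a minimal sketch (the function name and dict-based vote representation are assumptions):

```python
import numpy as np

def pearson_weight(votes_a, votes_i):
    """Similarity weight w(a, i) between target user a and user i.

    votes_a, votes_i: dicts mapping item id -> vote.
    Computed only over items both users have voted on.
    """
    common = set(votes_a) & set(votes_i)
    if len(common) < 2:
        return 0.0  # too little overlap for a reliable estimate
    a = np.array([votes_a[j] for j in common], dtype=float)
    i = np.array([votes_i[j] for j in common], dtype=float)
    a -= a.mean()
    i -= i.mean()
    denom = np.sqrt((a * a).sum() * (i * i).sum())
    return float((a * i).sum() / denom) if denom > 0 else 0.0
```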

  10. Nearest-Neighbor Collaborative Filtering • Structure-based recommendations • Recommendations based on similarities between items with positive votes (as opposed to the votes of other users) • The structure of item dependencies is modeled through dimensionality reduction via singular value decomposition (SVD), aka latent semantic indexing (see chapter 4) • Approximate the set of row-vector votes as a linear combination of basis column-vectors • i.e. find the basis vectors that minimize the least-squares difference between the approximated rows and their true values (see the sketch below) • Perform nearest-neighbor calculations in the reduced space to project predictions for all items
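
A minimal sketch of the SVD step, assuming a small dense vote matrix for clarity; the rank-k truncation is the least-squares optimal low-rank approximation:

```python
import numpy as np

# Hypothetical users x items vote matrix (0 = no vote, treated as 0 here).
V = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)

U, s, Vt = np.linalg.svd(V, full_matrices=False)

k = 2  # number of retained basis vectors (latent dimensions)
V_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # best rank-k approximation of V

# Each row of V_k estimates a user's votes on all items, including unvoted ones;
# nearest-neighbor calculations can then run in the reduced space U[:, :k].
print(np.round(V_k, 2))
```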

  11. Model-Based Collaborative Filtering • Recommendations come from a model of the relationships between items, learned from the historical voting patterns in the training set • Typically better performance than nearest-neighbor analysis • Joint distribution modeling • Uses one model as the basis for predictions • Conditional distribution modeling • A separate model for each item, predicting its future vote from the votes on each of the other items

  12. Model-Based Collaborative Filtering • Joint distribution modeling: a practical approach • Model the joint distribution as a finite mixture of simpler distributions (see the reconstruction below) • Additional simplification comes from assuming votes are conditionally independent of one another within a component • Limitation: assumes each user is described by a single one of the ‘K’ mixture components • Hofmann and Puzicha (1999) propose a workaround in which each row of votes can draw on up to ‘K’ mixture components, rather than a single one
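
In symbols (notation assumed), with a row of votes $v_1, \ldots, v_m$ and $K$ components:

$$
P(v_1, \ldots, v_m) \;=\; \sum_{k=1}^{K} P(k) \prod_{j=1}^{m} P(v_j \mid k)
$$

Each user’s votes are generated by first drawing a component $k$ and then drawing each vote independently given $k$; tying a whole user to one $k$ is the limitation noted above.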

  13. Model-Based Collaborative Filtering • Another limitation: all predictions are based on the (static) training set • Conditional distribution modeling • Better results from building a model for each item conditioned on the others, rather than a single joint density model • Decision trees (Heckerman et al. 2000; a per-item sketch follows below) • Greedy approach to approximating the tree structure • Predictions are made for each item not yet purchased or visited • Performance • Accuracy nearly equal to Bayesian networks • Offline memory usage significantly lower than Bayesian networks • Offline computation time complexity better than Bayesian networks
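
A sketch of the per-item conditional idea using off-the-shelf scikit-learn decision trees; this illustrates the structure (one model per item, conditioned on all other items), not the specific greedy construction of Heckerman et al. (2000):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical binary vote matrix: rows = users, columns = items.
V = np.array([[1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 0, 1, 0],
              [1, 1, 1, 1],
              [0, 1, 0, 0]])

models = []
for j in range(V.shape[1]):
    X = np.delete(V, j, axis=1)  # votes on every item except j
    y = V[:, j]                  # vote on item j
    models.append(DecisionTreeClassifier(max_depth=3).fit(X, y))

# Score item 2 for a new user whose votes on items 0, 1, 3 are known:
new_user = np.array([[1, 0, 1]])
print(models[2].predict_proba(new_user))
```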

  14. Model-Based Combining of Votes and Content • Combine content-specific information with other information (e.g. structure, votes) • Useful for determining item similarity (Mooney and Roy 2000) and for creating user models • Useful when there is no vote history • Implementation (Popescul et al. 2000) • An extension of Hofmann and Puzicha (1999) • The joint density is determined by assuming a hidden latent variable that makes users, documents, and words conditionally independent, i.e. the factorization shown below
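
In symbols (a reconstruction of the standard three-way aspect model):

$$
P(u, d, w) \;=\; \sum_{z} P(z)\, P(u \mid z)\, P(d \mid z)\, P(w \mid z)
$$

where $u$ is a user, $d$ a document, $w$ a word, and $z$ the hidden topic variable.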

  15. Model-Based Combining of Votes and Content • The hidden variable represents the multiple (hidden) topics of a document • Conditional probabilities involving the hidden variable are estimated using EM • Sparsity remains a problem for content-based modeling

  16. Challenges • Noisy data • The same user may use multiple IP addresses/logins • Different users may share the same IP address/login • Privacy • Users may refuse or delete cookies • Changing user habits • Past history may not accurately predict present purchase selections • Requires continuous updating of user activity data

  17. Networks & Recommendations • Word-of-mouth • Needs little explicit advertising • Products are recommended to friends, family, co-workers, etc. • This was the primary form of advertising behind the growth of Google

  18. Email Product Recommendation • Hotmail • Very little direct advertising in the beginning • Launched in July 1996 • 20,000 subscribers after a month • 100,000 subscribers after 3 months • 1,000,000 subscribers after 6 months • 12,000,000 subscribers after 18 months • By April 2002 Hotmail had 110 million subscribers

  19. Email Product Recommendation • What was Hotmail’s primary form of advertising? • A small link to the sign-up page at the bottom of every email sent by a subscriber • ‘Spreading activation’ • An implicit recommendation

  20. Spreading Activation • Network effects • Even if only a small fraction of recipients subscribe (~0.1%), the service will spread rapidly • Contrast this with the current practice of SPAM • SPAM is not sent by friends, family, or co-workers • No implicit recommendation • SPAM is often viewed as not providing a good service

  21. Modeling Spreading Activation • Diffusion model • Montgomery (2002) • Applied models from the marketing literature, Bass (1969), to the Hotmail phenomenon • Similar word-of-mouth networks have been used in selling consumer electronics such as refrigerators and televisions • We want to predict how many individuals k(t) will have adopted the product by time t, out of a population of N possible adopters

  22. Modeling Spreading Activation • Diffusion model • Two ways individuals subscribe • Direct advertising • At time t, N − k(t) individuals have not yet subscribed • A fraction α ≥ 0 of these individuals subscribe due to direct advertising • Word-of-mouth • At time t, there are k(t)(N − k(t)) possible connections between subscribers and non-subscribers • A fraction β ≥ 0 of these connections cause a non-subscriber to subscribe

  23. Modeling Spreading Activation • Combining the two mechanisms gives a differential equation for k(t) • Solving it yields a closed-form adoption curve (see the reconstruction below)
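
A reconstruction of the two expressions, assuming k(0) = 0; this is the standard Bass (1969) diffusion form:

$$
\frac{dk}{dt} \;=\; \alpha\,\bigl(N - k(t)\bigr) \;+\; \beta\,k(t)\,\bigl(N - k(t)\bigr)
$$

$$
k(t) \;=\; N\,\frac{1 - e^{-(\alpha + \beta N)\,t}}{1 + \tfrac{\beta N}{\alpha}\, e^{-(\alpha + \beta N)\,t}}
$$

The first term counts adoptions driven by direct advertising, the second those driven by word-of-mouth contact between the k(t) subscribers and the N − k(t) non-subscribers.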

  24. Modeling Spreading Activation

  25. Modeling Spreading Activation
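
A minimal numeric sketch of the adoption curve plotted on these slides, using the closed-form solution above; the parameter values are illustrative assumptions, not the fitted Hotmail values:

```python
import numpy as np

# Assumed illustrative parameters.
N = 1_000_000    # population of possible adopters
alpha = 0.002    # direct-advertising adoption rate
beta = 3e-7      # word-of-mouth adoption rate per connection

t = np.linspace(0, 36, 200)  # months
g = alpha + beta * N         # combined growth rate in the exponent
k = N * (1 - np.exp(-g * t)) / (1 + (beta * N / alpha) * np.exp(-g * t))

# k(t) traces an S-curve: slow start, rapid word-of-mouth growth, saturation at N.
for month in (1, 3, 6, 18):
    print(f"month {month:2d}: {int(np.interp(month, t, k)):>9,d} subscribers")
```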

  26. Modeling Spreading Activation • Diffusion model • This does not completely model what actually occurred • However, it is simple and provides a lot of interesting (useful) information • Other work • Domingos & Richardson (2001): Markov random field model • Daley & Gani (1999): various deterministic and stochastic models

  27. Purchase Prediction • We want to predict whether a shopper will make a purchase • We know the demographics • We know the page-view patterns • Can we accurately predict whether the user will make a purchase?

  28. Purchase Prediction • Li et al. (2002) • Studied 1,160 shoppers at www.barnesandnoble.com between April 1 and April 30, 2002 • The data were collected client side, so they knew exactly which pages were displayed to the user • They also knew the demographics (predominantly well-educated and affluent)

  29. Purchase Prediction • Li et al. (2002) • There were 14,512 page views, which they divided into 1,659 sessions • Session length (page views per session): • Mean: 8.75 • Median: 5 • Standard deviation: 16.4 • Min: 1 • Max: 570 • 7% of sessions contained a purchase

  30. Purchase Prediction • Li et al. (2002) • Divided the pages into 8 classes • Home (H), main page • Account (A), account information pages • List (L), pages with lists of items • Product (P), page with a single item • Information (I), informational pages (shipping etc.) • Shopping cart (S) • Order (O), indicates a completed order • Entry or Exit (E), entering or leaving the site

  31. Purchase Prediction • Li et al. (2002) • Each session was represented by a string of page classes, e.g.: I H H I I L I I E • A session containing an O is considered to contain a purchase • The average length of a session with a purchase was 34.5 page views; without, only 6.8

  32. Purchase Prediction • Markov transition matrix over the eight page classes • Shown for sessions with no purchase (an estimation sketch follows below)
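
A sketch of how such a first-order transition matrix can be estimated from session strings, assuming the eight-class encoding above; the example sessions are hypothetical:

```python
import numpy as np

STATES = "HALPISOE"  # Home, Account, List, Product, Info, Shopping cart, Order, Entry/Exit
IDX = {s: i for i, s in enumerate(STATES)}

def transition_matrix(sessions):
    """Row-normalized first-order Markov transition counts over page classes."""
    counts = np.zeros((len(STATES), len(STATES)))
    for session in sessions:
        for a, b in zip(session, session[1:]):  # consecutive page-class pairs
            counts[IDX[a], IDX[b]] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Hypothetical no-purchase sessions in the paper's string encoding.
no_purchase = ["EHHLPIE", "EHLLPE", "EIHLPPE"]
print(np.round(transition_matrix(no_purchase), 2))
```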

  33. Purchase Prediction • Li et al. (2002) • They fit several models to these data • Tested on predicting the next page and on predicting a purchase • The best models were 64% accurate at predicting the next page • After 2 page views, the best models predicted purchases with 12% true positives and 5.3% false positives • After 6 page views: 13.1% true positives and 2.9% false positives
