
How Long will She Call Me? Distribution, Social Theory and Duration Prediction

Yuxiao Dong*, Jie Tang$, Tiancheng Lou#, Bin Wu&, Nitesh V. Chawla*. *University of Notre Dame, $Tsinghua University, #Google Inc., &Beijing U. of Posts & Telecoms.


Presentation Transcript


  1. How Long will She Call Me? Distribution, Social Theory and Duration Prediction Yuxiao Dong*, Jie Tang$, Tiancheng Lou#, Bin Wu&, Nitesh V. Chawla* *University of Notre Dame $Tsinghua University #Google Inc. &Beijing U. of Posts & Telecoms Yuxiao Dong, Jie Tang, Tiancheng Lou, Bin Wu, Nitesh V. Chawla. How Long will She Call Me? Distribution, Social Theory and Duration Prediction. In ECML/PKDD’13.

  2. Outline • Motivation • Dynamic Distribution on Duration • Social Theory on Duration • Duration Prediction • Conclusion

  3. Motivation • Mobile calls between humans are ubiquitous at any time … • 91% of American adults had a mobile phone as of May 2013 [1]. • Mobile users cannot leave their phone alone for even 6 minutes and check it up to 150 times a day [2]. • People make, receive or avoid 22 phone calls every day [2]. • [1] Pew Internet: Mobile Reports. June 6, 2013. http://pewinternet.org/Commentary/2012/February/Pew-Internet-Mobile.aspx • [2] Tomi Ahonen. Communities Dominate Brands. http://communities-dominate.blogs.com/

  4. Duration Macro-Distribution • Double Pareto lognormal distribution (DPLN) [1]. • Truncated log-logistic distribution (TLAC) [2]. • [1] M. Seshadri, A. Srid., J. Bolot, C. Faloutsos and J. Leskovec. Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. In KDD’08. • [2] P. Melo, L. Akoglu, C. Faloutsos and A. Loureiro. Surprising Patterns for the Call Duration Distribution of Mobile Phone Users. In PKDD’10.
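For intuition about the DPLN shape, the distribution can be simulated with its standard construction: log X is a normal variable plus an asymmetric Laplace term E1/alpha - E2/beta, where E1 and E2 are standard exponential variables. This is a minimal sketch only; `sample_dpln` is a hypothetical helper, and the parameter values are arbitrary placeholders, not values fitted to call data in [1] or in this work.

```python
import math
import random

def sample_dpln(n, alpha, beta, nu, tau, seed=None):
    """Draw n samples from a double Pareto-lognormal (DPLN) distribution.

    log X = Z + E1/alpha - E2/beta, with Z ~ Normal(nu, tau^2) and E1, E2 ~ Exponential(1).
    alpha sets the power-law exponent of the upper tail, beta that of the lower tail.
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        log_x = rng.gauss(nu, tau) + rng.expovariate(1.0) / alpha - rng.expovariate(1.0) / beta
        out.append(math.exp(log_x))
    return out

# Arbitrary illustrative parameters (NOT values fitted to the CDR data in the slides).
durations = sample_dpln(10_000, alpha=2.0, beta=1.5, nu=math.log(60), tau=0.8, seed=42)
durations.sort()
print(f"simulated median duration: {durations[len(durations) // 2]:.1f} s")
```

Fitting DPLN or TLAC to real durations (e.g. by maximum likelihood) is more involved and is not shown here.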

  5. Mobile Data • Call Detail Records (CDR): • 3.9 million CDRs; • 2 months (Dec. 2007 & Jan. 2008); • Non-American data. • Mobile Network: • 272,345 users and 521,925 call edges. • Pareto Principle: • 20% of user pairs produce 80% of calls. One week of the data is available at http://arnetminer.org/mobile-duration
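The Pareto-principle statistic can be checked with a short script. This is a minimal sketch, assuming the CDRs are reduced to (caller, callee) tuples and that a user pair is unordered (A calling B and B calling A count toward the same pair). `top_pair_call_share` is a hypothetical helper, and the toy data is constructed so the answer is exactly 0.8.

```python
from collections import Counter

def top_pair_call_share(calls, top_fraction=0.2):
    """Fraction of all calls produced by the top `top_fraction` of user pairs.

    calls: iterable of (caller, callee) tuples; direction is ignored when forming pairs.
    """
    per_pair = Counter(tuple(sorted((a, b))) for a, b in calls)
    counts = sorted(per_pair.values(), reverse=True)
    k = max(1, round(top_fraction * len(counts)))  # number of "top" pairs
    return sum(counts[:k]) / sum(counts)

# Toy data: 10 pairs, 100 calls; the two busiest pairs make 50 + 30 calls.
toy_calls = [("a", "b")] * 25 + [("b", "a")] * 25 + [("c", "d")] * 30
for i in range(8):
    toy_calls += [(f"x{i}", f"y{i}")] * (3 if i < 4 else 2)
print(top_pair_call_share(toy_calls))
```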

  6. Roadmap • Dynamic Dist. on Duration • Temporal distribution. • Demographics distribution [1]. • Existing Macro-Distribution. • DPLN distribution • TLAC distribution • [1] V. Palchykov, K. Kaski, J. Kertész, A.-L. Barabási and R.I.M. Dunbar. Sex differences in intimate relationships. Scientific Reports 2:370, 2012.

  7. Roadmap • Dynamic Dist. on Duration • Temporal distribution. • Demographics distribution [1]. • Social Theory on Duration • Strong/weak tie • Homophily • Opinion leader • Social balance • Existing Macro-Distribution. • DPLN distribution • TLAC distribution • [1] V. Palchykov, K. Kaski, J. Kertész, A.-L. Barabási and R.I.M. Dunbar. Sex differences in intimate relationships. Scientific Reports 2:370, 2012.

  8. Roadmap • Dynamic Dist. on Duration • Temporal distribution. • Demographics distribution [1]. • Social Theory on Duration • Strong/weak tie • Homophily • Opinion leader • Social balance • Existing Macro-Distribution. • DPLN distribution • TLAC distribution • Duration Prediction • Dynamic factors • Social factors • [1] V. Palchykov, K. Kaski, J. Kertész, A.-L. Barabási and R.I.M. Dunbar. Sex differences in intimate relationships. Scientific Reports 2:370, 2012.

  9. Dynamic Distribution on Duration

  10. Periodicity • Periodic patterns in mobile call duration: • Working time (8:00AM-7:00PM): 75 seconds on average; • Evening (7:00PM-12:00AM): increasing to 150 seconds around midnight; • Early morning (12:00AM-8:00AM): decreasing to 50 seconds.
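Assuming each CDR carries the hour at which the call started and its duration, the per-period averages can be computed by bucketing hours with the boundaries listed on the slide. This is a sketch only: the helper names and toy records are hypothetical, and real CDR timestamps would also need time-zone handling.

```python
from collections import defaultdict
from statistics import mean

def period_of(hour):
    """Map an hour of day (0-23) to the periods used on the slide."""
    if 8 <= hour < 19:
        return "working"        # 8:00AM-7:00PM
    if 19 <= hour < 24:
        return "evening"        # 7:00PM-12:00AM
    return "early_morning"      # 12:00AM-8:00AM

def mean_duration_by_period(records):
    """records: iterable of (start_hour, duration_seconds). Returns {period: mean duration}."""
    buckets = defaultdict(list)
    for hour, duration in records:
        buckets[period_of(hour)].append(duration)
    return {period: mean(ds) for period, ds in buckets.items()}

# Toy records (hour, seconds), not real CDRs.
toy = [(9, 60), (14, 90), (20, 120), (23, 180), (3, 40), (6, 60)]
print(mean_duration_by_period(toy))
```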

  11. Demographics • Call Duration vs. Demographics: • Females make longer calls than males; • Calls between 2 females are longer than calls between 2 males; • Calls from males to females are longer than calls from females to males; • Younger users make longer calls.

  12. Social Theory on Duration

  13. Social Theory • Strong/weak tie: • How long do people with a strong or weak tie call? • Link homophily: • Do similar users tend to make long or short calls to each other? • Opinion leader: • How do the calling behaviours of opinion leaders differ from those of ordinary users? • Social balance: • To what extent does the duration-based network satisfy social balance theory?

  14. Strong/Weak Tie [1] The number of calls (#calls) between two users is used to measure their tie strength. • [1] http://www.thomashutter.com/index.php/2012/01/facebook-die-rolle-von-social-networks-in-der-informationsverbreitung/ • [2] Jure Leskovec and Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In WWW’08.

  15. Strong/Weak Tie Probability that a call is < 60s. [1] • Call Duration vs. Social Tie: • The stronger the tie, the shorter the calls. • The probability that a call is < 60s is 80% if the two users called each other 1,000 times in the two months. • This differs from the pattern in the online instant-messaging network studied in [2]. The number of calls (#calls) between two users is used to measure their tie strength. • [1] http://www.thomashutter.com/index.php/2012/01/facebook-die-rolle-von-social-networks-in-der-informationsverbreitung/ • [2] Jure Leskovec and Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In WWW’08.
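The curve "probability that a call is < 60s" versus tie strength can be estimated by grouping pairs by their total call count. This is a minimal sketch, assuming tie strength is the number of calls in both directions and assuming logarithmic bin edges; the slides do not state the binning. `short_call_prob_by_tie` and the toy CDRs are hypothetical.

```python
from collections import defaultdict

def short_call_prob_by_tie(calls, bins, threshold=60):
    """Estimate P(duration < threshold | tie-strength bin).

    calls: list of (u, v, duration_seconds); a pair's tie strength is its number of
    calls in both directions. bins: sorted lower edges, e.g. [1, 10, 100, 1000].
    Returns {lower_edge: probability} for bins that contain at least one call.
    """
    pair_durations = defaultdict(list)
    for u, v, d in calls:
        pair_durations[tuple(sorted((u, v)))].append(d)

    short, total = defaultdict(int), defaultdict(int)
    for durations in pair_durations.values():
        strength = len(durations)
        # Assign the pair to the largest bin edge not exceeding its strength.
        edge = max((b for b in bins if b <= strength), default=None)
        if edge is None:
            continue
        total[edge] += len(durations)
        short[edge] += sum(d < threshold for d in durations)
    return {b: short[b] / total[b] for b in bins if total[b]}

# Toy CDRs: a weak tie (1 long call) and a stronger tie (10 calls, 8 of them short).
toy_cdrs = [("a", "b", 120)] + [("c", "d", 30)] * 8 + [("d", "c", 90)] * 2
print(short_call_prob_by_tie(toy_cdrs, bins=[1, 10]))
```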

  16. Link Homophily [1] The number of common neighbours between two users is used to measure homophily. • [1] Lilian Weng, Filippo Menczer, Yong-Yeol Ahn. Virality Prediction and Community Structure in Social Networks. Scientific Reports, Aug. 2013.
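The homophily measure reduces to a set intersection over each user's contacts. This is a minimal sketch, assuming the call network is treated as undirected (the slide does not say whether call direction matters); the function names and toy edges are hypothetical.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Undirected adjacency sets from (caller, callee) edges; self-calls are skipped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def common_neighbours(adj, u, v):
    """Number of users who have a call edge with both u and v."""
    return len(adj[u] & adj[v])

edges = [("v1", "v2"), ("v1", "v3"), ("v2", "v3"), ("v1", "v4"), ("v2", "v4"), ("v3", "v5")]
adj = build_adjacency(edges)
print(common_neighbours(adj, "v1", "v2"))  # shared contacts: v3 and v4
```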

  17. Link Homophily Probability that a call is < 60s. [1] The number of common neighbours between two users is used to measure homophily. • Call Duration vs. Link Homophily: • The more common neighbours, the shorter the calls. • The probability that a call is < 60s is 80% if the two users have more than 30 common neighbours. • Call Duration vs. Social Tie + Link Homophily: • The higher the homophily and the stronger the tie, the shorter the calls. • [1] Lilian Weng, Filippo Menczer, Yong-Yeol Ahn. Virality Prediction and Community Structure in Social Networks. Scientific Reports, Aug. 2013.

  18. Opinion Leader [1] PageRank is used to select the top 1% of users in the mobile call network as opinion leaders; the remaining users are treated as ordinary users. • [1] E. Katz. The two-step flow of communication: an up-to-date report of an hypothesis. In: Enis, Cox (eds.) Marketing Classics, 1973.
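The opinion-leader selection can be sketched as power-iteration PageRank followed by a top-1% cut. The slide does not say whether the call graph was directed or weighted, so this sketch assumes unweighted caller-to-callee edges and the common damping factor 0.85; `pagerank`, `opinion_leaders`, and the toy graph are hypothetical.

```python
def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """Power-iteration PageRank on a directed, unweighted graph.

    edges: iterable of (caller, callee). Users with no outgoing edge (dangling
    nodes) spread their rank uniformly over all users.
    """
    edges = list(edges)
    nodes = sorted({u for e in edges for u in e})
    out = {v: set() for v in nodes}
    for src, dst in edges:
        out[src].add(dst)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        dangling = sum(rank[v] for v in nodes if not out[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out[v]:
                share = damping * rank[v] / len(out[v])
                for w in out[v]:
                    new[w] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank

def opinion_leaders(rank, top_fraction=0.01):
    """Users in the top `top_fraction` by PageRank score (always at least one user)."""
    k = max(1, int(len(rank) * top_fraction))
    return set(sorted(rank, key=rank.get, reverse=True)[:k])

# Toy call graph: 20 users all call "hub", and "hub" calls back u0.
toy_edges = [(f"u{i}", "hub") for i in range(20)] + [("hub", "u0")]
scores = pagerank(toy_edges)
print(opinion_leaders(scores))
```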

  19. Opinion Leader OL: opinion leader; OU: ordinary user. Probability that a call is < 60s. [1] PageRank is used to select the top 1% of users in the mobile call network as opinion leaders; the remaining users are treated as ordinary users. • Call Duration vs. Opinion Leader: • OLs make shorter calls in general: the probability that an OL's call is < 60s is about 80%; • Calls between 2 OLs are shorter. • [1] E. Katz. The two-step flow of communication: an up-to-date report of an hypothesis. In: Enis, Cox (eds.) Marketing Classics, 1973.

  20. Social Balance Structural balance: a triad is balanced if all three users are friends or exactly one pair of them are friends; two users are assumed to be friends if they call each other at least once. Relationship balance: the balance rate is the percentage of triangles with an even number of negative ties; a tie is labeled negative based on the number of calls or the average call duration between its two users.
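The relationship-balance rate can be computed by enumerating each triangle once and counting its negative ties. The slide does not give the exact rule (e.g. the threshold) for labeling a tie negative, so this sketch takes the signs as input; `balance_rate` and the toy signs are hypothetical.

```python
from itertools import combinations

def balance_rate(signed_edges):
    """Fraction of triangles with an even number (0 or 2) of negative ties.

    signed_edges: dict mapping frozenset({u, v}) -> +1 (positive) or -1 (negative).
    Node names must be mutually comparable (e.g. all strings).
    """
    adj = {}
    for pair in signed_edges:
        u, v = tuple(pair)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    balanced = total = 0
    for u in adj:
        for v, w in combinations(sorted(adj[u]), 2):
            # Count each triangle once: require u < v < w and a closing edge v-w.
            if u < v and w in adj[v]:
                total += 1
                signs = [signed_edges[frozenset(p)] for p in ((u, v), (u, w), (v, w))]
                if signs.count(-1) % 2 == 0:
                    balanced += 1
    return balanced / total if total else float("nan")

# Toy signs: triangle a-b-c has one negative tie (unbalanced); b-c-d has none (balanced).
toy_signed = {frozenset(("a", "b")): 1, frozenset(("b", "c")): 1, frozenset(("a", "c")): -1,
              frozenset(("c", "d")): 1, frozenset(("b", "d")): 1}
print(balance_rate(toy_signed))
```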

  21. Social Balance Structural balance: a triad is balanced if all three users are friends or exactly one pair of them are friends; two users are assumed to be friends if they call each other at least once. Relationship balance: the balance rate is the percentage of triangles with an even number of negative ties; a tie is labeled negative based on the number of calls or the average call duration between its two users. (< 20%: not balanced) • Call Duration vs. Social Balance: • The network is unbalanced under structural balance; • It is balanced under relationship balance.

  22. Duration Prediction

  23. Prediction Scenario [Figure: call network among users v1 to v5 at Time 1, with each edge labeled by a call duration.] Attribute factors: v1: female, 29y; v2: male, 31y; v3: male, 60y; v4: female, 63y; v5: female, 27y.

  24. Prediction Scenario [Figure: call network among users v1 to v5 at Time 1 and Time 2, with each edge labeled by a call duration.] Attribute factors: v1: female, 29y; v2: male, 31y; v3: male, 60y; v4: female, 63y; v5: female, 27y. Social factors: Opinion leader: v5; Strong tie: v4, v5; Weak tie: v1, v3; Homophily: v3, v5; Social balance: v3, v4, v5.

  25. Prediction Scenario [Figure: call network among users v1 to v5 at Times 1, 2 and 3, with each edge labeled by a call duration; at Time 3 the duration of one call is unknown (?).] Can we predict how long this call will last? Attribute factors: v1: female, 29y; v2: male, 31y; v3: male, 60y; v4: female, 63y; v5: female, 27y. Social factors: Opinion leader: v5; Strong tie: v4, v5; Weak tie: v1, v3; Homophily: v3, v5; Social balance: v3, v4, v5. Temporal factors: v5 calls v3 on Mon. 10:00PM.

  26. Social Time-dependent Factor Graph (STFG) • PFG: • partially labeled factor graph [1] • TRFG: • social triad based factor graph [2] • [1] W. Tang, H. Zhuang and J. Tang. Learning to infer social ties in large networks. In ECML/PKDD’11. • [2] J. Hopcroft, T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM’11.

  27. Social Time-dependent Factor Graph (STFG) • PFG: • partially labeled factor graph [1] • TRFG: • social triad based factor graph [2] • STFG: • partially labeled + social triad + time dependent • [1] W. Tang, H. Zhuang and J. Tang. Learning to infer social ties in large networks. In ECML/PKDD’11. • [2] J. Hopcroft, T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM’11.

  28. Social Time-dependent FG

  29. Social Time-dependent FG • Joint distribution: a product of social, temporal, and attribute factors

  30. Social Time-dependent FG • Joint distribution: a product of social, temporal, and attribute factors • Exponential-linear functions are used to instantiate the factors: • Attribute factor • Social factor • Temporal factor
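The factor equations are not reproduced in this transcript. As a generic illustration only (an assumption, not the paper's exact formulas), an exponential-linear factor scores a label y for input x as exp(theta · phi(x, y)), and normalizing over the candidate labels gives a log-linear probability. `attribute_features`, the feature vector, and the weights below are hypothetical.

```python
import math

def factor_value(theta, phi):
    """Unnormalised exponential-linear factor exp(theta . phi)."""
    return math.exp(sum(t * f for t, f in zip(theta, phi)))

def attribute_features(x, y):
    """Hypothetical feature map: attributes are active only for label 1 (long call)."""
    return [xi if y == 1 else 0.0 for xi in x]

def label_distribution(theta, feature_fn, x, labels=(0, 1)):
    """Normalise a single exponential-linear factor over the candidate labels."""
    scores = {y: factor_value(theta, feature_fn(x, y)) for y in labels}
    z = sum(scores.values())
    return {y: s / z for y, s in scores.items()}

# x = [is_female, is_evening]; the weights are made up for illustration.
theta = [0.5, 1.0]
x = [1.0, 1.0]
print(label_distribution(theta, attribute_features, x))
```

In a full factor graph the attribute, social, and temporal factors are multiplied and normalized jointly over all labels, so normalizing one factor per node, as above, is a simplification.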

  31. Social Time-dependent FG • STFG objective function: • Learning: Parameters:

  32. Learning Algorithm Gradient descent method [1]. • [1] J. Hopcroft, T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM’11.
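The slides name gradient descent but do not reproduce the objective. This is a minimal sketch, assuming an attribute-only log-linear model (no social or temporal factors), so that the model expectation in the gradient can be computed exactly instead of with loopy belief propagation. `features`, `p_long`, `train`, and the toy data are hypothetical.

```python
import math

def features(x, y):
    """Hypothetical feature map phi(x, y): the attribute vector is active only for y = 1."""
    return [xi if y == 1 else 0.0 for xi in x]

def p_long(theta, x):
    """P(y = 1 | x) under exp(theta . phi(x, y)) normalised over y in {0, 1}."""
    s = sum(t * xi for t, xi in zip(theta, x))
    return 1.0 / (1.0 + math.exp(-s))

def train(data, dim, lr=0.5, epochs=1000):
    """Gradient descent on the average negative log-likelihood.

    For a log-linear model, d(log-likelihood)/d(theta) = phi(x, y_observed) - E_model[phi(x, y)].
    Labels are independent here, so the expectation is exact; with coupled social and
    temporal factors, this expectation is the quantity that must be approximated.
    """
    theta = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for x, y in data:
            p1 = p_long(theta, x)
            observed = features(x, y)
            # E_model[phi] = P(y=0) * phi(x, 0) + P(y=1) * phi(x, 1), and phi(x, 0) is all zeros.
            expected = [p1 * f for f in features(x, 1)]
            for j in range(dim):
                grad[j] += observed[j] - expected[j]
        # Descending the negative log-likelihood = stepping along the log-likelihood gradient.
        theta = [t + lr * g / len(data) for t, g in zip(theta, grad)]
    return theta

# Toy data: x = [bias, is_female]; y = 1 for a long call. Purely illustrative.
toy_data = [([1.0, 1.0], 1), ([1.0, 1.0], 1), ([1.0, 1.0], 0),
            ([1.0, 0.0], 0), ([1.0, 0.0], 0), ([1.0, 0.0], 1)]
theta = train(toy_data, dim=2)
print(round(p_long(theta, [1.0, 1.0]), 3), round(p_long(theta, [1.0, 0.0]), 3))
```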

  33. Learning Algorithm Loopy Belief Propagation (LBP) is used to compute the expectation required by the gradient. Gradient descent method [1]. • [1] J. Hopcroft, T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM’11.
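For reference, sum-product loopy belief propagation on a pairwise model with binary labels (short/long) fits in a few dozen lines. This sketch is a generic LBP implementation under the assumption of pairwise potentials only; it is not the STFG's actual factor structure (which includes triad and temporal factors), and all names and potentials are hypothetical.

```python
import math

def loopy_bp(nodes, unary, edges, pairwise, iters=50):
    """Sum-product loopy belief propagation for a pairwise model with binary labels.

    unary[v]          -> [phi_v(0), phi_v(1)], non-negative node potentials
    pairwise[(u, v)]  -> 2x2 table psi[y_u][y_v] for each edge (u, v) in `edges`
    Returns approximate marginals {v: [P(y_v = 0), P(y_v = 1)]}.
    Exact on trees; an approximation on graphs with cycles (e.g. triangles).
    """
    nbrs = {v: [] for v in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    def psi(u, v, yu, yv):
        # Look up the edge potential regardless of the direction it was stored in.
        if (u, v) in pairwise:
            return pairwise[(u, v)][yu][yv]
        return pairwise[(v, u)][yv][yu]

    # msg[(u, v)][y]: message from u to v about v's label y, initialised uniformly.
    msg = {(u, v): [1.0, 1.0] for u in nodes for v in nbrs[u]}
    for _ in range(iters):
        new = {}
        for u, v in msg:
            out = []
            for yv in (0, 1):
                total = 0.0
                for yu in (0, 1):
                    prod = unary[u][yu] * psi(u, v, yu, yv)
                    for w in nbrs[u]:
                        if w != v:
                            prod *= msg[(w, u)][yu]
                    total += prod
                out.append(total)
            z = sum(out)
            new[(u, v)] = [o / z for o in out]  # normalise for numerical stability
        msg = new  # synchronous (parallel) update schedule

    beliefs = {}
    for v in nodes:
        b = [unary[v][y] * math.prod(msg[(w, v)][y] for w in nbrs[v]) for y in (0, 1)]
        z = sum(b)
        beliefs[v] = [p / z for p in b]
    return beliefs

# Three label variables whose pairwise factors form a cycle; potentials are illustrative only.
tri_nodes = ["a", "b", "c"]
tri_unary = {"a": [0.7, 0.3], "b": [0.4, 0.6], "c": [0.5, 0.5]}
same = [[1.5, 1.0], [1.0, 1.5]]  # mild preference for agreeing labels
tri_edges = [("a", "b"), ("b", "c"), ("a", "c")]
print(loopy_bp(tri_nodes, tri_unary, tri_edges, {e: same for e in tri_edges}))
```

On a tree the returned marginals are exact, which makes a brute-force comparison on a small chain a convenient sanity check.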

  34. Experimental Setup • Prediction • Case 1: predict the duration of the next call • Case 2: predict the average duration of calls in a future period

  35. Experimental Setup • Prediction • Case 1: predict the duration of the next call • Case 2: predict the average duration of calls in a future period • Data • CDR data from the first 7 weeks as historical data • Case 1: duration of the 1st call in the 8th week as the next-call target • Case 2: average call duration in the 8th week as the average-duration target

  36. Experimental Setup • Prediction • Case 1: predict the duration of the next call • Case 2: predict the average duration of calls in a future period • Data • CDR data from the first 7 weeks as historical data • Case 1: duration of the 1st call in the 8th week as the next-call target • Case 2: average call duration in the 8th week as the average-duration target • Binary Prediction • 60% of calls last less than 60 seconds and the remaining 40% last longer than 60 seconds; • The telephone bill jumps once a call reaches 1 minute; • A threshold of 60 seconds is therefore used to classify calls as long or short in this work.
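The data split and binary labels described above can be sketched for a single user pair as follows. This assumes each call is stored as (week, timestamp, duration_seconds) with weeks numbered from 1, and treats a call of exactly 60 s as long (the slides leave that boundary case open); `label`, `make_targets`, and the toy calls are hypothetical.

```python
from statistics import mean

THRESHOLD = 60  # seconds

def label(duration, threshold=THRESHOLD):
    """1 = long call, 0 = short call; exactly 60 s is treated as long (assumption)."""
    return int(duration >= threshold)

def make_targets(calls, history_weeks=7):
    """Split one pair's calls into history and the two prediction targets.

    calls: list of (week, timestamp, duration_seconds). Returns
    (history, next_call_label, avg_duration_label); the labels are None if the
    pair makes no call in week history_weeks + 1.
    """
    history = [c for c in calls if c[0] <= history_weeks]
    target_week = sorted((c for c in calls if c[0] == history_weeks + 1), key=lambda c: c[1])
    if not target_week:
        return history, None, None
    next_call = label(target_week[0][2])               # Case 1: first call of week 8
    avg_call = label(mean(c[2] for c in target_week))  # Case 2: average duration in week 8
    return history, next_call, avg_call

pair_calls = [(1, 100, 30), (7, 500, 200), (8, 950, 120), (8, 900, 45)]
history, next_call, avg_call = make_targets(pair_calls)
print(len(history), next_call, avg_call)
```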

  37. Experimental Setup (Cont.) • Baseline Predictors • SVM: support vector machine (SVM-light) • LRC: logistic regression (Weka) • Bnet: Bayesian network • CRF: conditional random field • Evaluation • Precision / Recall / F1-Measure
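The slides report precision, recall, and F1 without stating which class is positive. This sketch assumes "long call" (label 1) is the positive class; `precision_recall_f1` is a hypothetical helper, and scikit-learn's `precision_recall_fscore_support` computes the same metrics.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for one positive class (here: long calls)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
print(precision_recall_f1(y_true, y_pred))  # 2 TP, 1 FP, 1 FN
```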

  38. Results • Case 1: Next Call Duration Prediction • Case 2: Average Call Duration Prediction

  39. Factor Contribution G: gender A: age W: week D: day B: social balance T: social tie H: homophily O: opinion leader

  40. STFG Convergence Our learning algorithm is able to reach convergence quickly.

  41. Conclusion & Future Work • Conclusions: • Patterns predicted by social theories, as well as dynamic (temporal and demographic) distributions, are clearly present in the duration network; • Our proposed model significantly improves prediction accuracy. • Interesting observations: • Young females tend to make long calls, in particular in the evening; • Familiar people (more calls and more common neighbors) make shorter calls. • Future work: • Inferring call duration with a regression model. • Building duration prediction into a mobile application.

  42. Thanks Yuxiao Dong, Jie Tang, Tiancheng Lou, Bin Wu, Nitesh V. Chawla. How Long will She Call Me? Distribution, Social Theory and Duration Prediction. In ECML/PKDD’13. Data&Code: http://arnetminer.org/mobile-duration
