Your Last Words? (in 22nd century) • To family • To your best friend?
Web Behavior Analysis • Why important? • Why scary?
Part I: Why Important? • We rely more and more on search for our real-life decisions • Opportunities for business • Concerns for privacy
Users Relying More on Search • Q. In the past six months, have you used a search engine to help inform your decisions for the following tasks? • 66% of people are using search more frequently to make decisions • Users need help with tasks and making decisions
Focus on new territory • What should be done?
Decision Sessions are Lengthy • [Figure: Length of Sessions by Type] • Complex task and decision sessions could be easier
Taxonomy of Web queries • Navigational (we are good at this) • to reach a particular site • E.g., Searching for top page of company • Informational • to acquire pages that provide knowledge for user’s information need • Conventional ad hoc retrieval • Transactional • to perform a Web-mediated activity • E.g., online shopping
Example: Good and Bad Navigational Queries • Pseudo-Navigational Queries
Example of “Hard Queries”: Informational/Transactional • Car GPS around $300 • Four-day trip to Bhutan from Delhi to visit important Buddhist places
Party Site Game Consoles
Current research directions • How to classify queries? • Then what? • Search engines trying to reduce clicks for “hard queries” • Extracting info from forum
Importance of query classification: “obama” • Informational: people may search to learn more about Barack Obama • Navigational: visit his official website • Transactional: perhaps the user's goal is to donate money online to support Mr. Obama’s campaign
Yahoo numbers • ~25% informational: content text? • ~40% navigational: anchor text? • ~35% transactional: site template?
Lee et al. [WWW05]: Overview • Analyzing how the query term is used in anchor texts • Q = “search”: anchors reading “search” point to a description in Wikipedia and to a search engine; destinations are diverse → Informational • Q = “WWW2008”: anchors reading “WWW2008” point to the top page of WWW2008; destinations are identical → Navigational
Anchor-link distribution (ALD) • Probability that the page linked by anchor text t is d • t = search: destinations include Google, Yahoo!, Wikipedia; ALD is uniform → Informational • t = WWW2008: destination is the top page of WWW2008; ALD is skewed → Navigational
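The idea of the ALD can be sketched in a few lines of Python. This is only an illustration under assumptions: the link lists are made-up toy data, the function names are hypothetical, and Shannon entropy is used here as one natural way to measure how uniform or skewed a distribution is.

```python
import math
from collections import Counter


def anchor_link_distribution(destinations):
    """Estimate P(d | t) for one anchor text t.

    `destinations` lists the target page of every link whose anchor text is t.
    """
    counts = Counter(destinations)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}


def entropy(distribution):
    """Shannon entropy in bits; high means uniform, low means skewed."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)


# Toy data (hypothetical): where links with each anchor text point.
search_links = ["google.com", "yahoo.com", "en.wikipedia.org/wiki/Search"]
www2008_links = ["www2008.org", "www2008.org", "www2008.org"]

ald_search = anchor_link_distribution(search_links)
ald_www2008 = anchor_link_distribution(www2008_links)

print(entropy(ald_search))   # diverse destinations: high entropy
print(entropy(ald_www2008))  # a single destination: zero entropy
```

A uniform ALD gives high entropy (informational), while an ALD concentrated on one page gives entropy near zero (navigational).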
Lee et al.: Problem • Targets only anchor texts that are exactly the same as the query • If no anchor text identical to the query exists on the Web, ALD cannot be computed • Problematic queries • Long phrases • E.g., “information retrieval system research” • Multiple keywords • E.g., “trec, nist, test collection”
Multi-query solution • Query Q = “trec, test collection” • Terms T = {trec, test, collection} • Destinations D = {d1, d2, …} • Compute ALD on a term-by-term basis (t = trec, t = test, t = collection) and integrate them
Computation of classification score • Entropy of D • Entropy of a single term t • Weighted average
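A minimal sketch of the term-by-term score, assuming details the slide does not show: each term's weight is taken to be the number of anchors found for it, entropy is measured in bits, and the link data and function names are hypothetical.

```python
import math
from collections import Counter


def term_entropy(destinations):
    """Entropy (bits) of the destination distribution for a single term t."""
    counts = Counter(destinations)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def classification_score(links_by_term):
    """Weighted average of per-term entropies.

    Assumption: the weight of a term is its number of anchors.
    Returns None when no term has any anchor text.
    """
    weighted_sum = 0.0
    total_weight = 0
    for destinations in links_by_term.values():
        if not destinations:
            continue  # no anchor text on the Web for this term
        weight = len(destinations)
        weighted_sum += weight * term_entropy(destinations)
        total_weight += weight
    if total_weight == 0:
        return None
    return weighted_sum / total_weight


# Toy data (hypothetical) for Q = "trec, test collection".
links_by_term = {
    "trec": ["trec.nist.gov"] * 4,
    "test": ["a.example", "b.example"],
    "collection": ["c.example", "d.example"],
}
score = classification_score(links_by_term)
print(score)
```

A low score suggests a navigational query and a high score an informational one; any threshold between the two would have to be tuned on labeled queries.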
Now what? • For “WWII” • Google: http://www.google.com/search?q=WWII&hl=en&tbo=1&output=search&tbs=ww:1 • Microsoft: http://www.bing.com/reference/semhtml/World_War_II?fwd=1&qpvt=wwii&src=abop&q=wwii • Wolfram: http://www.wolframalpha.com/input/?i=wwII • Can you tell informational vs. transactional?
Challenges/Opportunities • Slightly subtle/interleaved • But huge advertisement revenue (yet to be explored)! • Classic query log + clicks on the surface web are not enough • Any ideas?
Eye movement? Brain signal? More signals?
CS: Client Simple • First representation: • Trajectory length • Horizontal range • Vertical range
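The three Client Simple features can be computed directly from a recorded trajectory. The sketch below assumes the trajectory is a time-ordered list of (x, y) positions (for example cursor or gaze samples; the slide does not say which), and the function name is hypothetical.

```python
import math


def client_simple_features(points):
    """Compute the three CS features from a time-ordered list of (x, y) points."""
    if not points:
        raise ValueError("trajectory must contain at least one point")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Sum of straight-line distances between consecutive samples.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return {
        "trajectory_length": length,
        "horizontal_range": max(xs) - min(xs),
        "vertical_range": max(ys) - min(ys),
    }


# Toy trajectory (hypothetical): move diagonally, then straight down.
trajectory = [(0, 0), (3, 4), (3, 10)]
features = client_simple_features(trajectory)
print(features)
```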
CF: Client Full • Second representation: • 5 segments: initial, early, middle, late, and end • Each segment: speed, acceleration, rotation, slope, etc.
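A rough sketch of the segmented representation follows. Several choices here are assumptions rather than anything stated on the slide: samples are (t, x, y) tuples, the five segments are contiguous chunks with near-equal sample counts, and only speed and slope are computed (acceleration and rotation are left out for brevity). All names are hypothetical.

```python
import math

SEGMENT_NAMES = ["initial", "early", "middle", "late", "end"]


def split_into_segments(samples, n_segments=5):
    """Split samples into contiguous chunks of near-equal size (assumed rule)."""
    n = len(samples)
    bounds = [i * n // n_segments for i in range(n_segments + 1)]
    return [samples[bounds[i]:bounds[i + 1]] for i in range(n_segments)]


def segment_features(segment):
    """Average speed and end-to-end slope for one segment of (t, x, y) samples."""
    if len(segment) < 2:
        return {"speed": 0.0, "slope": 0.0}
    (t0, x0, y0), (t1, x1, y1) = segment[0], segment[-1]
    path = sum(math.dist(a[1:], b[1:]) for a, b in zip(segment, segment[1:]))
    duration = t1 - t0
    speed = path / duration if duration > 0 else 0.0
    # Vertical or stationary segments get 0.0 as an arbitrary placeholder.
    slope = (y1 - y0) / (x1 - x0) if x1 != x0 else 0.0
    return {"speed": speed, "slope": slope}


def client_full_features(samples):
    """Map each of the five named segments to its feature dictionary."""
    segments = split_into_segments(samples)
    return {name: segment_features(seg) for name, seg in zip(SEGMENT_NAMES, segments)}


# Toy data (hypothetical): steady movement along the line y = x / 2.
samples = [(t, 2.0 * t, 1.0 * t) for t in range(10)]
features = client_full_features(samples)
print(features["middle"])
```

Splitting by elapsed time instead of sample count would be an equally plausible reading of the five segments.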
More corpus • cQA successful, as “additional corpus”, not as “additional means” • Challenges?
Good Q/A? -- Text • Check also: http://www.addedbytes.com/code/readability-score/
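As one example of a text-based quality signal, the sketch below computes the Flesch Reading Ease score. Choosing this particular readability formula is an assumption (the slide only points to a readability-score page), and the syllable counter is a crude vowel-group heuristic, so the numbers are approximate.

```python
import re


def count_syllables(word):
    """Rough heuristic: one syllable per run of vowels, at least one per word."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text needs at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Toy answers (hypothetical): a plain one and a dense one.
simple = "The cat sat on the mat. It was warm."
dense = ("Comprehensive evaluation methodologies necessitate "
         "considerable interdisciplinary collaboration.")

print(flesch_reading_ease(simple))
print(flesch_reading_ease(dense))
```

A score like this would be one feature among many for judging answer quality, not a classifier on its own.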
Useful beyond imagination • Spell checker: SIGMOD Did you mean “sigmoid”? • Entity relation: SIGMOD ~ SIGIR • Translation: SIGMOD, 씨그모드 sigmod.com • Query suggestion: 영일대 호텔 영일대 • Rank learning: the top-10 entries are visited all the time, what should we do? • Cause of migraine?
Companies need YOUR HELP • AOL released logs • Guess what happened?
More scientific observations (Yahoo Research) • X = {query1, query2, query3} • Y = age, gender, area • X → Y (how likely?) • Validate with ground-truth info (Yahoo account)
See if you can do it? • You observe yourself: http://aolpsycho.com/user/5826-kallemeyn