Jingjing Liu jingjing@sc A talk at School of Information and Library Science

Beyond Google: Improving Searchers’ Experience & Satisfaction Using their Behaviors and Contexts Jingjing Liu jingjing@sc.edu A talk at School of Information and Library Science University of North Carolina – Chapel Hill November 30, 2012

Outline • Background • Two approaches that help systems to understand and make use of users’ behaviors and contexts, and provide desired search results or search assistance • Personalization of search using behaviors and context factors • Understanding and predicting search task difficulty from behaviors • Future directions • Wrap up

Search engines • Do a decent job with simple and unambiguous search tasks • For example: where is the University of North Carolina at Chapel Hill? • Google search features • Automatic query completion • Instant results • Address right there in the search result snippet • Maps, pictures • …

However… • Search engines do not do as well for other tasks that could be ambiguous, e.g., • “jaguar” searched by a kid looking for some fun pictures of the big animal or • by a car buyer looking for information about the car make

Or… • Tasks could be complex or difficult, e.g., • Collect information about good rental apartments in Chapel Hill, NC • Why is this not easy? Could be due to… • Searchers’ lack of knowledge about the neighborhood • No single webpage available to complete this task • Analysis needs to be done, and decisions need to be made • …

The problem • Traditional search systems return results based almost entirely on matching query keywords • Little consideration of who, when, where, and why (for what purposes), or of the searcher’s task at hand • Current search engines have started to incorporate some of these factors into search algorithms, e.g., • location detection • search history • peer use… • But there is more that needs to be done

Two approaches • Towards personalization of search systems • Tailor search results to specific users and/or user groups • Understanding users’ behaviors & the roles of search contexts in interpreting user behaviors for their search interests • Predicting document usefulness based on user behaviors and contextual factors • Understanding and predicting search task difficulty • Characterizing user behaviors in difficult vs. easy tasks • Predicting search task difficulty from behaviors • Understanding why a task is difficult for users • Providing desired assistance

Approach 1: Towards Personalization of Search: Understanding the Roles of Contexts Related publications: Liu, J. & Belkin, N. J. (journal article in process). Exploring the roles of contextual factors in personalizing information search. Liu, J. & Belkin, N. J. (2010). Personalizing information retrieval for multi-session tasks: The roles of task stage and task type. SIGIR ’10. Liu, J. & Belkin, N. J. (2010). Personalizing information retrieval for people with different levels of topic knowledge. JCDL ’10.

Rationale • Tailoring search toward specific users or user groups to improve users’ search experience and satisfaction • Further understanding of users & their contexts • preferences & interests: what information is desired and useful? • a person's individual characteristics: knowledge level, cognitive ability, etc. • tasks at hand: type, difficulty level, complexity, etc. • current situation: time, location, stage, etc. • Building models of document usefulness prediction • Techniques: result re-ranking or query reformulation

Context • Plays important roles in user-information interaction • An umbrella term covering multiple aspects: culture, location, individual background, economics, time, tasks, …

How to learn about user interests? • Explicitly asking • Users unwilling to do so • Implicitly guessing from user behaviors • Not accurate enough

Problems with interpreting reading time • Reading time (behaviors) cannot reliably predict document usefulness (user interest) • Reading time varies with contextual factors • Tasks • Knowledge • Cognitive abilities • …

A three-way relationship • Context: can be learned explicitly or implicitly • User behaviors: observable • Document usefulness: can be learned explicitly or implicitly

Research questions • [Diagram: reading time (behavioral data) cannot reliably predict document usefulness (user interest); task stage and task type are the contextual factors examined] • RQ1: Does the stage of the user’s task help in interpreting time as an indicator of document usefulness? • RQ2: If task stage helps, does this role vary across task types?

Multi-session tasks • Often seen in everyday life • 25% of web users conducted multi-session searches (Donato, Bonchi, Chi, & Maarek, 2010) • Could occur due to • time constraints • difficulties in locating desired information • complexity of the tasks • Provide a natural, reasonable, simple, and meaningful way to identify stages in task completion • Have been used in lab experiment settings (Lin, 2001)

General Design & Data Collection • 3-session (stage) lab experiment, each session one sub-task • 24 Journalism/Media Studies undergraduates • 2 task types according to sub-task relationship

Parallel vs. Dependent Tasks Suppose you are a journalist writing a feature story about hybrid cars. You want to write the article in three sections, one at a time: In a Parallel Task • Honda Civic • Toyota Camry • Nissan Altima In a Dependent Task • Collect information on which manufacturers have hybrid cars. • Select three models to focus on in the article. • Compare the pros and cons of the three hybrid car models.

General Design & Data Collection • Each session: about an hour • 40 mins. search & report writing • Webpage usefulness judgment after report submission (7-point scale) • Questionnaires eliciting knowledge, task difficulty, etc.

General Design & Data Collection • System • Normal IE • IE Plus: system recommended keywords • No system effect on users’ reading time

Search systems • 2 versions: IE (normal) vs. IE Plus (term recommendation) • Note: no system effect on users’ reading time • [Screenshot: IE Plus version showing system-recommended terms]

General Design & Data Collection • Logging software Morae: • mouse • keyboard • time stamps of each action event • screen video

Data analysis • [Diagram: reading time (behavioral data) cannot reliably predict document usefulness (user interest); task stage is the contextual factor examined]

Data analysis method • General Linear Model: Time = α·usefulness + β·stage + γ·(usefulness × stage) • Examination of the relationship among three variables • time: • first reading time on a page (revisiting not counted) • total reading time on a page (revisiting counted) • usefulness: 7-point ratings → 3 levels: little, somewhat, very useful • stage: 1, 2, 3 • Analysis conducted in • both tasks combined • dependent task • parallel task
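The two time measures defined above can be illustrated with a minimal Python sketch. It assumes a simplified visit log of (url, start, end) tuples in seconds; that log shape, the function name `reading_times`, and the example URLs are hypothetical and are not taken from the study's Morae data.

```python
from collections import defaultdict


def reading_times(visits):
    """Compute first and total reading time per page.

    visits: chronological list of (url, start_sec, end_sec) tuples.
    This log format is an assumption made for illustration only.
    Returns {url: (first_reading_time, total_reading_time)}.
    """
    first = {}                  # dwell time of the first visit only
    total = defaultdict(float)  # dwell time summed over all visits
    for url, start, end in visits:
        dwell = end - start
        if url not in first:    # revisits do not change the first reading time
            first[url] = dwell
        total[url] += dwell     # revisits do count toward the total
    return {url: (first[url], total[url]) for url in first}


# Hypothetical example: one page is visited twice, another once.
visits = [
    ("a.example/hybrids", 0.0, 30.0),
    ("b.example/civic", 30.0, 45.0),
    ("a.example/hybrids", 45.0, 105.0),
]
times = reading_times(visits)
print(times["a.example/hybrids"])  # first visit vs. all visits
```

Note that the first reading time is known as soon as the user leaves the page for the first time, while the total is only final once the session ends.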

Total reading time (both tasks) • [Figure: time index plotted against usefulness levels (U) for each stage (S)] • GLM p-values: S: .187, U: < .001, S*U: .514

Total reading time (dependent task) • [Figure: time index plotted against usefulness levels (U) for each stage (S)] • GLM p-values: S: .276, U: < .001, S*U: .507

Total reading time (parallel task) • [Figure: time index plotted against usefulness levels (U) for each stage (S)] • GLM p-values: S: .621, U: < .001, S*U: .791

Summary of these findings • Strong correlation between usefulness and total reading time • Possible reason: writing reports in parallel with searching and reading information • Implication: total reading time alone can be a reliable indicator of usefulness • No significant difference among stages • Stage did not play a role • No differences between task types • Task type did not play a role • However, total reading time cannot easily be obtained by the system in real time

First reading time (both tasks) • [Figure: time index plotted against usefulness levels (U) for stages (S) 1, 2, and 3] • GLM p-values: S: .722, U: .116, S*U: .006

Summary: both tasks combined • Usefulness and first reading time were not significantly correlated • First reading time alone is not a reliable indicator of usefulness • Stage and usefulness had a significant interaction effect on time • Stage played a role

First reading time (parallel task) • [Figure: time index plotted against usefulness levels (U) for stages (S) 1, 2, and 3] • GLM p-values: S: .639, U: .869, S*U: .043

Summary: parallel task • Usefulness and first reading time were not significantly correlated • First reading time alone is not a reliable indicator of usefulness • Stage and usefulness had a significant interaction effect on time • Possible explanation: the sub-tasks differed only in car type; users in later sessions could have gained some knowledge about what kinds of pages were useful

First reading time (dependent task) • [Figure: time index plotted against usefulness levels (U) for stages (S) 1, 2, and 3] • GLM p-values: S: .454, U: .036, S*U: .180

Summary: dependent task • Usefulness and first reading time were significantly correlated • First reading time alone could reliably indicate usefulness • Stage and usefulness did not have a significant interaction effect on time • Stage did not play a role • Possible explanation: sub-tasks are independent of each other; users’ knowledge did not increase for each of them

Summary of findings for reading time • First reading time (which can be easily captured by the system) cannot always be a reliable indicator of usefulness on its own • Stage could help • Task type matters

The factor of user knowledge • In addition to task stage, we also looked at users’ knowledge as a contextual factor • Similar patterns to the stage factor • When knowledge and stage are both considered, knowledge plays a more significant role • However, stage is more easily determined in practice

Significance of the study • Found that contexts do matter in inferring document usefulness from behaviors • Task stage • Task type • User knowledge • Created a method to explore the effects of contextual factors through lab experiments • Has implications for search system design • Taking account of task stage, user knowledge, and task type could help infer users’ interests and accordingly tailor search results to specific searchers

Limitations & follow-up studies • Limitations • Lab experiment • Effect size • Future studies • Other contextual factors: other task types, etc. • Behaviors other than dwell time: clickthrough, revisit, etc. • Naturalistic study • Build models of usefulness prediction based on behaviors and contextual factors • Prototype building and evaluation

Approach 2: Understanding and Predicting Search Task Difficulty Related publications: Liu, J. & Kim, C. (2012). Search tasks: Why do people feel them difficult? HCIR 2012. Liu, J., Liu, C., Cole, M., Belkin, N. J., & Zhang, X. (2012). Examining and predicting search task difficulty. CIKM 2012. Liu, J., Liu, C., Yuan, X., & Belkin, N. J. (2011). Understanding searchers’ perception of task difficulty: Relationships with task type. ASIS&T 2011. Liu, J., Gwizdka, J., Liu, C., & Belkin, N. J. (2010). Predicting task difficulty in different task types. ASIS&T 2010. Liu, J., Liu, C., Gwizdka, J., & Belkin, N. J. (2010). Can search systems detect users’ task difficulty? Some behavioral signals. SIGIR 2010.

Rationale • Search systems need to be improved for better performance in “difficult” search tasks • It is important for the system to detect when users are having “difficult” tasks • Whether and when to intervene and/or provide assistance • Prevent users from getting frustrated or switching to other search engines (from an individual search engine’s perspective)

Task difficulty & search behaviors Previous studies found that more difficult tasks are associated with users • visiting more webpages (Kim 06; Gwizdka & Spence 06) • issuing more queries (Kim 06; Aula et al., 10) • spending more time on Search Engine Result Pages (SERPs) (Aula et al., 10)

Behaviors as task difficulty predictors • Some good predictors of task difficulty (Gwizdka 08) • task completion time, • number of queries, etc. • The problem • Whole-session level factors, cannot be obtained until the end of the search

Levels of user behaviors • Whole-task-session level • e.g., task completion time, total number of queries, total number of webpage visits • cannot be captured until the end of a session, therefore, not good for real-time system adaptation • Within-task-session level • e.g., dwell (reading) time, number of content pages viewed per query • can be captured in real-time
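To illustrate why within-task-session measures suit real-time adaptation, here is a minimal Python sketch that updates two of them incrementally as events arrive. The class name `SessionTracker`, its methods, and the event model are hypothetical; the sketch only shows that such features are available mid-session, not how the study's system computed them.

```python
class SessionTracker:
    """Running within-session features (hypothetical illustration)."""

    def __init__(self):
        self.queries = 0
        self.content_pages = 0
        self.dwell_total = 0.0

    def on_query(self):
        # Called each time the user issues a query.
        self.queries += 1

    def on_content_page(self, dwell_seconds):
        # Called when the user leaves a content page after reading it.
        self.content_pages += 1
        self.dwell_total += dwell_seconds

    def snapshot(self):
        # Features available at any point, without waiting for session end.
        pages_per_query = (
            self.content_pages / self.queries if self.queries else 0.0
        )
        mean_dwell = (
            self.dwell_total / self.content_pages if self.content_pages else 0.0
        )
        return {"pages_per_query": pages_per_query, "mean_dwell": mean_dwell}


tracker = SessionTracker()
tracker.on_query()
tracker.on_content_page(20.0)
tracker.on_content_page(40.0)
tracker.on_query()
print(tracker.snapshot())
```

By contrast, whole-task-session measures such as task completion time have no meaningful value until the session has ended.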

The current study • Revisits relations between task difficulty and user behaviors • Explores what behaviors are significant in predicting task difficulty • Especially within-session behaviors

Methodology • Controlled lab experiment • 48 student participants • Find useful webpages and save (bookmark & tag) them • Search tasks • 12-task pool: 6 pairs • Each participant worked on 6 of the 12 tasks, choosing the preferred one from each of the 6 pairs • 3 types • FF-S: fact finding – single item • FF-M: fact finding – multiple items • IG-M: information gathering – multiple items • Task type showed effects on user behaviors, but the current presentation does not focus on this

FF-S task example “Everybody talks these days about global warming. By how many degrees (Celsius or Fahrenheit) is the temperature predicted to rise by the end of the XXI century?” • One piece of fact

FF-M task example “A friend has just sent an email from an Internet café in the southern USA where she is on a hiking trip. She tells you that she has just stepped into an anthill of small red ants and has a large number of painful bites on her leg. She wants to know what species of ants they are likely to be, how dangerous they are and what she can do about the bites. What will you tell her?” • 3 pieces of facts

IG-M task example “You recently heard about the book "Fast Food Nation," and it has really influenced the way you think about your diet. You note in particular the amount and types of food additives contained in the things that you eat every day. Now you want to understand which food additives pose a risk to your physical health, and are likely to be listed on grocery store labels.” • Information gathering of 2 concepts

Data collection • Post-task questionnaires • Ratings of task difficulty, etc. • 5-point scale → binary (rating scores 1-3 vs. 4-5) • Morae logging software • Mouse • Keyboard • Webpages • Time stamp of each activity • Screen video
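The collapse of 5-point difficulty ratings into two classes can be sketched as below. The split (1-3 vs. 4-5) comes from the slide; the assumption that higher scores mean greater difficulty, the function name, and the "low"/"high" labels are illustrative only.

```python
def binarize_difficulty(rating):
    """Collapse a 5-point difficulty rating into two classes.

    Ratings 1-3 form one class and 4-5 the other, as on the slide.
    Assumption: higher scores mean greater difficulty, so 4-5 maps
    to "high". The labels are hypothetical.
    """
    if rating not in (1, 2, 3, 4, 5):
        raise ValueError("rating must be an integer from 1 to 5")
    return "high" if rating >= 4 else "low"


labels = [binarize_difficulty(r) for r in (1, 3, 4, 5)]
print(labels)
```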

Whole-session level behaviors • Pages visited • Number of content pages (all, unique) • Number of SERPs (all, unique) • Queries • Number of all queries issued • Queries leading to saving pages (number, ratio) • Queries not leading to saving pages (number, ratio) • Time • Task completion time • Total time on content pages • Total time on SERPs
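The whole-session measures listed above could be derived from an event log roughly as follows. This is a sketch under stated assumptions: the event dictionaries (keys `kind`, `t`, `url`, `dwell`), the rule that a save is credited to the most recent query, and the function name are all hypothetical and not the study's actual processing pipeline.

```python
def whole_session_features(events, session_end):
    """Summarize one finished task session.

    events: chronological list of dicts with keys
      "kind" in {"query", "serp", "content", "save"}, "t" (seconds),
      and, for "serp"/"content" events, "url" and "dwell" (seconds).
    This log format is an assumption made for illustration.
    """
    content = [e for e in events if e["kind"] == "content"]
    serps = [e for e in events if e["kind"] == "serp"]

    # Credit each save to the most recent query (assumed attribution rule).
    saves_per_query = []
    for e in events:
        if e["kind"] == "query":
            saves_per_query.append(0)
        elif e["kind"] == "save" and saves_per_query:
            saves_per_query[-1] += 1

    n_queries = len(saves_per_query)
    with_save = sum(1 for s in saves_per_query if s > 0)
    without_save = n_queries - with_save
    return {
        "content_pages_all": len(content),
        "content_pages_unique": len({e["url"] for e in content}),
        "serps_all": len(serps),
        "serps_unique": len({e["url"] for e in serps}),
        "queries_all": n_queries,
        "queries_with_save": with_save,
        "queries_with_save_ratio": with_save / n_queries if n_queries else 0.0,
        "queries_without_save": without_save,
        "queries_without_save_ratio": without_save / n_queries if n_queries else 0.0,
        "task_completion_time": session_end - events[0]["t"] if events else 0.0,
        "time_on_content": sum(e["dwell"] for e in content),
        "time_on_serps": sum(e["dwell"] for e in serps),
    }


# Hypothetical session: two queries, one of which leads to a saved page.
events = [
    {"kind": "query", "t": 0.0},
    {"kind": "serp", "t": 0.0, "url": "serp?q=a", "dwell": 10.0},
    {"kind": "content", "t": 10.0, "url": "p1", "dwell": 30.0},
    {"kind": "save", "t": 40.0},
    {"kind": "query", "t": 45.0},
    {"kind": "serp", "t": 45.0, "url": "serp?q=b", "dwell": 5.0},
    {"kind": "content", "t": 50.0, "url": "p1", "dwell": 20.0},
]
features = whole_session_features(events, session_end=70.0)
print(features)
```

Every value here depends on the complete log and on `session_end`, which is why these measures cannot drive adaptation while the search is still in progress.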
