
Improving the effectiveness of Web searching: Methodological issues

Improving the effectiveness of Web searching: Methodological issues. Barry Eaglestone. Department of Information Studies University of Sheffield B.Eaglestone@shef.ac.uk. Overview. An inductive study to build evidence-based meta-cognitive models of web searching by the general public.


Presentation Transcript


  1. Improving the effectiveness of Web searching: Methodological issues Barry Eaglestone Department of Information Studies University of Sheffield B.Eaglestone@shef.ac.uk

  2. Overview • An inductive study to build evidence-based meta-cognitive models of web searching by the general public. • Data modelling issues • A Temporal data modelling solution • Discussion & Final thoughts

  3. Setting the scene – the database approach and state of the art. An inductive study of how the general public search on the web.

  4. Motivation • Need to develop new models for searching: update outdated usage paradigms. • Improve training methods • Develop automated assistance systems

  5. Previous studies of search logs • Web search is shallow + promiscuous • Low use of advanced features • Global statistics • number of queries/search • Pages viewed / user • query reformulation (change in no of terms) • Most users enter few terms • Little to be gained by increasing complexity

  6. The Team • Database • Chemoinformatics • Information Seeking

  7. Spectrum of Research Perspective [Figure: a spectrum from Soft (people world, IS) to Hard (computer world, CS). Labels along the spectrum: human / organisational issues; qualitative / quantitative data analysis / modelling; discovery; modelling / engineering / empirical; problem-solving formalism; invention; formally defined problems; computer world formalisations; hardware / software solutions.]

  8. Context
     Who are the searchers? What are they searching for?
     • The GENERAL PUBLIC
     • Volunteers (c. 500 searches): ICT courses; University evening classes; City Learning Centre courses; Citizens’ forum; personal contacts; library; advertising
     • Students and academics
     • Plus over 1,000,000 search logs from anonymous searchers
     How do they search? Effectiveness?
     • Observe and record: over 1,000,000 anonymous search engine transaction logs; c. 500 observed and recorded searches; talk to searchers
     • Self-selected searches explained through interview and think-aloud protocols
     • 2-3 set searches
     • Infer effectiveness from search transformation patterns and the subject’s narrative
     Meta-cognitive knowledge about web searching?
     • Determine query similarity
     • Delimit searches
     • Code query transformations
     • Model searches as transformation graphs
     • Data mine for stereotypical search strategies
     • Correlate with who, why and effectiveness
     • Thus, establish evidence-based models of search strategy, related to user and problem characteristics and likelihood of success
     How will we use it?
     • Evidence-based meta-cognitive training
     • Intelligent interfaces

  9. Why Meta-cognition? “Meta-cognition refers to higher order thinking which involves active control over the cognitive processes engaged in learning. ….” Livingston (1997) • Meta-cognitive knowledge • “…knowledge of personal variables to general knowledge about how human beings learn and process information, as well as individual knowledge of one’s own learning processes…” e.g. “I have a bad memory!” • Meta-cognitive regulation • “… activities used to ensure that a cognitive goal has been met….”, e.g., question yourself about the text and then re-read. Livingston (1997)

  10. Cognitive Styles Analysis • Verbalizer / Imager • Holist / Analyst • http://www.memletics.com/manual/default.asp?ref=ga&data=999+learning+styles+free+test

  11. Syntactical/quantitative: Excite search logs (~10^6 searches) • Semantic/qualitative: holistic search logs, supplemented with qualitative data

  12. Preliminary work • Analysis of search logs • Development of descriptive codes • Aim is to form a basis for the analysis of our experimental data

  13. Strengths /Limitations • Large sample • Definitely general public. • No enquiry context – what are they looking for? What are they thinking? • No measure of success. • Are they searching or just browsing? • Where does one enquiry end and another begin? • Limited to one search engine – what did they do during a delay?

  14. Excite Database Sample (~10^6 queries)

      qid  uid               time    rank  query                           querymore  totwords
      343  000000000000006a  192141  0     alco fence company ohio         No         4
      344  000000000000006a  192219  0     alco fence company ohio         No         4
      345  000000000000006a  192228  10    alco fence company ohio         No         4
      346  000000000000006a  192243  20    alco fence company ohio         No         4
      347  000000000000006a  192328  0     lifetime fence company ohio     No         4
      348  000000000000006a  192359  10    lifetime fence company ohio     No         4
      349  000000000000006a  192455  0     lifetime wire fence             No         3
      350  000000000000006a  192634  0     high tensile wire fence         No         4
      351  000000000000006b  161906  0     sickle cell anemia              No         3
      352  000000000000006b  162006  10    sickle cell anemia              No         3
      353  000000000000006b  162130  0     sickle cell anemia              No         3
      354  000000000000006c  144303  0     Hilton Garden Inn               No         3
      355  000000000000006c  144331  0     Hilton Garden Inn Jacksonville  No         4
      356  000000000000006c  144433  0     Hotel Search                    No         2
      357  000000000000006c  144541  0     Jacksonvill Hotel               No         2
      358  000000000000006c  144728  0     www.hilton.com                  No         1

      [Slide annotation: "Sessions", with markers 1, 2 and 3 against the sample.]
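A minimal sketch of how rows like those in the sample might be grouped by user before any session analysis. The names `LogRow`, `group_by_user` and `distinct_queries` are hypothetical, and reading `rank` as the offset of the first result shown (0, 10, 20 for successive result pages) is an assumption, not something the slide states.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRow:
    qid: int
    uid: str
    time: str   # HHMMSS, as it appears in the sample
    rank: int   # assumption: offset of the first result shown
    query: str


def group_by_user(rows):
    """Return {uid: [rows in time order]}."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row.uid].append(row)
    for uid in by_user:
        # Zero-padded HHMMSS strings sort correctly as text.
        by_user[uid].sort(key=lambda r: (r.time, r.qid))
    return dict(by_user)


def distinct_queries(rows):
    """Collapse consecutive repeats of the same query text (e.g. paging)."""
    out = []
    for row in rows:
        if not out or out[-1] != row.query:
            out.append(row.query)
    return out


# A few rows copied from the sample above.
SAMPLE = [
    LogRow(354, "000000000000006c", "144303", 0, "Hilton Garden Inn"),
    LogRow(355, "000000000000006c", "144331", 0, "Hilton Garden Inn Jacksonville"),
    LogRow(351, "000000000000006b", "161906", 0, "sickle cell anemia"),
    LogRow(352, "000000000000006b", "162006", 10, "sickle cell anemia"),
    LogRow(353, "000000000000006b", "162130", 0, "sickle cell anemia"),
]
```

Collapsing repeats separates result-page browsing from genuine query changes, which is one way to approach the "searching or just browsing?" question raised earlier.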

  15. Query Transformations • Changes in search strategy • Conceptual, e.g. changes in type of search: broad/specific, text/image • Linguistic: syntactic, query structure • Examples Q1: shakespeare hamlet Q2: shakespeare hamlet quotes Q3: to be or not to be Q4: “to be or not to be” Q5: “to be or not to be” +shakespeare
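The syntactic side of these transformations can be illustrated with a small sketch that compares two consecutive queries and emits descriptive codes. The code names (`add_terms`, `add_phrase_quotes` and so on) and the functions `parse` and `describe` are hypothetical; they are not the coding scheme used in the study.

```python
import re


def parse(query):
    """Return (terms, quoted phrases, '+'-prefixed terms) for one query."""
    # Treat typographic quotes like plain double quotes.
    q = query.replace("\u201c", '"').replace("\u201d", '"').lower()
    phrases = set(re.findall(r'"([^"]+)"', q))
    tokens = q.replace('"', " ").split()
    plus_terms = {t[1:] for t in tokens if t.startswith("+") and len(t) > 1}
    terms = {t.lstrip("+") for t in tokens if t.lstrip("+")}
    return terms, phrases, plus_terms


def describe(prev, curr):
    """List illustrative codes for the change from prev to curr."""
    t1, p1, s1 = parse(prev)
    t2, p2, s2 = parse(curr)
    added, removed = t2 - t1, t1 - t2
    codes = []
    if added and removed:
        codes.append("replace_terms" if t1 & t2 else "new_terms")
    elif added:
        codes.append("add_terms")
    elif removed:
        codes.append("remove_terms")
    if p2 - p1:
        codes.append("add_phrase_quotes")
    if p1 - p2:
        codes.append("remove_phrase_quotes")
    if s2 - s1:
        codes.append("add_plus_operator")
    if s1 - s2:
        codes.append("remove_plus_operator")
    return codes or ["repeat"]


# The five example queries from the slide, with plain double quotes.
EXAMPLE = [
    "shakespeare hamlet",
    "shakespeare hamlet quotes",
    "to be or not to be",
    '"to be or not to be"',
    '"to be or not to be" +shakespeare',
]
STEPS = [describe(a, b) for a, b in zip(EXAMPLE, EXAMPLE[1:])]
```

In this sketch the step from Q2 to Q3 is coded `new_terms` because the two queries share no words, even though the enquiry is plainly the same. That is the gap between conceptual and linguistic change the slide distinguishes.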

  16. Our Preliminary Analysis • To look at textual (syntactic) changes. • Link queries by text similarity. • Infer enquiry change from textual dissimilarity. • Use these elements to develop a machine-readable codification of query transformations (QTs). • To mine for characteristic patterns.
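One way the linking and delimiting steps could work is sketched below, assuming Jaccard similarity over query terms and a fixed threshold. Both the similarity measure and the threshold value of 0.2 are assumptions made for illustration; the slides do not say which measure the project used. The function names are hypothetical.

```python
def jaccard(a, b):
    """Term-set overlap between two query strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def delimit(queries, threshold=0.2):
    """Start a new enquiry whenever consecutive queries are too dissimilar."""
    enquiries = []
    for q in queries:
        if enquiries and jaccard(enquiries[-1][-1], q) >= threshold:
            enquiries[-1].append(q)
        else:
            enquiries.append([q])
    return enquiries


def link(queries, threshold=0.2):
    """Link each query to its most similar earlier query, if any.

    Returns edges (i, j) with i < j; ties go to the most recent query.
    """
    edges = []
    for j in range(1, len(queries)):
        best_i, best = None, 0.0
        for i in range(j):
            s = jaccard(queries[i], queries[j])
            if s >= threshold and s >= best:
                best_i, best = i, s
        if best_i is not None:
            edges.append((best_i, j))
    return edges


# Distinct queries of user ...6c from the Excite sample.
USER_6C = [
    "Hilton Garden Inn",
    "Hilton Garden Inn Jacksonville",
    "Hotel Search",
    "Jacksonvill Hotel",
    "www.hilton.com",
]
```

The edges returned by `link` form a simple graph over a user's queries. Purely textual linking is brittle: the misspelt "Jacksonvill" does not match "Jacksonville", so a human coder would probably join enquiries that this sketch keeps apart.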

  17. Example Transformations

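  18. QT graphs [Figure: query transformation graph. Recoverable node text: paid undergraduate nursing schools in baltimore city maryland nursing careers]

  19. QT graphs [Figure: query transformation graph. Recoverable node text: molsworth, "us army"]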

  20. Preliminary Conclusions • We have developed a rich set of codes describing the syntactic part of QTs • These can be used to develop a graph-based description • Correlations between the codes are meaningful/interesting • They form part of the analysis for our current experimental study.

  21. …and if you want to read about our preliminary results… • Whittle M, Eaglestone B, Ford N, Madden A (2007), Data Mining of Search Logs, Journal of the American Society for Information Science and Technology (in press) • Whittle M, Eaglestone B, Ford N, Gillet VJ, Madden A (2006), Query Transformations and Their Role in Web Searching by the General Public, Information Research, 12(1), October 2006 • Whittle M, Eaglestone B, Ford N, Gillet V, Madden A (2006), Query transformations and their role in web searching by the general public, Information Seeking in Context Conference (ISIC) 2006, Australia • Madden A, Eaglestone B, Ford N, Whittle M (2006), Search engines: a first step to finding information: preliminary findings from a study of observed searches, Information Seeking in Context Conference (ISIC) 2006, Australia

  22. [Figure: workflow of the Sheffield Experimental Study. Captured data: keystrokes, queries, Web page titles, screens, audio. Stages shown: transcribing, pre-processing, temporal database, qualitative analysis, quantitative analysis, model development.]

  23. Data modelling issues

  24. Setting the scene – the database approach and state of the art. Evolution of databases. The database approach – A database should be a natural representation of information as data, suitable for all relevant applications without duplication, including the ones you have not yet thought of. “A well designed database system will mirror its users’ perceptions of the problem space, and thus allows them to address the problem in hand without complexities and distractions of computer world implementation details… Implicit is the notion that users should work within the bounds of ‘good practice’”

  25. Setting the scene – the database approach and state of the art. The semantic gap: the gap between what you wish to represent and what you can represent.

      [Diagram: Customer (1) Placed_by (n) SalesOrder (m) Take_by (1) Salesperson]

      Customer
      C#   Name            …
      C1   Dr. Eaglestone
      C2   Ms Smith

      Salesperson
      SP#  Names      …
      S5   Mr. Chan
      …
      S8   Dr. Shao

      Sales_Order
      C#   SP#  Product  Quantity
      C1   S5   P99      120
      C1   S5   P2       10

  26. Principles of database technology… & Data Independence • Applications/Users • External Model • Logical Model • Internal Model

  27. QT graphs [Figure: query transformation graph. Recoverable node text: molsworth, "us army"]

  28. A Ready-made Temporal data modelling solution

  29. GENREG – A ready-made solution that has also been proposed for healthcare ? • The Organisation: National Museum of Denmark • Multimedia • Pictures as well as descriptions • Distributed • Each department ran their own database system for their collection (ownership!) • Object-oriented design • Entities, not just values • Relational implementation

  30. Database Research Praxis Application Technology Science Theory

  31. Topology Danish Pre-history Ethnographic Department LAN 1,000,000 artefacts 200,000 images Department of Antiquity Coin Collection

  32. Design / Abstractions • Design • Object oriented • Based on a curator’s perspective • “Curators apply scientific training to determine the history of artefacts…creating knowledge about past and present societies by determining relationships which group artefacts within certain times and places in history” • Abstractions • Artefact • Event • Relationship • relate artefacts which participate in common events

  33. Mould used to fabricate Brooches

  34. GENREG data model [Diagram: EVENT/ARTIFACT, ARTIFACT] One or more artifacts participate in one or more events.

  35. Burial site Grave Grave Artefact Artefact Artefact Artefact Artefact Artefact

  36. Manor House A B C D Merchant’s House E Furniture Purchase event G Rooms F H I J K L Furniture

  37. Integrated Care Pathways Application [Procter, P., Eaglestone, B.M. & Burdis, C., “A unified model to support an information intensive healthcare environment”, MIE '99] [Figure: timeline from Past through Present to Future(s), with nodes P1 to P6 labelled It, It+1 and It+2, linked by Treatment and branching into alternative diagnoses and alternative prognoses.]

  38. A formal GENREG Model
      type Genreg = abs[ tuple[ Collection : Artifacts, Events : set[Event] ]
          new : () → Genreg,
          = : (Genreg × Genreg) → boolean,
          events : (Genreg) → set[Event],
          collection : (Genreg) → Artifacts ]
      type Artifacts = graph[Artifact]
      type Event = abs[ tuple[ id : E_Id, type : Event_Type, t : Time, place : Location, actors : set[Actor_Type], edge : set[Edge] ]
          = : (Event × Event) → boolean,
          id : (Event) → E_Id,
          type : (Event) → Event_Type,
          t : (Event) → Time,
          place : (Event) → Location,
          actors : (Event) → set[Actor_Type],
          edgeset : (Event) → set[Edge] ]
      …

  39. type Time = abs[ tuple[ lower, upper : T ]
          new : () → Time,
          = : (Time × Time) → boolean,
          before : (Time × Time) → boolean,
          meets : (Time × Time) → boolean,
          overlaps : (Time × Time) → boolean,
          during : (Time × Time) → boolean,
          starts : (Time × Time) → boolean,
          finishes : (Time × Time) → boolean,
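The operation names on the Time type match the interval relations of Allen's interval algebra. The sketch below assumes that is the intended meaning and takes the bound type T to be an integer; the slide does not give the definitions, so both choices are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Time:
    """A closed interval [lower, upper]; equality compares both bounds."""
    lower: int
    upper: int

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower bound must not exceed upper bound")


def before(a, b):
    """a ends strictly before b begins."""
    return a.upper < b.lower


def meets(a, b):
    """a ends exactly where b begins."""
    return a.upper == b.lower


def overlaps(a, b):
    """a starts first and ends inside b."""
    return a.lower < b.lower < a.upper < b.upper


def during(a, b):
    """a lies strictly inside b."""
    return b.lower < a.lower and a.upper < b.upper


def starts(a, b):
    """a and b begin together and a ends first."""
    return a.lower == b.lower and a.upper < b.upper


def finishes(a, b):
    """a and b end together and a begins later."""
    return a.upper == b.upper and a.lower > b.lower
```

Imprecise dates, such as an artefact known only to come from a given century, can be carried as wide intervals and still compared with these relations.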

  40. add_artifact / delete_artifact (D, a) • add_event / delete_event (D, e) • merge (D,F,E) • select_artefacts (D,p) • select_events (D,p) • related_to (D,n) • related_by (D,e,n)
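A simplified sketch of a few of these operations, to show how events relate artefacts. It assumes that `related_by (D,e,n)` returns the artefacts sharing event `e` with artefact `n`, and that `related_to (D,n)` returns the artefacts sharing any event with `n`; the slides do not define either. The edge sets of the formal model are reduced here to a set of participants per event, and the sample artefacts, dates and places are invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    id: str
    type: str
    t: tuple                 # (lower, upper) bounds of the event time
    place: str
    participants: frozenset  # ids of the artefacts taking part


@dataclass
class Genreg:
    collection: set = field(default_factory=set)  # artefact ids
    events: dict = field(default_factory=dict)    # event id -> Event


def add_artifact(D, a):
    D.collection.add(a)


def add_event(D, e):
    # An event may only refer to artefacts already in the collection.
    missing = e.participants - D.collection
    if missing:
        raise ValueError(f"unknown artefacts: {sorted(missing)}")
    D.events[e.id] = e


def select_events(D, p):
    """Events satisfying predicate p, in insertion order."""
    return [e for e in D.events.values() if p(e)]


def related_by(D, e, n):
    """Artefacts that take part in event e together with artefact n."""
    event = D.events[e]
    return set(event.participants - {n}) if n in event.participants else set()


def related_to(D, n):
    """Artefacts that share at least one event with artefact n."""
    related = set()
    for e in D.events:
        related |= related_by(D, e, n)
    return related


# Invented illustration: a mould used to cast two brooches, one later buried.
D = Genreg()
for a in ("mould", "brooch-1", "brooch-2", "sword"):
    add_artifact(D, a)
add_event(D, Event("e1", "casting", (900, 950), "workshop", frozenset({"mould", "brooch-1"})))
add_event(D, Event("e2", "casting", (900, 950), "workshop", frozenset({"mould", "brooch-2"})))
add_event(D, Event("e3", "burial", (960, 960), "grave", frozenset({"brooch-1", "sword"})))
```

Under these assumptions, `related_to(D, "brooch-1")` returns both the mould that made the brooch and the sword buried with it, the kind of grouping by common events that the curator's perspective describes.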

  41. Temporal Data Models (see also SQL/Temporal) [Figure: three axes labelled Entity, Attribute and Time] • Entity: Barry; Height: 2’ 3’’; Time: 1950 • Entity: Barry; Height: 5’ 10’’; Time: 2004
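The entity / attribute / time view can be sketched as a store of timestamped attribute values. The class `TemporalStore` and its methods are hypothetical, and treating a value as holding until the next recorded one is an assumption made only to keep the example short.

```python
import bisect
from collections import defaultdict


class TemporalStore:
    """Attribute values indexed by entity, attribute and time."""

    def __init__(self):
        # (entity, attribute) -> list of (time, value), kept sorted by time
        self._facts = defaultdict(list)

    def record(self, entity, attribute, time, value):
        bisect.insort(self._facts[(entity, attribute)], (time, value))

    def value_at(self, entity, attribute, time):
        """Most recent value recorded at or before `time`, else None."""
        facts = self._facts.get((entity, attribute), [])
        times = [t for t, _ in facts]
        i = bisect.bisect_right(times, time)
        return facts[i - 1][1] if i else None

    def history(self, entity, attribute):
        """All recorded (time, value) pairs in time order."""
        return list(self._facts.get((entity, attribute), []))


# The two facts shown on the slide.
store = TemporalStore()
store.record("Barry", "Height", 2004, "5' 10''")
store.record("Barry", "Height", 1950, "2' 3''")
```

Facts can be recorded in any order, which suits histories that are built up retrospectively.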

  42. Artefact histories are created retrospectively • Multiple orthogonal time dimensions can be represented (using specialised events), e.g., discovery and historic time. • Relationships between events and states are modelled. • Multiple objects can represent different states and interpretations of an entity.

  43. QT graphs [Figure: query transformation graph. Recoverable labels: Q3, Q4, molsworth, "us army"]

  44. Some final thoughts…

  45. Some final thoughts… • The Database Approach? • Semantic gap? • Data independence? • Temporal modelling? • Query language? • So, what’s happening?

  46. IR & DB? [Figure: client(s) send problem-related queries to server(s), which are Internet-accessible repositories of artefacts; problem-relevant artefacts from the artefact collection are returned to the researcher’s workspace, which is developed to model the problem space.] • Users are researchers who derive knowledge from retrieved artefacts • IR – collections of artefacts are available for ad hoc querying (any relevant problem); the problem is modelled by the query • DB – collections of artefacts are structured to model the problem space.

  47. …final thoughts… • Knowledge of research methodology is important (qualitative and quantitative) • Nudist, Atlas, SPSS don’t support mixed methods • Database approach allows integration of qualitative and quantitative data, and organisation of data to evolve to model emerging theory • Temporal data models are key to modelling evolving strategy…

  48. Acknowledgments • The project team – Nigel Ford, Andrew Madden, Martin Whittle • Arts and Humanities Research Council (formerly Board) for funding • Mark Sanderson and Amanda Spink for making the Excite logs available • Val Gillet and Eleanor Gardiner for help with graphs.
