
Question-Answering on Yahoo!Answers: Preliminary Results

Rong Tang and Sheila Denn. OCLC/ALISE LIS Research Grant Presentation, ALISE 2009, January 23, 2009.


Presentation Transcript


  1. Question-Answering on Yahoo!Answers: Preliminary Results Rong Tang Sheila Denn OCLC/ALISE LIS Research Grant Presentation ALISE 2009 January 23, 2009

  2. Background • Yahoo!Answers • Social Q&A • 25+ pre-defined categories • Users post questions, answer questions, rate answers, provide comments • One best answer chosen by the asker or through vote • Users may provide comments

  3. Rating/Voting/Commenting

  4. Our Research Project • Funded by OCLC/ALISE Grant Program and Simmons College President’s Fund for Research • Project Staff: • Rong Tang (PI) • Sheila Denn (Co-PI) • Sam Kalat (technology consultant, programmer) • Laura Saunders (Research Assistant) • The project wiki page documents the relevant literature and project progression, with extensive meeting notes on coding decisions

  5. Research Questions • Are existing question taxonomies (such as those in Graesser et al. (1994) and Freed (1994)) valid in a social Q&A environment? • What are the relationships between the linguistic characteristics, functional properties, and subject content of the questions and the kinds of responses that they receive? • What are the characteristics of answers that are chosen as “best” answers? • What is the role of the social function vs. the information function in social Q&A? • What are the implications of the above for provision of library and information services?

  6. Previous Research • Question classification • Wh- questions (Robinson & Rackstraw, 1972) • Conceptual question categories (Lehnert, 1978) • Content-based question categories (Graesser, et al., 1994) • Reference question classification (Pomerantz, 2005) • Questions in Dynamic Semantics (Aloni, Butler, & Dekker, 2007) • Answer classification • Much less research here than with question classification • Answer selection rules (Lehnert, 1978) • Criteria based on Yahoo!Answers comments (Kim et al., 2007)

  7. Previous Research (cont.) • Formal studies of Online Q&A • Answerers: “specialists” vs. “synthesists” (Gazan, 2006) • Questioners: “seekers” vs. “sloths” (Gazan, 2007) • Question purpose (Graesser, et al., 1994) • Filling knowledge gaps • Establishing and monitoring common ground • Coordinating social action • Directing the conversation and controlling attention

  8. Research Plan • Data collection and sampling • Gathered a stratified random sample of 3,000 question-answer sets, including any comments • Stratified by 25 top-level categories assigned by Yahoo!Answers • Data coding • Content analysis at multiple levels • Syntactic • Semantic • Pragmatic
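The stratified sampling step on the Research Plan slide can be illustrated with a minimal sketch. It assumes equal allocation (120 question-answer sets from each of 25 categories, which totals 3,000); the slides do not state how the sample was allocated across categories. The function name `stratified_sample`, the category labels, and the population size are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, per_stratum, seed=0):
    """Draw up to `per_stratum` items at random from each stratum.

    `key` maps an item to its stratum label (here, a top-level category).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = min(per_stratum, len(members))  # small strata contribute all they have
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 25 categories with 500 question ids each.
population = [(f"category_{c:02d}", q) for c in range(25) for q in range(500)]

# Equal allocation is an assumption: 25 categories x 120 = 3,000 sets.
sample = stratified_sample(population, key=lambda row: row[0], per_stratum=120)
print(len(sample))
```

Proportional allocation (sampling each category in proportion to its share of all questions) would be an equally plausible reading of "stratified"; only the value of `per_stratum` per category would change.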

  9. Research Plan (cont.) • Data Analysis • Descriptive statistics will be produced for: • Frequency of answers provided per question • Average length of time to first answer • Distribution of subject categories • Distribution of question and answer types • Distribution of chosen answer types • Correlation analysis will be performed for: • Linguistic characteristics of questions and answers • Functional categories of questions and answers • Subject categories of questions and answers
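Two of the descriptive statistics listed above (frequency of answers per question and average time to first answer) can be sketched as follows. The record layout, field names, and timestamps are invented for illustration and are not the project's data or schema.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

# Hypothetical records: one dict per question with its answer timestamps.
records = [
    {"asked": datetime(2008, 1, 1, 12, 0),
     "answers": [datetime(2008, 1, 1, 12, 5), datetime(2008, 1, 1, 13, 0)]},
    {"asked": datetime(2008, 1, 2, 9, 0),
     "answers": [datetime(2008, 1, 2, 9, 15)]},
    {"asked": datetime(2008, 1, 3, 8, 0),
     "answers": []},
]


def answers_per_question(records):
    """Frequency table: number of answers -> number of questions."""
    return Counter(len(r["answers"]) for r in records)


def mean_minutes_to_first_answer(records):
    """Mean delay in minutes to the earliest answer, ignoring unanswered questions."""
    delays = [
        (min(r["answers"]) - r["asked"]).total_seconds() / 60
        for r in records
        if r["answers"]
    ]
    return mean(delays) if delays else None


print(answers_per_question(records))
print(mean_minutes_to_first_answer(records))
```

Unanswered questions are excluded from the delay calculation here; whether to exclude them or treat them as censored observations is a design choice the slides do not address.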

  10. Progress to Date • Sample has been collected • Preliminary coding has begun • Syntactic coding of questions is complete • Wh- questions • Inversion questions • Other questions • Multiparts • Double coding • Syntactic coding of question descriptions is complete • Number of questions included in description text • Type of questions

  11. Data Coding • Two coders code each question independently, then review their codes together to reach consensus on the final coding • Use of informal language presents a challenge for coding • Is it a question if it doesn’t include a question mark? Is it a question simply because it has a question mark at the end? • Should “WTF” be coded as a “what” question or as an other question? Or not at all? • Coding multiparts of a question, e.g., “Why do husbands feel they have to lie to other women about being married, and when the other woman finds out?” • Double coding questions such as "Is there anywhere you can listen to citizen band radio online?"
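The project's syntactic coding was done by two human coders reaching consensus. Purely as an illustration of the three top-level categories (wh-, inversion, other), a rough automated first pass might look like the sketch below. It is not the authors' procedure: the word lists and the first-token rule are assumptions, and it does not attempt multipart or double coding.

```python
import re

# Assumed word lists; the project's actual coding scheme is not given in the slides.
WH_WORDS = {"what", "why", "how", "who", "whom", "whose", "which", "when", "where"}
AUXILIARIES = {
    "is", "are", "was", "were", "am", "do", "does", "did", "can", "could",
    "will", "would", "should", "shall", "may", "might", "must",
    "has", "have", "had",
}


def syntactic_code(question):
    """Return 'wh', 'inversion', or 'other' based on the first word only."""
    tokens = re.findall(r"[a-z']+", question.lower())
    if not tokens:
        return "other"
    first = tokens[0].split("'")[0]  # "what's" -> "what"
    if first in WH_WORDS:
        return "wh"
    if first in AUXILIARIES:
        return "inversion"
    return "other"


examples = [
    "Why do husbands feel they have to lie to other women about being married, "
    "and when the other woman finds out?",
    "Is there anywhere you can listen to citizen band radio online?",
    "WTF",
]
for text in examples:
    print(syntactic_code(text), "|", text)
```

A first-token rule like this cannot settle the judgment calls raised on the slide, such as whether "WTF" counts as a "what" question, which is why human consensus coding remains necessary.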

  12. Preliminary Results

  13. Number of Answers Per Question

  14. Length of Time to Receive First Answer

  15. Wh-question frequency • “What” Questions

  16. Wh-question frequency • “Why” Questions

  17. Wh-question frequency • “How” Questions

  18. Wh-question frequency • “Inversion” Questions

  19. Next Steps • Start semantic and pragmatic analysis of questions • Start answer analysis • Start comment coding • Explore the association and features of Q and A and C • Develop a conceptual and analytical model for social Q&A

  20. Questions?
