
FeedWiz: Using Automated Document Clustering to “Map the Blogosphere”



Presentation Transcript


  1. FeedWiz: Using Automated Document Clustering to “Map the Blogosphere” David Schuff (schuff@temple.edu) Temple University Ozgur Turetken (turetken@ryerson.ca) Ryerson University

  2. The role of weblogs • Increasingly important mode of discourse • Is this really the “new media”?

  3. The consequences • Proliferation of information • Easy self-publishing • Proliferation of content • Leads to a “silo effect” • Limited information diet of only a few blogs • Readers tend to seek out confirmatory points of view. Our area of interest is news and political blogs, not, say, a blog about Paris Hilton (yes, there is one).

  4. The consequences • (Strict) filtering is seen as a threat to public discourse and democracy (Sunstein 2004) • At the very least, the true potential of the blogosphere is not being realized

  5. The power law distribution • A relationship in which one variable varies as a power of another (y ∝ x^(-k)) • Used to explain website popularity • On the right: number of inbound links by weblog (2002) • The top 3% of the political blog sites accounted for 20% of the inbound links. Source: http://www.shirky.com/writings/powerlaw_weblog.html
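To make the concentration effect concrete, here is a minimal sketch that computes the share of inbound links held by the top few percent of sites when the site at rank r receives links in proportion to r^(-alpha). The site count, the exponents, and the helper name `top_share` are hypothetical illustration values, not fitted to Shirky's 2002 data.

```python
def top_share(n_sites: int, alpha: float, top_fraction: float) -> float:
    """Share of all inbound links held by the top `top_fraction` of sites,
    assuming the site at rank r receives links in proportion to r ** (-alpha)."""
    weights = [r ** -alpha for r in range(1, n_sites + 1)]  # rank 1 = most-linked site
    k = max(1, round(top_fraction * n_sites))               # number of sites in the "top"
    return sum(weights[:k]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical parameters chosen for illustration, not fitted to any real data.
    for alpha in (0.0, 0.5, 1.0):
        share = top_share(n_sites=400, alpha=alpha, top_fraction=0.03)
        print(f"alpha={alpha}: top 3% of sites hold {share:.1%} of inbound links")
```

With alpha = 0 every site gets the same number of links, so the top 3% hold exactly 3%. Any positive exponent concentrates links among the top-ranked sites, and the concentration grows as alpha increases.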

  6. The decision support and information systems context • A key challenge is to create tools that help “filter, sort, and navigate” the blogosphere (Cayzer 2004) • Blogging is essentially a form of computer-mediated communication, CMC (Tan et al. 2005) • Can facilitate “common understanding” • Forming an opinion is essentially a decision-making process

  7. Research question • How can information presentation techniques be used to improve information consumption on the blogosphere? • Our proposition: This can be done by presenting information organized by content, not by author (or site)

  8. What we’re drawing from • Chunking and semantic networks (Miller 1956, Mandler 1967, Quillian 1968, Collins and Quillian 1969) • Clustering of text-based documents (Chen et al. 1998, Chen et al. 1996, Pirolli et al. 1996, Spangler et al. 2003, Roussinov and Chen 2001, Turetken and Sharda 2004) • Information visualization • “Preattentive” extraction of information (Bray 1996) • Size and color (Shneiderman 1994)

  9. FeedWiz (demo) • Select/create a list of weblogs • Navigate clusters of blog entries • Browse the individual clusters • Live demo… • How it works…
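The demo starts from a user-selected list of weblogs, and the appendix notes that blog posts are saved as text files on the FeedWiz server. The slides do not say how posts are retrieved, so this is only a sketch assuming each weblog publishes an RSS 2.0 feed. It uses Python's standard library to turn feed items into plain-text files. The function names (`extract_posts`, `save_posts`), the file-naming scheme, and the sample feed are hypothetical.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

TAG_RE = re.compile(r"<[^>]+>")  # crude matcher for HTML tags embedded in feed text


def extract_posts(rss_xml: str) -> list[str]:
    """Return one plain-text document (title, blank line, body) per RSS 2.0 <item>."""
    root = ET.fromstring(rss_xml)
    docs = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        # Feeds usually carry HTML-escaped markup in <description>; strip the tags.
        body = TAG_RE.sub(" ", item.findtext("description") or "").strip()
        docs.append(f"{title}\n\n{body}\n")
    return docs


def save_posts(docs: list[str], blog_name: str, out_dir: Path) -> list[Path]:
    """Write each document to <out_dir>/<blog_name>_NNNN.txt (hypothetical naming scheme)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, doc in enumerate(docs):
        path = out_dir / f"{blog_name}_{i:04d}.txt"
        path.write_text(doc, encoding="utf-8")
        paths.append(path)
    return paths


# Hypothetical sample feed used for the demo below.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example blog</title>
<item><title>Hybrid cars</title>
<description>&lt;p&gt;Are they worth it?&lt;/p&gt;</description></item>
<item><title>Second post</title><description>Plain text.</description></item>
</channel></rss>"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in save_posts(extract_posts(SAMPLE_RSS), "exampleblog", Path(tmp)):
            print(p.name, repr(p.read_text(encoding="utf-8")))
```

Keeping parsing (`extract_posts`) separate from file output (`save_posts`) makes the parsing step easy to test without touching the disk. In practice, the feed text would be downloaded from each weblog on the user's list.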

  10. Study 1 design • Quasi-experiment (semi-controlled) • Two groups of subjects • Both given a list of weblogs • Group A: Given an ordered list of URLs • Group B: Given FeedWiz

  11. Measuring effectiveness • Study how attitudes change (OXO design) • Measuring… • Opinion (agree/disagree and supporting rationale) • Level of conviction • Sources (blogs) used to form the opinion. OXO: O = ask subjects’ opinion on an issue (e.g., hybrid cars); X = give subjects an hour to read the list of blogs; O = ask subjects again for their opinion on that issue

  12. Hypotheses H1: In forming their opinions, FeedWiz users will use more sources than those who use an ordered list. H2: FeedWiz users will be more likely to change their opinions than those who use an ordered list. H3: FeedWiz users will be less likely to form strong opinions than those who use an ordered list.
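Each hypothesis maps onto a simple per-group measure from the OXO (pretest, treatment, posttest) design. The slides do not specify the analysis, so the following is only a sketch. The record layout (`Subject`), the 1 to 7 conviction scale, the "strong opinion" cut-off, and the choice of a one-sided two-proportion z-test for H2 are all assumptions.

```python
from dataclasses import dataclass
from math import erf, sqrt
from statistics import mean


@dataclass
class Subject:
    group: str            # "A" = ordered list of URLs, "B" = FeedWiz
    pre_opinion: str      # first O: "agree" or "disagree" before reading
    post_opinion: str     # second O: same question after the reading session (X)
    post_conviction: int  # hypothetical 1-7 scale, 7 = strongest conviction
    sources_used: int     # number of blogs cited as sources for the opinion


STRONG = 6  # hypothetical cut-off for a "strong" opinion


def summarize(subjects, group):
    """Per-group measures behind H1 (sources), H2 (opinion change), and H3 (strong opinions)."""
    g = [s for s in subjects if s.group == group]
    return {
        "mean_sources": mean(s.sources_used for s in g),
        "changed": sum(s.pre_opinion != s.post_opinion for s in g) / len(g),
        "strong": sum(s.post_conviction >= STRONG for s in g) / len(g),
    }


def two_proportion_z(x1, n1, x2, n2):
    """One-sided pooled z-test of H_a: p1 > p2. Returns (z, p_value)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("test undefined when every subject (or none) changed opinion")
    z = (x1 / n1 - x2 / n2) / se
    p_value = 0.5 * (1 - erf(z / sqrt(2)))  # upper-tail standard normal probability
    return z, p_value


if __name__ == "__main__":
    # Toy records for illustration only; these are not study data.
    toy = [
        Subject("A", "agree", "agree", 7, 2),
        Subject("A", "disagree", "disagree", 5, 1),
        Subject("B", "agree", "disagree", 4, 5),
        Subject("B", "agree", "agree", 6, 3),
    ]
    print(summarize(toy, "A"), summarize(toy, "B"))
```

For H2, `x1` and `n1` would be the number of FeedWiz users who changed their opinion and the FeedWiz group size, and `x2` and `n2` would be the same counts for the ordered-list group. With small samples, an exact test may be more appropriate than the normal approximation.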

  13. Study 2 design • Intensive data collection with a small sample • Tracking of eye movements • Recording of verbal comments • Protocol analysis • For further insight into the usability of the tool

  14. Expected contributions • Investigate how opinions are formed from blogs • Understand how information presentation techniques can influence information consumption • Implications for public discourse on the web • Creation of a highly usable tool which demonstrates those techniques

  15. References Bray, T. (1996). Measuring the Web, In Proceedings of the Fifth International World Wide Web Conference, Paris, France. Cayzer, S. (2004). Semantic blogging and decentralized knowledge management. Communications of the ACM, 47(12), 47-52. Chen, H., Nunamaker, J., Orwig, R.E., & Titkova, O. (1998). Information visualization for collaborative computing. IEEE Computer, 31(8), 75-82. Chen, H., Schuffels, C., & Orwig, R.E. (1996). Internet categorization and search: A self-organizing approach. Journal of Visual Communication and Image Representation, 7(1), 88-102. Collins, A.M. & Quillian, M.R. (1969). Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8, 240-247. Mandler, G. (1967). Organization in memory. In K. W. Spence, & J. T. Spence (Eds.), The Psychology of Learning and Motivation (pp. 327-372). New York, NY: Academic Press. Miller, G.A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81-97. Pirolli, P., Schank, P., Hearst, M., & Diehl, C. (1996). Scatter/gather browsing communicates the topic structure of a very large text collection. In Proceedings of the Conference on Human Factors in Computing Systems, New York, NY: ACM Press, 213-220. Quillian, M.R. (1968). Semantic memory. In M. Minsky (Ed.), Semantic Information Processing (pp. 227-270), Cambridge, MA: The MIT Press.

  16. References (continued) Roussinov, D.G. & Chen, H. (2001). Information navigation on the web by clustering and summarizing query results. Information Processing and Management, 37(6), 789-817. Shirky, C. (2003). Power laws, weblogs, and inequality. Accessed September 26, 2006 from http://www.shirky.com/writings/powerlaw_weblog.html. Shneiderman, B. (1994). Dynamic queries for visual information seeking. IEEE Software, 11(6), 70. Spangler, S., Kreulen, J.T., & Lessler, J. (2003). Generating and browsing multiple taxonomies over a document collection. Journal of Management Information Systems, 19(4), 191-212. Sunstein, C.R. (2004). Democracy and filtering. Communications of the ACM, 47(12), 57-59. Tan, C., Goswami, S., Chan, Y., & Zhong, Y. (2005). Conceptual evaluation of weblog as a computer-mediated communication application. In Proceedings from the 11th Americas Conference on Information Systems, Omaha, NE, 2361-2367. Turetken, O. & Sharda, R. (2004). Development of a fisheye-based information search processing aid (FISPA) for managing information overload in the web environment. Decision Support Systems, 37(3), 415-434.

  17. Appendix: How FeedWiz Works

  18. Appendix: How the documents are clustered • Blog posts are saved as text files on the FeedWiz server • Those files are grouped into clusters based on similarity • An output file is generated that describes the hierarchy
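The appendix does not name the similarity measure or the clustering algorithm FeedWiz uses. As a stand-in, this sketch represents each saved post as a TF-IDF vector, measures similarity with cosine, and builds a hierarchy by average-linkage agglomerative clustering. It then prints the hierarchy as JSON, playing the role of the output file. The function names, the smoothed IDF formula, the tokenizer, and the stopping threshold are illustrative assumptions.

```python
import json
import math
import re
from collections import Counter
from itertools import combinations


def tfidf_vectors(docs):
    """Sparse, L2-normalised TF-IDF vectors (dict: term -> weight), one per document."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF (an assumption): every term keeps a positive weight.
        vec = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def cluster(docs, threshold=0.1):
    """Average-linkage agglomerative clustering.

    Returns a list of trees: a leaf is a document index, an internal node is a
    pair (left, right). Merging stops when no two clusters reach `threshold`
    average similarity, so unrelated topics stay in separate trees.
    """
    vecs = tfidf_vectors(docs)
    sim = {(i, j): cosine(vecs[i], vecs[j]) for i, j in combinations(range(len(docs)), 2)}
    clusters = [(i, [i]) for i in range(len(docs))]  # (tree, member indices)
    while len(clusters) > 1:
        best, pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            ma, mb = clusters[a][1], clusters[b][1]
            avg = sum(sim[min(i, j), max(i, j)] for i in ma for j in mb) / (len(ma) * len(mb))
            if avg > best:
                best, pair = avg, (a, b)
        if best < threshold:
            break
        a, b = pair
        merged = ((clusters[a][0], clusters[b][0]), clusters[a][1] + clusters[b][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [tree for tree, _ in clusters]


def leaves(tree):
    """All document indices under a tree node."""
    return [tree] if isinstance(tree, int) else leaves(tree[0]) + leaves(tree[1])


if __name__ == "__main__":
    posts = [  # hypothetical keyword-style posts
        "hybrid cars fuel economy",
        "hybrid cars battery cost",
        "election polls senate race",
        "senate race election results",
    ]
    # Nested JSON arrays of post indices stand in for the hierarchy output file.
    print(json.dumps(cluster(posts)))
```

Because the pairwise search is roughly cubic in the number of posts, this approach suits a handful of blogs but not a large crawl. A real system would also remove stop words, which otherwise create spurious similarity between unrelated posts.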
