
STREAMIT: Dynamic Visualization and Interactive Exploration of Text Streams




Presentation Transcript


  1. STREAMIT: Dynamic Visualization and Interactive Exploration of Text Streams

  2. Text Stream • Textual Data Explosion • Emails, news, messages, broadcasts, … • Arriving daily, hourly, every minute • Urgent need for efficient processing and analysis • Visualization is an effective approach • Text stream • Text collections constantly evolve as new documents continuously arrive • Keywords/topics not known in advance

  3. Challenges to Visual Exploration • Temporal evolution • Existing topics • Emerging topics • Their relations • Clusters and outliers • No pre-scanning of the collection or presumed prior knowledge • Live processing required • In contrast to traditional text databases • Flexible user interaction for changing and adjusting information-seeking focus/preference • Process large volumes of text in real time

  4. STREAMIT System • Dynamic force-directed simulation • Naturally handles continuously inserted documents • Continual evolution • Continuous depiction and analysis of growing document collections • Automatic grouping and separating • No time window used • No abrupt change • Dynamic processing • Keyword vectors dynamically updated • No pre-scanning required

  5. STREAMIT System (continued) • Interactive exploration • Live adjustment of visualization parameters • Dynamic keyword importance • Presents the significance of a keyword at a certain time • Reflects changing user demand and interest • Scalable optimization • Fast computing • GPU acceleration • Animation and interaction • Easy user control and interaction tools

  6. Related Work • Multidimensional scaling (MDS) & projection • IN-SPIRE 99, InfoSky 02, Hipp 08, Exemplar-based 09 • Temporal data trends • ThemeRiver 02, LensRiver 07, T-scroll 07, Meme-tracking 09, Themail 06, Topic-based 09 • Text streams • TextPool 05, Moving time window (Wong 03), EventRiver 10, Text pipe 05 • Force-based placement • Graph drawing 91, Chalmers 96, Morrison 02, etc.

  7. System Overview

  8. Potential and Similarity • Potential energy between pairs of document particles • α is a control parameter • l_i and l_j are the locations of particles i and j • l_ij is the ideal distance between them • Ideal distance computed from document similarity • Cosine similarity • Larger similarity leads to a smaller ideal distance, moving documents closer together to form clusters
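
The slide's equations did not survive in this transcript, so the following is only a minimal sketch of the idea, not the authors' formula. It assumes sparse keyword vectors stored as dicts, an ideal distance of `1 - similarity`, and a spring-like pair potential `alpha * (|l_i - l_j| - l_ij)^2`. The function names and both formulas are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse keyword vectors (dict: keyword -> weight)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

def ideal_distance(similarity):
    # ASSUMPTION: a simple decreasing map; the slide only states that
    # larger similarity gives a smaller ideal distance.
    return 1.0 - similarity

def pair_potential(l_i, l_j, l_ij, alpha=1.0):
    # ASSUMPTION: spring-like potential, zero when the particles sit
    # exactly at their ideal distance l_ij.
    d = math.hypot(l_i[0] - l_j[0], l_i[1] - l_j[1])
    return alpha * (d - l_ij) ** 2
```

Under these assumptions, two documents with identical keyword vectors get an ideal distance of 0 and are pulled together, while documents sharing no keywords get an ideal distance of 1.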

  9. Force-directed Model • Global potential function • Forces computed by minimizing the potential • Forces attract or repel document particles
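
To make the step from potential to forces concrete, here is a hedged sketch. It assumes the global potential is the sum over all pairs of `alpha * (d_ij - l_ij)^2`, so the force on each particle is the negative gradient of that sum. The potential form, the `step` integrator, and the `dt` parameter are assumptions for illustration, not the presented system.

```python
import math

def forces(positions, ideal, alpha=1.0):
    """Negative gradient of sum_{i<j} alpha * (d_ij - ideal[i][j])**2 (assumed form)."""
    n = len(positions)
    out = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            d = math.hypot(dx, dy)
            if d == 0.0:
                continue  # coincident particles: direction undefined, skip the pair
            # mag > 0: particles are too far apart, so they attract;
            # mag < 0: particles are too close, so they repel.
            mag = 2.0 * alpha * (d - ideal[i][j])
            fx, fy = mag * dx / d, mag * dy / d
            out[i][0] += fx
            out[i][1] += fy
            out[j][0] -= fx
            out[j][1] -= fy
    return out

def step(positions, ideal, alpha=1.0, dt=0.1):
    """One gradient-descent style update; dt is a hypothetical step size."""
    f = forces(positions, ideal, alpha)
    return [(x + dt * fx, y + dt * fy) for (x, y), (fx, fy) in zip(positions, f)]
```

Calling `step` repeatedly moves each pair toward its ideal distance. Because a newly inserted document is just one more entry in `positions`, the layout can keep evolving without a restart.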

  10. Dynamic Keyword Importance • Cosine similarity can be improved by introducing importance • Importance I_k can be freely modified by users at any time • According to interest/preference • According to knowledge discovered in a prior period • A powerful tool for users to manipulate the layout and analyze data • Importance can also be set by an automatic scheme • E.g. for keyword k, • O_k: number of occurrences; • te_k: last time it appears; ts_k: first time it appears; • n_k: the number of documents that contain the keyword
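
The transcript does not preserve the importance-weighted similarity formula or the automatic scheme. As one plausible reading only, the sketch below weights each keyword's contribution to the inner product and to both norms by its importance I_k. The function name, the `default` parameter, and the weighting form are assumptions.

```python
import math

def weighted_cosine(a, b, importance, default=1.0):
    """Cosine similarity under an importance-weighted inner product.

    a, b: sparse keyword vectors (dict: keyword -> weight).
    importance: dict keyword -> I_k (non-negative); missing keywords use `default`.
    ASSUMPTION: sim = sum(I_k a_k b_k) / (sqrt(sum I_k a_k^2) * sqrt(sum I_k b_k^2)).
    """
    def imp(k):
        return importance.get(k, default)

    dot = sum(imp(k) * w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(imp(k) * w * w for k, w in a.items()))
    norm_b = math.sqrt(sum(imp(k) * w * w for k, w in b.items()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With `a = {"terrorism": 1, "economy": 1}` and `b = {"terrorism": 1, "sports": 1}`, raising the importance of the shared keyword "terrorism" from 1 to 4 lifts the similarity from 0.5 to 0.8. Documents on that topic therefore get a smaller ideal distance and cluster more tightly.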

  11. Visualization Interface

  12. Visualization Tools • Main window • Major layout • Animation Control Panel • Play, pause, stop • Drag by mouse • Keyword table • Dynamic update • Change importance • Document table • Text information

  13. Labeling • Use text document titles • Reduce clutter • Recent semantic titles • User-controlled clutter levels • Group title label • Use color and opacity to display a clear layout

  14. User Interaction • Adjusting Keyword Importance • Grouping and Tracking Documents • Halo for topics of interest • Browsing and Tracking Keywords • Selection • Manual, example-based, keyword-based • Integrated shoebox for details

  15. Case Study: New York Times News • Total number of articles: 230 • Time period: Jul. 19 to Sep. 18, 2010 • About Barack Obama • Articles are continuously injected, new keywords are added to the keyword table, and their frequencies are updated on the fly • Keyword importance automatically assigned
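
The on-the-fly keyword table described above could be maintained roughly as follows. This is a sketch under assumptions: the class and field names are hypothetical, documents are assumed to arrive in timestamp order, and the tracked quantities mirror O_k, ts_k, te_k and n_k from the keyword-importance slide.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

@dataclass
class KeywordStats:
    occurrences: int = 0                 # O_k: total occurrences so far
    first_seen: Optional[float] = None   # ts_k: first time the keyword appears
    last_seen: Optional[float] = None    # te_k: last time the keyword appears
    doc_count: int = 0                   # n_k: documents containing the keyword

class KeywordTable:
    """Hypothetical incremental keyword table; no pre-scan of the stream."""

    def __init__(self):
        self.stats = {}

    def add_document(self, timestamp, keywords):
        # Count repeated keywords within the document once per document
        # for n_k, but fully for O_k.
        for kw, count in Counter(keywords).items():
            s = self.stats.setdefault(kw, KeywordStats(first_seen=timestamp))
            s.occurrences += count
            s.last_seen = timestamp
            s.doc_count += 1
```

Each arriving article costs time proportional to its own keyword count, so the table never needs a pass over earlier documents.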

  16. Case Study: New York Times News • 136 news articles • High-frequency keywords: “Politics and Government”, “International Relations”, “Terrorism” • Increase the importance of “International Relations” • All documents are shown • “Terrorism” becomes larger, and one item (outlier) lies between “Afghanistan War” and “Terrorism” • Highlight the group with “Afghanistan War” in a pink halo (2) and “Terrorism” in an orange halo (3)

  17. Case Study: US NSF Award Abstracts • 1000 National Science Foundation (NSF) IIS award abstracts • Funded between Mar. 2000 and Aug. 2003 • Each document characterized by a set of keywords • Size of a document circle represents funding amount

  18. Case Study: US NSF Award Abstracts • Mar. 15, 2002: 672 projects; many large projects started; highlight “Sensor” with halo; (2) is an outlier far away from the other projects with halo; it is about “just-in-time information retrieval on wearable computers” • Aug. 1, 2000: 95 projects • Sep. 1, 2000: 172 projects; many large projects started; highlight “Management” in red and “Database” in green; increase their importance

  19. Case Study: Video on NSF Dataset

  20. Case Study: Video on NSF Dataset

  21. Performance Optimization • Initial positions of document particles affect the number of computational steps and the cost • Similarity Grid • New documents are inserted roughly in the proximity of similar documents • Each grid cell has a special keyword vector consisting of the average keyword weights of the documents inside the cell • Data set of 7100 documents
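
A minimal sketch of the similarity-grid idea follows. It assumes each cell keeps a running sum of keyword weights so its average vector can be derived on demand, and that a new document goes to the cell whose average vector has the highest cosine similarity. The class, its methods, and the unit-square layout are hypothetical.

```python
import math

def cosine(a, b):
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SimilarityGrid:
    """Hypothetical grid over a unit-square layout; cells are (row, col) tuples."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.sums = {}    # cell -> summed keyword weights of its documents
        self.counts = {}  # cell -> number of documents in the cell

    def add(self, cell, vec):
        s = self.sums.setdefault(cell, {})
        for k, w in vec.items():
            s[k] = s.get(k, 0.0) + w
        self.counts[cell] = self.counts.get(cell, 0) + 1

    def cell_vector(self, cell):
        # Average keyword weights of the documents inside the cell.
        n = self.counts[cell]
        return {k: w / n for k, w in self.sums[cell].items()}

    def best_cell(self, vec):
        # Most similar occupied cell, or None if nothing is similar at all;
        # the caller then needs its own fallback (e.g. a random position).
        best, best_sim = None, 0.0
        for cell in self.sums:
            sim = cosine(vec, self.cell_vector(cell))
            if sim > best_sim:
                best, best_sim = cell, sim
        return best

    def cell_center(self, cell):
        r, c = cell
        return ((c + 0.5) / self.cols, (r + 0.5) / self.rows)
```

Starting a new particle at `cell_center(best_cell(vec))` places it near similar documents, so the simulation needs fewer steps to settle than it would from an arbitrary position.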

  22. Performance Optimization • GPU acceleration • CUDA implementation of the N-body problem • Good performance achieved • NVIDIA Quadro NVS 295 GPU with 2 GB texture memory • Intel Core2 1.8 GHz CPU with 2 GB RAM

  23. GPU Performance • Experiments with a 50 by 50 grid • Achieves good average speed • More importantly, the maximum simulation time after document insertion on the GPU was less than a second • Fast enough for human perception and analysis

  24. Discussion • The system can handle live text streams with a document arrival interval of around 1 second • On a consumer PC and graphics card • E.g., New York Times news averages 3 documents per hour, with a maximum of 8 documents per hour at peak times • A very large number of documents inside the system will inevitably introduce visual clutter and hinder analysts' ability to take in the information • Natural perception limit and device limit • Clutter reduction and simplification algorithms needed • Further increase the power • Advanced hardware • Hierarchical or multiple-resolution simulation
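
The throughput claim can be checked with simple arithmetic. The snippet below uses only the figures from this slide: a sustainable arrival interval of about 1 second, and the New York Times rates of 3 and 8 documents per hour. The `headroom` helper is a hypothetical name.

```python
SECONDS_PER_HOUR = 3600

def headroom(min_interval_s, stream_docs_per_hour):
    """How many times faster the system can ingest than the stream delivers."""
    capacity_per_hour = SECONDS_PER_HOUR / min_interval_s
    return capacity_per_hour / stream_docs_per_hour

print(headroom(1.0, 8))  # peak rate    -> 450.0
print(headroom(1.0, 3))  # average rate -> 1200.0
```

So even at peak, the news stream uses well under one percent of the stated ingestion capacity. This is consistent with the slide's point that the practical limit is visual clutter rather than processing speed.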

  25. Conclusion • STREAMIT: An efficient visual exploration system for live text streams • Dynamic physical system • Keyword manipulation with importance • Visual tools • Acknowledgment: • National Science Foundation IIS-0915528, IIS-0916131 and NSFDACS10P1309.

  26. Thanks! Questions!
