Understanding Text Corpora with Multiple Facets

Lei Shi, Furu Wei, Shixia Liu, Xiaoxiao Lian, Li Tan and Michelle X. Zhou, IBM Research

Presentation Transcript


  1. Understanding Text Corpora with Multiple Facets • Lei Shi, Furu Wei, Shixia Liu, Xiaoxiao Lian, Li Tan and Michelle X. Zhou • IBM Research

  2. Emergency Room Records

  3. Hotel Reviews

  4. Intelligence Reports

  5. Email Documents

  6. Financial News/Blogs/Message Boards

  7. Outline • Problem & Related Work • Multi-Facet Text Data Model and Text Processing • Data model • Text pre-processing • Content summarization • Visualization • Metaphor • Creation algorithm • Interactions • Video Demo

  8. Problem & Related Work • It’s challenging to build a visual analytics tool that explains multi-faceted text corpora! • How to combine the raw text data with rich text analytics results for visualization? • What visual metaphors can effectively illustrate text content, its evolution and facet correlations? • How to customize interactions to assist users in data navigation and other visual analytics tasks? • Related work • Text trend visualization • ThemeRiver, NameVoyager, etc. • Text content visualization • Tag clouds, Wordle, PhraseNet, etc. • Text entity pattern visualization • TileBars, Jigsaw, FeatureLens, Takmi, etc. • Text visualization in specific domains • Themail@email, TileBars@search,

  9. Multi-Facet Data Model and Text Pre-Processing • Multi-Facet Data Model for Text Corpora: • Time Facet • Explicit field or extracted from raw text • Category Facet • Topic modeling by Latent Dirichlet Allocation (LDA, Blei et al. 2003) • Category labels from document classification/clustering • Leverage other nominal structured information (hotel names, countries, etc.) • Unstructured (Content) Facets • Inherent multiple text fields • Multiple facets from named-entity (NE) extraction (people, locations, organizations) or part-of-speech (POS) parsing (nouns, verbs, adjectives) • Structured Facets • Categorical, numerical or nominal data fields • Other calculated categorical values (sentiment orientations, average ratings)
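A minimal sketch of how one document could be held under this data model, assuming Python dataclasses. The class name `MultiFacetDocument`, its field names, the helper `facet_values`, and the example values are hypothetical and not taken from the presentation. The category facet is stored as LDA topic probabilities, and a single category label can be derived from them.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class MultiFacetDocument:
    """Hypothetical record for one document under the multi-facet model."""
    doc_id: str
    time: date                      # time facet: explicit field or extracted from text
    topic_probs: dict[str, float]   # category facet: LDA P(T_i | D_k) per topic
    # Unstructured (content) facets, e.g. {"people": [...], "adjective": [...]}.
    content: dict[str, list[str]] = field(default_factory=dict)
    # Structured facets, e.g. {"rating": 4, "sentiment": "positive"}.
    structured: dict[str, object] = field(default_factory=dict)

    def dominant_topic(self) -> str:
        """Category label taken as the most probable topic."""
        return max(self.topic_probs, key=self.topic_probs.get)


def facet_values(docs: list[MultiFacetDocument], facet: str) -> list[str]:
    """Collect the distinct values of one unstructured facet across a corpus."""
    values: set[str] = set()
    for doc in docs:
        values.update(doc.content.get(facet, []))
    return sorted(values)


# Made-up example review.
review = MultiFacetDocument(
    doc_id="r1",
    time=date(2009, 5, 1),
    topic_probs={"service": 0.7, "location": 0.3},
    content={"adjective": ["friendly", "clean"]},
    structured={"hotel": "Hotel A", "rating": 4},
)
print(review.dominant_topic())  # service
```

Keeping the full topic distribution rather than only a label lets the same record feed both the category stacks and the content summarization.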

  10. Content Facet Summarization • For the kth document $D_k$ in the collection: a set of topic probabilities $\{\ldots, P(T_i \mid D_k), \ldots\}$ over a set of topics $\{T_1, \ldots, T_i, \ldots, T_N\}$ • For each topic $T_i$: a set of word probabilities $\{\ldots, P(W_j \mid T_i), \ldots\}$ over a set of keywords $\{W_1, \ldots, W_j, \ldots, W_M\}$ • Rank the topics to present the most valuable ones first • Select a keyword subset for each time segment as the content summary: $\{\ldots\}_{t-1}, \{\ldots, W_j, \ldots\}_t, \{\ldots\}_{t+1}$
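The slide names the quantities but not how they are combined. As an illustration, assuming the LDA outputs are plain Python lists, the strength of a topic in a time segment can be taken as the sum of $P(T_i \mid D_k)$ over the documents in that segment, and a per-segment keyword subset for one topic can be scored as $P(W_j \mid T_i)$ times the keyword's count in that segment. Both scoring rules and the function names `topic_strength` and `segment_keywords` are assumptions for this sketch, not the authors' formulas.

```python
def topic_strength(doc_segment, doc_topic, n_segments):
    """strength[t][i] = sum of P(T_i | D_k) over the documents D_k in time segment t.

    doc_segment[k] is the segment index of document k; doc_topic[k][i] is P(T_i | D_k).
    """
    n_topics = len(doc_topic[0])
    strength = [[0.0] * n_topics for _ in range(n_segments)]
    for t, probs in zip(doc_segment, doc_topic):
        for i, p in enumerate(probs):
            strength[t][i] += p
    return strength


def segment_keywords(topic_word, segment_counts, k):
    """Top-k keyword indices of one topic for each time segment.

    topic_word[j] is P(W_j | T_i) for the chosen topic; segment_counts[t][j] is the
    count of W_j in segment t. Score = P(W_j | T_i) * count; unseen words are skipped.
    """
    summary = []
    for counts in segment_counts:
        scored = [(p * c, j) for j, (p, c) in enumerate(zip(topic_word, counts)) if c > 0]
        scored.sort(reverse=True)  # highest score first
        summary.append([j for _, j in scored[:k]])
    return summary
```

For instance, with `topic_word = [0.5, 0.3, 0.2]` and segment counts `[[1, 4, 0], [3, 0, 1]]`, the top keyword is index 1 in the first segment and index 0 in the second, so each segment gets its own summary.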

  11. Content Facet Summarization • Topic/category re-ranking by topic coverage and variance: find the most active topics with significant variety • Topic coverage • Topic variance • Balancing the two metrics • Keyword re-ranking • Topic keyword re-ranking • Time-sensitive keyword re-ranking: preserve completeness and distinctiveness • Completeness: cover the original keywords of a topic • Distinctiveness: distinguish one time segment from another • Formula terms (the equations themselves are not preserved): doc-topic distribution, document length, number of documents, topic-keyword distribution, number of topics
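The coverage and variance formulas from this slide are not available here. The sketch below uses stand-in definitions chosen only to match the annotated terms: coverage as the document-length-weighted average of $P(T_i \mid D_k)$, variance as the population variance of $P(T_i \mid D_k)$ across documents, and a balance weight `lam` between the two after normalizing each by its maximum. All three definitions and the name `rank_topics` are hypothetical.

```python
from statistics import pvariance


def rank_topics(doc_topic, doc_length, lam=0.5):
    """Return topic indices ordered by a blended coverage/variance score (stand-in formulas).

    doc_topic[k][i] is P(T_i | D_k); doc_length[k] is the length of document k.
    """
    n_topics = len(doc_topic[0])
    total_length = sum(doc_length)
    # Coverage: length-weighted average probability of the topic over all documents.
    coverage = [
        sum(n * probs[i] for n, probs in zip(doc_length, doc_topic)) / total_length
        for i in range(n_topics)
    ]
    # Variance: how unevenly the topic is spread across documents.
    variance = [pvariance([probs[i] for probs in doc_topic]) for i in range(n_topics)]
    # Normalize each metric by its maximum so the two are comparable, then blend.
    max_cov = max(coverage) or 1.0
    max_var = max(variance) or 1.0
    score = [lam * c / max_cov + (1 - lam) * v / max_var for c, v in zip(coverage, variance)]
    return sorted(range(n_topics), key=lambda i: score[i], reverse=True)
```

With `lam` close to 1 the ranking favors broadly covered topics; close to 0 it favors topics whose share varies strongly from document to document.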

  12. System Architecture • Text collection → Text Preprocessing → text content + metadata → Text Summarization → summarization results → Visualization • User Interaction

  13. Visualization Metaphors • Multi-stack trend visualization + time-sensitive tag clouds • Vis-data mappings: time facet → x (time) axis, category facet → stack, unstructured facets → tag clouds, structured facets → keyword style (color/font) • Other mappings: document count → y axis, re-ranked occurrence count → keyword size
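The two numeric mappings can be sketched as follows, assuming each topic's series is already aggregated per time point (for example, document counts) and that layers are stacked from a flat zero baseline. The stacking order, the baseline, the pixel range in `font_size`, and both function names are assumptions for illustration; keyword size is taken as a linear map of the re-ranked occurrence count.

```python
def stack_layers(counts):
    """counts[i][t] is the value of topic i at time point t.

    Returns, for each topic, its (lower, upper) boundary at every time point,
    stacking topics bottom-up on a zero baseline.
    """
    n_times = len(counts[0])
    baseline = [0.0] * n_times
    layers = []
    for series in counts:
        lower = baseline
        upper = [b + v for b, v in zip(baseline, series)]
        layers.append((lower, upper))
        baseline = upper  # the next topic sits on top of this one
    return layers


def font_size(count, lo, hi, min_px=10.0, max_px=36.0):
    """Linearly map a re-ranked occurrence count in [lo, hi] to a font size in pixels."""
    if hi == lo:
        return max_px
    return min_px + (count - lo) / (hi - lo) * (max_px - min_px)
```

A zero baseline is the simplest choice; streamgraph-style layouts instead shift the baseline per time point to reduce wiggle.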

  14. Keyword Layout • Keyword layout with the sweep-line greedy algorithm
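The slide names the layout algorithm without detailing it. Here is a much-simplified greedy sketch, assuming each layer's free space is given as lower and upper bounds sampled at integer x positions and each keyword is an axis-aligned box. Keywords are taken in importance order, a vertical sweep line moves left to right, and each keyword goes to the first position where it fits inside the layer and overlaps no placed keyword. The function `layout_keywords` and this placement rule are hypothetical, not the authors' algorithm.

```python
def _overlap(a, b):
    """True if axis-aligned boxes a and b, each (x, y, w, h), intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def layout_keywords(words, lower, upper, step=1):
    """Greedy sweep-line placement (simplified, hypothetical).

    words: (text, width, height) tuples, most important first; width >= 1, in x samples.
    lower, upper: layer bounds sampled at x = 0 .. len(lower) - 1.
    Returns {text: (x, y)} for the keywords that fit; the rest are dropped.
    """
    placed = []
    positions = {}
    n = len(lower)
    for text, w, h in words:
        spot = None
        for x in range(0, n - w + 1, step):  # sweep line moves left to right
            # Vertical room shared by every column the box would cover.
            bottom = max(lower[x:x + w])
            top = min(upper[x:x + w])
            y = bottom
            while y + h <= top:  # try positions from the bottom up
                box = (x, y, w, h)
                if not any(_overlap(box, other) for other in placed):
                    spot = (x, y)
                    break
                y += step
            if spot is not None:
                break
        if spot is not None:
            placed.append((spot[0], spot[1], w, h))
            positions[text] = spot
    return positions
```

Keywords that fit nowhere are dropped, which is one simple way to keep the most important words when a layer is thin.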

  15. Interactions • Temporal zooming for time facet navigation • Topic editing for category facet navigation • Unstructured facet navigation panel • Structured facet mapping • Other customized interactions: topic focus-in-context view

  16. Focus-In-Context View Calculation • Constraints for the detailed trend view • Contour-preserving • Flexible space control • Keep all topic trends as undistorted as possible • 1D fisheye distortion • Height calculation for the expanded trend • Order-preserving height adjustment • Apply fisheye distortion from the center line of the selected topic
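For the 1D fisheye step, one well-known choice is the Sarkar and Brown graphical fisheye function $g(x) = \frac{(d+1)x}{dx+1}$ on a normalized distance $x \in [0, 1]$. The sketch below applies it to vertical positions measured from the center line of the selected topic. The choice of this function, the name `fisheye_1d`, and the default distortion factor `d` are assumptions rather than details from the slide.

```python
def fisheye_1d(ys, center, y_min, y_max, d=3.0):
    """Distort vertical positions around a focus line with g(x) = (d+1)x / (dx+1).

    Positions above the center are normalized against (y_max - center), positions
    below against (center - y_min); d >= 0 controls how strongly the focus expands.
    """
    out = []
    for y in ys:
        if y >= center:
            span, sign = y_max - center, 1.0
        else:
            span, sign = center - y_min, -1.0
        if span == 0:
            out.append(y)  # nothing to distort on this side
            continue
        x = abs(y - center) / span          # normalized distance in [0, 1]
        g = (d + 1) * x / (d * x + 1)       # magnified distance, still in [0, 1]
        out.append(center + sign * g * span)
    return out
```

Because $g$ is strictly increasing on each side of the center line and fixes both ends of $[0, 1]$, trends near the selected topic get more vertical space, the outer bounds stay put, and the vertical order of trends is preserved.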

  17. Video Demo • Visual Analytics for Emergency Room Records

  18. Thank you, in many languages: English (Thank You), Spanish (Gracias), Brazilian Portuguese (Obrigado), German (Danke), Italian (Grazie), French (Merci), Simplified Chinese, Traditional Chinese, Thai, Korean, Russian, Arabic, Japanese, Hindi and Tamil
