Visualization Taxonomies and Techniques Text: Words, phrases, sentences, …

Visualization Taxonomies and TechniquesText: Words, phrases, sentences, … University of Texas – Pan American CSCI 6361, Spring 2014

Introduction • Text is ubiquitous • Documents, and more generally text, are a primary information source • (Verbal has its place!) • Access to documents and text has grown exponentially in recent years due to networking infrastructure • WWW • Digital libraries • Social media • Visualization to aid users in understanding and gathering information from text and document collections

Introduction From Stasko, 2013 • Visualization can aid in performing tasks • For example: • Which documents contain text on topic XYZ? • Which documents are of interest to me? • Are there other documents that are similar to this one (so they are worthwhile)? • How are different words used in a document or a document collection? • What are the main themes and ideas in a document or a collection? • Which documents have an angry tone? • How are certain words or themes distributed through a document? • Identify “hidden” messages or stories in this document collection. • How does one set of documents differ from another set? • Quickly gain an understanding of a document or collection in order to subsequently do XYZ. • Understand the history of changes in a document. • Find connections between documents.

IntroductionChallenges of Text Visualization • Text is unlike other data types seen so far, for example • Context and Semantics • Context relevant to understanding and meaning • Indeed, natural language understanding a challenge of the nth + 1 century • Dimensionality • Inherently, “not dimensional”, so must create “visually realizable” visual encoding • Often, first step is n-D, then 2- or 3-D • Modeling Abstraction • Consider level of “understanding” require for task • Match analysis task with appropriate tools and models

IntroductionRelated topics • Information Retrieval • Active search process that brings back particular/specific items (will discuss that some today, but not always focus) • InfoVis and HCI can help some… • Visualization may be most useful when not sure precisely what you’re looking for when retrieving information • More of a browsing paradigm than a search one • But, this is part of the information retrieval task • Define information need, formulate “query”, examine/evaluate results, … repeat • Sensemaking • Gaining better understanding of facts at hand in order to take some next steps • A principle focus in visualanalytics • Visualization can help make large document collection more understandable more rapidly • Which is good: “Overview, zoom and filter, details on demand”

User - Task Raw Information Visual Form Dataset Views Visual Mappings View Transformations Data Transformations Recall, Visualization Pipeline: Visualization Stages • Data transformations: • Map raw data (idiosynchratic form) into data tables (relational descriptions including metatags) • Text is nominal data • A word, or any text unit, does not map easily to any quantitative representation! • The “Raw data --> Data Table” mapping is a principle element of creating any visual representation • How do you get numbers from words, sentences, …?? • Will see several solutions F -1 F Visual Perception Interaction

User - Task Raw Information Visual Form Dataset Views Visual Mappings View Transformations Data Transformations Recall, Visualization Pipeline: Visualization Stages • Visual Mappings: • Transform data tables into visual structures that combine spatial substrates, marks, and graphical properties • And … visual mappings, as well, requires at least “the usual level” of creativity F -1 F Visual Perception Interaction

Understanding Text Content

Understanding Text Content • Visual representations of words, phrases, and sentences • Main goal of understanding, versus search • Visual presentation always part of text presentation – • Standard typography uses layout, font, style, color … • Electronic media, especially – pick a web page • “Single text content”

Single Text ContentWord Counts • 2012 National Conventions • NY Times: http://www.nytimes.com/interactive/2012/08/28/us/politics/convention-word-counts.html

Tag / Word Clouds

Tag / Word Clouds • Lots of popular interest • E.g., on web • Idea is to show word/concept importance through visual means • Tags: User-specified metadata (descriptors) about something • Sometimes generalized to just reflect word frequencies • Not a new technique • Milgram’s ‘76 experiment to have people label landmarks in Paris • Flanagan’s ‘97 “Search referral Zeitgeist” • Fortune’s ‘01 Money Makes the World Go Round

Tag / Word CloudsExample: US State of the Union Speeches • Guardian • http://www.guardian.co.uk/news/datablog/2011/jan/25/state-of-the-union-text-obama# • http://image.guardian.co.uk/sys-files/Guardian/documents/2011/01/26/State_of_the_union_2011.pdf?guni=Graphic:in body link

Flickr Tag Cloud

delicious Tag Cloud

Alternate Order

Many Eyes Tag Cloud Word pairs

Wordle

Wordle“Beautiful Word Clouds”, http://www.wordle.net/ • Tightly packed words • Horizontal, vertical or diagonal • Size correlated with frequency • Multiple color palettes • User gets some control • Layout Algorithm • Details not published • Sort words by weight, decreasing order for each word • Init position randomly chosen according to distribution for target shape • Update position moves out radially

Wordle“Beautiful Word Clouds”, http://www.wordle.net/ Course schedule, table of topics, and assignments

Can be many variations … A bit more order Order the words more by frequency

Mani-WordleUser control • Mani-Wordle • Start with nice default algorithm • Give user more control over design • Alter color (within a palette) • Pin words, redo the rest • Move and rotate words • http://www.cg.tuwien.ac.at/courses/InfoVis/HallOfFame/2012/Gruppe03/Homepage/index.html • Koh et al TVCG (InfoVis) ‘10

Tag / Word CloudsConclusions • Weaknesses • Sub-optimal visual encoding (size vs. position) • Inaccurate size encoding (long words are bigger) • Font sizes are hard to compare • May not facilitate comparison (unstable layout) • Word frequency may not be meaningful • Most use words vs. stems • Does not show structure of the text • Studies have even shown they underperform (Gruen et al CHI ’06) • Why so popular? • OK for “quick look” • Serve as social signifiers that provide a friendly atmosphere that provide a point of entry into a complex site • Act as individual and group mirrors • Fun, not business-like

BTW - Text Analysis Toolsvoyeur: http://voyeurtools.org/ Book + tools for text analysis and visualization

BTW - Text Analysis Toolsvoyeur: http://voyeurtools.org/ .

Visualization and Information Retrieval • Examples so far have focused on representing a single document • …, or, really, set of words as no consideration of even word order, let alone sentence structure, etc. • Principle question is how might visual representations aid text, or document, search • I.e., how to find the proverbial needle in a haystack, where the haystack is all the documents on the www or a digital library • Term information retrieval refers to this search and its history antedates computers • IR entails: • Determine information need • Query formulation • Retrieval • Assessment of results • Reformulation of query or even information need • Repeat (until information need met)

Visualization and Information Retrieval … • IR entails: • Determine information need • Query formulation • Retrieval • Assessment of results • Reformulation of query or even information need • Repeat (until information need met) • Provide visual representations that during this process • Document collection visually, support browsing, … • Even for determining information need! • Show query results visually • Show how query terms relate to results • … any aspect

Visualization and Information Retrieval From Stasko, 2013 • Provide visual representations that during this process • Document collection visually, support browsing, … • Even for determining information need! • Show query results visually • Show how query terms relate to results • … any aspect

Evaluating Query ResultsTileBars, Hearst, 1996 • Hearst points out that query responses do not include: • How strong the match is • How frequent each term is • How each term is distributed in the document • Overlap between terms • Length of document • Document ranking is opaque • Inability to compare between results • Input limits term relationships

TileBarsOverview From Stasko, 2013 • Goal : Minimize time and effort for deciding which documents to view in detail • Show the role of the query terms in the retrieved documents, making use of document structure • Graphical representation of term distribution and overlap • Simultaneously indicate: • Relative document length • Frequency of term sets in document • Distribution of term sets with respect to the document and each other

TileBarsScreen From Stasko, 2013 TileBars screen:

TileBarsDocument representation From Stasko, 2013 Visual representation of retrieved documents Video: TileBars-80mb-chi96_05_m1.mpeg

TileBars TileBars Video

TileBarsConclusions • Clearly visually provides the information intended about each document • Ease/effort/time of comparison? • Surely would improve with use • … ?

Evaluating Query ResultsSparkler From Stasko, 2013 • Abstract result documents more • Havre et al InfoVis ‘01 • Show “distance” from query in order to give user better feel for quality of match(es) • Also shows documents in responses to multiple queries • Visualizing One Query • Triangle – query • Square – document • Distance between query and documents represents their relevance

Sparkler From Stasko, 2013 Visualizing Multiple Queries Six queries here Bullseye allows viewer to select quality results

Sparkler From Stasko, 2013 Test Example Text Retrieval Conference (TREC-3) test document collection AP news stories from June 24–30, 1990 TREC topic: Japan Protectionist Measures Sparkler found 16 of 17 relevant documents

Evaluating Query ResultsRankSpiral From Stasko, 2013 • Compare search results from different search engines • SpoerriInfoVis ’04 poster

RankSpiral From Stasko, 2013 Color represents different search engines Compare search results from different search engines

RankSpiral Color represents different search engines Compare search results from different search engines

Evaluating Query Results ResultMaps From Stasko, 2013 • Treemap-style vis for showing query results in a digital library • Clarkson, Desai & Foley TVCG (InfoVis) ‘09

Representing Multiple Documents

Representing Multiple Documents • Previously, have seen various techniques for comparing multiple documents that are results of query, i.e., a subset of all documents • Also, may want to just show everything, and then let user do “manual search”, or user-directed search • Such displays of all documents also support the type of search common in visual analytics • Query, browse, connect, drill-down • Will see: • Parallel word clouds • Tree layout of synonyms • …

Multiple DocumentsParallel Tags Clouds • Tag clouds increase size of word as f(frequency) • Showing multiple documents as tag clouds allows visual inspection • Automated and user directed, visual analytics • Parallel Tag Clouds - name says it all • Video - Collins et al VAST ‘09 – different circuit courts • http://www.youtube.com/watch?v=rL3Ga6xBgLw

Multiple DocumentsDo different district courts differ in cases they handle? .

Multiple DocumentsCounting Words: Overview & Timeline • Ex., across speeches can count words • State of the Union Addresses • http://www.nytimes.com/ref/washington/20070123_STATEOFUNION.html?initialWord=iraq • NY Times demo

Visualization Taxonomies and Techniques Text: Words, phrases, sentences, …

Visualization Taxonomies and Techniques Text: Words, phrases, sentences, …

Presentation Transcript

words and phrases

Transitional words and phrases

Combining Sentences and Inserting Phrases

WORDS + PHRASES

Linking words and phrases

French Words and Phrases

Foreign Words/Phrases

Colonial Words and Phrases

Words and Phrases That Should

Transitional Words and Phrases

Sentences, Phrases and Clauses…

Text Visualization

Words, Sentences, Paragraphs

Transition Words and Phrases

TRANSITIONAL WORDS AND PHRASES

Important Words and Phrases

Commonly Misused Words and Phrases

Frases / oraciones phrases / sentences

Foreign Words and Phrases

Foreign Words and Phrases

Transitional Words and Phrases

SENTENCES AND PHRASES FOR REPEATED READING