1 / 17

Prefix Tag Clouds

Prefix Tag Clouds. Michael Burch, Steffen Lohmann, Daniel Pompe, Daniel Weiskopf. BY: Farah Kamw. Introduction. Tag clouds are a popular way to visually represent word frequencies.

Télécharger la présentation

Prefix Tag Clouds

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Prefix Tag Clouds Michael Burch, Steffen Lohmann, Daniel Pompe, Daniel Weiskopf BY: Farah Kamw

  2. Introduction • Tag clouds are a popular way to visually represent word frequencies. • A major limitation of tag clouds is that they treat different forms of the same word as individual tags. • For example: singular and plural forms of the same word are treated as individual tags .

  3. Introduction(cont.) We have two cases: • In the first case, all variations of words appear in the tag cloud .In this case, they take up screen space that could better be used to display other information or additional tags. • In the second case, Such variations are not shown at tag cloud because they are not frequent enough. In this case, the viewer has no indication whether there are other forms of a word in the text or not.

  4. Introduction(cont.) • To overcome this limitation, they have developed prefix tag cloud. • It creates a prefix tree that groups different word forms and visualizes the subtrees as tag cloud. • The color is used to emphasize the grouping, while the relative frequencies of the word are indicated by font size. • The grouped word forms are arranged in a circular tag cloud layout that supports the quick identification of the most frequent words.

  5. CREATION OF PREFIX TAG CLOUDS The process of creating prefix tag clouds consists of three main components: • Aprefix tree is generated from a set of tags. This requires to order the tags lexicographically. • The prefix subtrees are rendered as node-linkdiagrams. The font sizes of the prefixes in the diagrams are scaled according to the tag frequencies. • The tag cloud is composed from the subtrees and placed in a given drawing area.

  6. Prefix Tree Generation • Initial input is a set of tags T={t1,t2,…,tn} with individual tags . The tags are composed of a finite sequence of characters from the alphabet. • Each tag ti contains at least one character and is not an empty string.

  7. Prefix Tree Generation(cont.) Algorithm 1 provides the pseudo code for that function which consists of the following steps: • Adding the empty string λ to T: It serve as prefix for all tages. • Ordering T lexicographically: Fast sorting algorithms such as merge sort can do this in O(n log n) time. As a result, we obtain the lexicographically ordered list T with the empty string λ as first element. • Generating the prefix tree from Tlex

  8. Subtree Rendering • Before generating the prefix tag cloud, they split the prefix tree P at its root node vrootinto a set of subtrees. • The subtrees of are visualized as node-link diagrams a left-to-right orientation. Parent nodes are placed in the vertical center and to the left of their child nodes. • The font sizes of the prefixes in the subtrees are scale according to the frequency values. • Since a linear change of The font size has a quadratic effect on the text area , they use the square root of the frequency values for scaling.

  9. Tag Cloud Generation • The subtrees are visualized in a circular tag cloud layout with the most frequent tags in the center and tags with decreasing frequencies towards the boundary. • Drawing starts with the subtreethat contains the tag with the highest frequency value, This first subtreeis placed in the center of the drawing area. • Subtreeswith decreasing frequencies are placed along the spiral path.

  10. Tag Cloud Generation(cont.) Subtree drawing continues until one of three conditions is met: 1. All subtrees of P have been rendered, 2. A user defined number of subtrees of P has been rendered. 3. The drawing area is completely filled with subtrees. The first two conditions result in a circle-shaped tag cloud due to the spiral placement of the subtrees. The third condition results in a tag cloud with the shape of the drawing area.

  11. APPLICATION EXAMPLE • The investigated text corpus is a large dataset containing more than 2.1 million publications in the field of computer science. • They transformed the publication titles into individual tags. First they remove special characters and separate words by spaces. We then converted the tags to lowercase and removed stop words. • Finally, they counted the frequencies of each tag.

  12. Conclusion • We have introduced the prefix tag cloud that makes use of prefix trees. Different word forms are visually grouped by color and space, facilitating their identification and comparison in the tag cloud. • The left-to-right orientation of the prefix subtrees leads to a well readabletag cloud layout. • This method has limitation; for example, words may be grouped that are not related (like ”gene” and ”general”).

  13. Thank You

More Related