Architecture for graphical maps of Web contents

Architecture for graphical maps of Web contents Krzysztof Ciesielski, Michal Draminski, Mieczyslaw Klopotek, Mariusz Kujawiak, Slawomir Wierzchon Institute of Computer Science, PAS, Warsaw University of Podlasie, Siedlce Białystok University of Technology

Agenda • Motivation • Architecture • Map interface • Map creation • Map clustering • Execution time of map creation • Convergence of map creation • Future direction

Motivation • the Web and also intranets become increasingly content-rich • a good way of presenting massive document sets in an understandable way will be crucial in the near future. • The BEATCA project envisages creation of a user-friendly content presentation of moderate size document collections (with millions of documents).

Our approach • The presentation method is based on the WebSOM's map idea and is enriched with novel methods of document analysis, clustering and visualization. • A special architecture has been elaborated to enable experiments with various brands of map creation algorithm. • Our research targets at creation of a full-fledged search engine (with working name Beatca) for small collections of documents capable of representing on-line replies to queries in graphical form on a document map.

Architecture • We follow the general architecture for search engines, • the preparation of documents for retrieval is done by an indexer, which turns the HTML etc. representation of a document into a vector-space model representation, • the map creator is applied, turning the vector-space representation into a form appropriate for on-the-fly map generation, • Maps are used bythe query processor responding to user's queries.

Architecture .................. Base Registry Search Engine Indexer Optimizer Mapper Vector Base Robot Map HT Base Indexer Mapper Optimizer Vector Base Map .................. .................. .................. HT Base

User interface • Search results are presented on a document map • The map can have one of two forms: • The traditional flat map • The rotating torus

Rotating torus representation of the map

How are the maps created • A modified WebSOM method is used • Based on our observation of radical reduction of document vector variation • Multi-level maps

A map for 20 newsgroups

A detailed map for Syskill&Webert 4 document groups

A high level map for Syskill&Webert 4 document groups

Clustering groups documents • A fuzzy isodata method used • Entropy based • Initialisation with Minimum weight spanning tree • Clustered documents are labeled by weighed centroids of cell reference vectors modified with entropy

Approximate clustering using minimal spanning tree for 5 newsgroups

Label candi-datesfor clusters(5 news-groups)

Experiments with execution time The impact of the following factors on the speed o9f map creation was investigated: • Map size • Optimization method • Dictionary optimization (extreme entropy and extreme frequency) • Reference vector optimization

Convergence We checked the convergence of the maps to a stable state depending on • Type of alpha function (search radius reduction) • Type of winner search method

Future research • We intend to integrate Bayesian and immune system methodologies with WebSOM in order to achieve new clustering effects. • Bayesian networks will be applied in particular to classify documents, to accelerate document clustering processes, to construct a thesaurus supporting query enrichment, and to keyword extraction. • Immuno-genetic systems will be used for adaptive document clustering by referring to the mechanism of so-called metadynamics, for extraction of compact characteristics of document groups by exploitation of the mechanism of construction of universal and specialized antibodies , and for visualisation and adjustment of resolution of document maps.

Thank you • Any questions?

Architecture for graphical maps of Web contents

Architecture for graphical maps of Web contents

Presentation Transcript

Web Architecture

Architecture Web

Effective Graphical Web Development

Sigma: Towards a Graphical Architecture for Integrated Cognition

Graphical Interface Design for the Web

Graphical Interface Design for the Web

Graphical Interface Design for the Web

Graphical Interface Design for the Web

Web Architecture

Generic Graphical User Interfaces for the WEB ___________

Architecture of the web

Web architecture

Graphical Representation of Climate Model Architecture

Web Architecture

Preparing for Viral Web maps

Architecture of Web design

Web Architecture

Characteristics of Graphical and Web User Interfaces

Google Maps Web Scraping