Explore XML document structure for enhanced search. Information retrieval with meta-schema, visual clustering, & multidimensional scaling. Develop front end for XML database.
Project Update XML Document Visualization and Retrieval • Matt Williams
Background • XML vs Web Doc • Added Structure
<book>
  <title>My First XML</title>
  <prod id="33-657" media="paper"> </prod>
  <chapter>Introduction to XML
    <para>What is HTML</para>
    <para>What is XML</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag</para>
    <para>Elements must be properly nested</para>
  </chapter>
</book>
• Can we take advantage of this structure when searching for documents?
Information Retrieval • Standard Information Retrieval (IR) • tf*idf • tf – term frequency: how often a term occurs in a doc • idf – inverse document frequency: shrinks as the number of documents containing the term grows
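The tf*idf weighting on this slide can be sketched in a few lines. This is a minimal illustration, assuming raw term counts for tf and log(N / df) for idf; other variants exist, and the function name `tf_idf` and the sample documents are made up for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical three-document collection
docs = ["what is xml", "what is html", "xml syntax xml elements"]
weights = tf_idf(docs)
```

A term that appears in every document gets idf = log(1) = 0, so only terms that discriminate between documents contribute to the ranking.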
Information Retrieval • A fair bit of previous work on adding structure to IR queries. • Examples • XIRQL – Fuhr and Großjohann • //book/chapter[heading $cw$ "InfoVis"] • XXL – Theobald and Weikum • Select Z From Index Where zoos.~animal.~cougar as Z • But… • What if we are unsure of the structure? • What if we have variability in the structure?
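The two "But…" questions can be made concrete with a small sketch. It uses plain `xml.etree.ElementTree` paths, not XIRQL or XXL, and both sample documents (including the `<part>` wrapper and the `<title>` variant) are hypothetical. A query tied to one fixed path finds the first document and misses the second, even though both contain the same content.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents with the same content but different structure
doc_a = ET.fromstring(
    "<book><chapter><heading>InfoVis basics</heading></chapter></book>")
doc_b = ET.fromstring(
    "<book><part><chapter><title>InfoVis basics</title></chapter></part></book>")

def strict_match(root, word):
    """Match only headings at the fixed path book/chapter/heading."""
    return [h.text for h in root.findall("./chapter/heading")
            if h.text and word in h.text.split()]

def loose_match(root, word):
    """Match any element, at any depth, whose own text contains the word."""
    return [(el.tag, el.text) for el in root.iter()
            if el.text and word in el.text.split()]

strict_a = strict_match(doc_a, "InfoVis")   # finds the heading
strict_b = strict_match(doc_b, "InfoVis")   # empty: the path does not exist
loose_b = loose_match(doc_b, "InfoVis")     # finds the <title> element
```

The loose version recovers the hit but throws away the structure entirely, which is the gap an exploratory interface is meant to fill.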
Information Retrieval • My goal is to provide an interface for exploring the XML collection when little is known about its structure • Meta-Schema Information – Element Index • Visual Clustering – Multidimensional Scaling • Visual Queries – Element Selection
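One way to picture the element index is as a map from element paths to the documents that use them. The sketch below is an assumption about what such an index could look like, not eXist's internal index or the service added in this project; `build_element_index` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_element_index(docs):
    """Map each element path (e.g. 'book/chapter/para') to the set of
    document ids in which that path occurs."""
    index = defaultdict(set)

    def walk(element, prefix, doc_id):
        path = f"{prefix}/{element.tag}" if prefix else element.tag
        index[path].add(doc_id)
        for child in element:
            walk(child, path, doc_id)

    for doc_id, xml_text in docs.items():
        walk(ET.fromstring(xml_text), "", doc_id)
    return dict(index)

# Hypothetical two-document collection
docs = {
    "d1": "<book><title>My First XML</title>"
          "<chapter><para>What is XML</para></chapter></book>",
    "d2": "<article><title>Notes</title></article>",
}
index = build_element_index(docs)
```

Listing the keys of `index` gives a user the meta-schema of the collection without needing a DTD, and selecting a key narrows the search to the documents that actually contain that element.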
Related Work • Visual Information Seeking • HomeFinder / Periodic Table – Ahlberg and Shneiderman
Related Work • Galaxies – Wise et al. • Visual Web Retrieval • Lighthouse – Leuski
Related Work • ZUI – Pad, Jazz, and Piccolo • Ben Bederson • SpaceTree • Jesse Grosjean et al. • TreeMaps ?? • Ben Shneiderman
Multidimensional Scaling • Document Similarity • Dimensionality Reduction • From a full-dimensional distance measure to a 2-dimensional distance measure • Problems – Speed?
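The reduction from a full distance matrix to 2-D positions can be sketched as follows. This is a simple gradient-descent variant that minimizes the squared difference between layout distances and target distances; it is an illustrative assumption, not necessarily the MDS algorithm used in this project, and `mds_2d` and its parameters are hypothetical names.

```python
import math
import random

def mds_2d(dist, iterations=2000, learning_rate=0.05, seed=0):
    """Place n items in 2-D so that pairwise layout distances approximate
    dist[i][j], by gradient descent on the sum of squared errors."""
    rng = random.Random(seed)
    n = len(dist)
    pos = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    for _ in range(iterations):
        for i in range(n):
            gx = gy = 0.0
            for j in range(n):
                if i == j:
                    continue
                dx = pos[i][0] - pos[j][0]
                dy = pos[i][1] - pos[j][1]
                d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
                # Positive error: too far apart, pull i toward j.
                # Negative error: too close, push i away from j.
                err = (d - dist[i][j]) / d
                gx += err * dx
                gy += err * dy
            pos[i][0] -= learning_rate * gx
            pos[i][1] -= learning_rate * gy
    return pos

# Three hypothetical documents whose target distances form a 3-4-5 triangle
dist = [[0, 3, 4],
        [3, 0, 5],
        [4, 5, 0]]
pos = mds_2d(dist)
```

Each iteration visits every pair of items, so the cost is O(n²) per pass, which is one concrete form of the speed problem noted on the slide.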
Test Environment • eXist – open source native XML database • Wolfgang M. Meier • http://exist-db.org/ • I am working on a front end to the database that provides: • A Selectable Element Index • Interactive Results That Dynamically Cluster and Zoom
Thus Far • Lots of Learning!! • XML Databases • Multidimensional Scaling • XML Queries • XML Information Retrieval • Zoomable Interfaces • Treemaps • Added basic GUI to eXist • Added a Service to offer the element Index as part of the API