1 / 24

NaLIX: A Generic Natural Language Search Environment for XML Data

NaLIX: A Generic Natural Language Search Environment for XML Data. Presented by: Erik Mathisen 02/12/2008. Authors. YUNYAO LI IBM Almaden Research Center HUAHAI YANG University at Albany, State University of New York H. V. JAGADISH University of Michigan. Overview.

boone
Télécharger la présentation

NaLIX: A Generic Natural Language Search Environment for XML Data

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. NaLIX: A Generic Natural Language SearchEnvironment for XML Data Presented by: Erik Mathisen 02/12/2008

  2. Authors • YUNYAO LI • IBM Almaden Research Center • HUAHAI YANG • University at Albany, State University of New York • H. V. JAGADISH • University of Michigan

  3. Overview • Goal is to provide a generic natural language query interface to an XML database • Use English language as the natural language • Translate the natural language query into a meaningful query for a XML database

  4. Schema Free XQuery • Not necessary to map the query into the database schema • Target language of the translation from the natural language • Beginning with a given collection of keywords, each of which identifies a candidate XML element to relate to, the MQF (meaningful query focus) of these elements, if one exists, automatically finds relationships between these elements, if any, including additional related elements as appropriate

  5. Three Step Process • Token Classification • Parse tree validation • Translation into XQuery • These three key steps are independent of one another. • Improvements can be made to any one without impacting the other two.

  6. Token Classification • MINIPAR • State of the art dependency processor • Free, off of the shelf software • Identify words or phrases in the query that can be mapped to components of XQuery • Token – matches XQuery component • Marker – no matching XQuery component

  7. Tokens

  8. Markers

  9. Translation into XQuery • Parse tree must be valid (Step #2) • Any reasonably designed XML document should reflect certain semantic structure isomorphous to human conceptual structure, and hence expressible by human natural language • A natural language query may contain multiple name tokens, each corresponding to an element or an attribute in the database • Name tokens “related” to each other should be mapped into the same mqf function in Schema-Free XQuery and hence found in structurally related elements in the database • Elements/attributes specified in the same XQuery statement are related either via structural join or value join. • Determining grouping and nesting for aggregation functions is difficult, because the scope of an aggregation function is not always obvious from the token it directly attaches to.

  10. Interactive Query Formulation • Makes no attempt at superior understanding of natural language • Approach is to get the user to rephrase the query into terms that we can understand • The linguistic capability of the system is constrained by the expressiveness of XQuery • Semantics extracted by NaLIX • Tokens that can be directly mapped in XQuery • Semantic relationships between tokens • Error message are dynamically generated and provide suggestion on how to fix the error • No additional rules are needed to deal with incorrect English. If the parse tree of a grammatically incorrect English sentence is valid, the sentence can still be translated into XQuery

  11. Iterative Search Theory • Users often iteratively modify their queries based on the results obtained • Forcing the user to fully specify each follow-up query gets in the way of normal iterative search • In NaLIX, the basic unit of iterative search is a query tree • The user can easily return to any point of recent search history to take a different search direction

  12. Iterative Search Issues • Identification of equivalent objects between a follow-up query and its prior queries • Reference resolution to determine the semantic meaning of references to prior queries in the follow-up query • Limited linguistic capability remains an issue when handling follow-up queries; we thus need to design interactive facilities to guide users to formulate follow-up queries

  13. Query Context • Tokens in the query and the patterns of tokens that correspond to XQuery fragments, and the query context of its parent query • Context center - is the lowest name token among those whose corresponding basic variables are not included in a WHERE clause. If no such name token exists, then a context center is a name token whose corresponding basic variable is included in a RETURN clause • Limit a query tree to have only one context center at any time

  14. Reference Resolution • Reference resolution in NaLIX is therefore equivalent to the task of finding the corresponding name tokens in the parent query context for a reference token. • The concept of query context inheritance allows our system to be relatively robust against errors in reference resolution • Reference resolution errors have no negative impact on query translation results, unless the reference token is involved in the situations that cause changes to query context

  15. Follow-up Query Formulation • A follow-up query is often an incomplete sentence • Since only a valid query is allowed to have follow-up queries, the parent query of any follow-up query is valid. • Therefore, a valid follow-up query is still confined by XQuery syntax but only needs to provide valid XQuery fragments, instead of a valid full XQuery expression

  16. Query Tree Detection • Users often simply type in a query as a follow-up query even though the query is self-contained and essentially starts a new query tree • If a given query entered as a follow-up query does not need any information provided by its prior queries, then it can be regarded as a new root query

  17. Experiment • NaLIX vs. keyword search interface • Measurements • Ease of use • Search Quality • Search time was considered, but results were not of sufficient interest to be included in the paper

  18. Results - Time

  19. Results - Iterations

  20. Results - Precision

  21. Results - Recall

  22. Related Works • Schema-Free XQuery • Natural Language Interface to Databases • Dependency Parsers • Support for Iterative Database Search • Communication Models • Automatic Topic Discovery • Reference Resolution

  23. Conclusion • Working implementation – not just a theory • Supports comparison predicates, conjunctions, simple negation, quantification, nesting, aggregation, value joins, and sorting • Future support disjunction, multi sentence queries, complex negation, and composite result construction • Have a request for production deployment by a group outside of computer science • Expect it to lead to a whole new generation of query interfaces for databases

  24. Screenshot

More Related