TQL (text query language)



Presentation Transcript


  1. TQL (text query language) Alexander Kotov Sungeun Kim Yeonjung Chung

  2. Problem definition • Text data is the most common way of storing and transferring information; • Can we automatically extract knowledge from such data, and if so, how? • How can such information be stored and accessed efficiently? • Information extracted from natural language text is not completely reliable (sources often contradict each other); • Textual information is highly unstructured (the relational model does not fit it);

  3. Problem definition • Named entity taggers can extract entities, and dependency parsers can extract relations, with reasonably high accuracy (extraction); • An entity-relation graph is a simple and powerful way of representing textual information (storage); • We can design a probabilistic measure of the trustworthiness of information (quantifying the probability that a piece of information is correct), which yields a “fuzzy” entity-relation graph (unreliability); • Access: ?! (still an open question)
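One way to picture the “fuzzy” entity-relation graph is to attach a confidence to every extracted fact and combine the confidences of all sources that assert it. The slides do not say how the trustworthiness measure is computed, so the sketch below swaps in a noisy-OR rule, which assumes sources err independently. The class and method names (FuzzyERGraph, add, probability) and the confidence values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FuzzyERGraph:
    """Hypothetical "fuzzy" entity-relation graph: each (head, relation, tail)
    fact keeps the confidences reported by the sources that asserted it."""
    evidence: dict = field(default_factory=dict)

    def add(self, head: str, relation: str, tail: str, confidence: float) -> None:
        if not 0.0 <= confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")
        self.evidence.setdefault((head, relation, tail), []).append(confidence)

    def probability(self, head: str, relation: str, tail: str) -> float:
        # Noisy-OR (an assumed combination rule, not the authors' measure):
        # the fact is wrong only if every source is wrong, treating the
        # sources as independent. Unseen facts get probability 0.
        p_all_wrong = 1.0
        for c in self.evidence.get((head, relation, tail), []):
            p_all_wrong *= 1.0 - c
        return 1.0 - p_all_wrong


g = FuzzyERGraph()
# Two sources assert the same fact; the confidences are illustrative only.
g.add("John Fitzgerald Kennedy", "S-BIRTHDATE", "May 29th, 1917", 0.6)
g.add("John Fitzgerald Kennedy", "S-BIRTHDATE", "May 29th, 1917", 0.5)
print(round(g.probability("John Fitzgerald Kennedy", "S-BIRTHDATE", "May 29th, 1917"), 2))  # 0.8
```

Noisy-OR only accumulates support for a fact; handling sources that actively contradict each other would need an extra rule that the slides leave open.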

  4. Problem definition [Figure: example entity-relation graph. Entities: John Fitzgerald Kennedy; Jacqueline Kennedy; United States; Washington, D.C.; Massachusetts; Boston; May 29th, 1917. Relation labels: R-CAPITAL-PLACE; A-WHY-FAMOUS-PERSON; S-BIRTHPLACE; A-DEFINITION-PERSON; S-BIRTHDATE.]

  5. Text Query Language • Generality is achieved by defining a minimal yet powerful set of operators; • The language should support multiple application scenarios; • A general and flexible structure like that of SQL is not possible; • Declarative vs. functional design

  6. Application scenarios • Inference (infer relations or links between entities, or find an entity that is "remotely connected" to some known entities); • Navigation (navigate from known entities to other interesting, previously unknown entities); • Comparison (compare two groups of entities to find their differences and similarities).

  7. Possible queries • Queries about the connection between two entities: the result identifies a path, or a set of paths, in the entity-relation graph, consisting of entities and the relations between them; • Queries about the entities related to a particular entity (finding neighbors); • Queries about similar entities (entities related to the same entities); • Queries about entities satisfying conditions of arbitrary complexity (e.g., find entities that are in particular relations with some other entities).

  8. Data Definition Language
     TYPE (TypeName)
     REL(Entity(Type), Entity(Type))
     INSTANCE (
       REL(Entity(Type), Entity(Type)),
       REL(Entity(Type), Entity(Type)),
       REL(Entity(Type), Entity(Type)),
       REL(Entity(Type), Entity(Type))
     )
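The three DDL constructs can be mirrored as plain data structures. The sketch below is one possible reading, not the authors' implementation: TYPE registers a type name, REL links two typed entities, and INSTANCE groups several RELs, rejecting any entity whose type was never declared (one simple form of the type consistency that slide 14 lists as a challenge). The names Entity, Rel, Store and the types "person" and "city" are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    type: str  # must be a TypeName declared with TYPE


@dataclass(frozen=True)
class Rel:
    head: Entity
    tail: Entity


class Store:
    """Hypothetical in-memory store mirroring TYPE, REL and INSTANCE."""

    def __init__(self):
        self.types = set()     # declared TypeNames
        self.instances = []    # each INSTANCE is stored as a tuple of Rel

    def declare_type(self, type_name):  # TYPE (TypeName)
        self.types.add(type_name)

    def add_instance(self, *rels):  # INSTANCE ( REL(...), REL(...), ... )
        # Reject the whole instance if any entity uses an undeclared type.
        for rel in rels:
            for ent in (rel.head, rel.tail):
                if ent.type not in self.types:
                    raise TypeError(f"undeclared type {ent.type!r} for {ent.name!r}")
        self.instances.append(tuple(rels))


store = Store()
for type_name in ("person", "city"):
    store.declare_type(type_name)
john, urbana = Entity("John", "person"), Entity("Urbana", "city")
store.add_instance(Rel(john, urbana))
print(len(store.instances))  # 1
```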

  9. “John goes to school in Urbana.”
     [Figure: dependency parse tree of the sentence; nodes numbered 0 to 5, with labels verb, subj:person, mod, pcomp-n, mod and pcomp-n:loc]
     Data Definition Language
     - Instance = sentence
       . John: subject (type subj:person)
       . goes: verb (type verb)
       . to, in: modifiers (type mod)
       . school: complement (type pcomp-n)
       . Urbana: complement (type pcomp-n:loc)
     - An instance is a set of relations; this sentence yields 5 relations:
       INSTANCE(
         REL(John(subj:person), goes(verb)),
         REL(goes, to(mod)),
         REL(to, school(pcomp-n)),
         REL(school, in(mod)),
         REL(in, Urbana(pcomp-n:loc))
       )
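The translation from a parsed sentence to an INSTANCE can be sketched as follows. The pairs and word types are hard-coded from the slide; in practice they would come from a dependency parser, which is not shown. The function name sentence_to_instance is hypothetical. As on the slide, a word's type is written only at its first mention.

```python
def sentence_to_instance(pairs, types):
    """Render (left, right) word pairs as a TQL INSTANCE string.

    pairs: word pairs in the order the RELs should appear.
    types: word -> type label taken from the dependency parse.
    Assumes each word occurs once per sentence (a repeated word such as a
    second "in" would need a unique token id instead of the bare word).
    """
    seen = set()

    def fmt(word):
        # Annotate the type only on the word's first appearance.
        if word in seen:
            return word
        seen.add(word)
        return f"{word}({types[word]})"

    # f-string fields are evaluated left to right, so `left` is seen first.
    rels = [f"REL({fmt(left)}, {fmt(right)})" for left, right in pairs]
    return "INSTANCE( " + ", ".join(rels) + " )"


types = {"John": "subj:person", "goes": "verb", "to": "mod",
         "school": "pcomp-n", "in": "mod", "Urbana": "pcomp-n:loc"}
pairs = [("John", "goes"), ("goes", "to"), ("to", "school"),
         ("school", "in"), ("in", "Urbana")]
print(sentence_to_instance(pairs, types))
```

The printed string matches the INSTANCE on the slide, including the omission of types for words already introduced.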

  10. Mining Language
     FIND CONNECTION (Entity1, Entity2)
     FIND RELATED (Entity)
     FIND SIMILAR (Entity)
     FIND ENTITY (Entity(Type),
       CONSTRAINTS (
         REL(Entity(Type), Entity(Type)),
         REL(Entity(Type), Entity(Type)),
         REL(Entity(Type), Entity(Type)),
         REL(Entity(Type), Entity(Type)),
         REL(Entity(Type), Entity(Type))
       )
     )
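FIND ENTITY can be read as "return every entity of the requested type for which all REL constraints hold." The sketch below assumes that reading, supports a single unknown written as "?x", and treats REL as an unlabelled (head, tail) pair, as in the DDL. The function name, the "?x" convention, and the example relations and types ("person", "city") are hypothetical.

```python
def find_entity(var_type, constraints, rels, types):
    """Sketch of FIND ENTITY (?x(var_type), CONSTRAINTS (...)).

    constraints: (left, right) pairs in which the string "?x" marks the
    unknown entity; rels: set of (head, tail) pairs; types: entity -> type.
    Returns, sorted, every entity of var_type satisfying all constraints.
    """
    def bind(term, x):
        # Substitute the candidate for the variable; constants stay as-is.
        return x if term == "?x" else term

    return sorted(
        x
        for x, t in types.items()
        if t == var_type
        and all((bind(left, x), bind(right, x)) in rels for left, right in constraints)
    )


rels = {("John", "Urbana"), ("John", "Mary"), ("Mary", "Urbana"), ("Mary", "New York")}
types = {"John": "person", "Mary": "person", "Urbana": "city", "New York": "city"}
print(find_entity("person", [("?x", "Urbana")], rels, types))                      # ['John', 'Mary']
print(find_entity("person", [("?x", "Urbana"), ("?x", "New York")], rels, types))  # ['Mary']
```

Enumerating candidates by type first keeps the check simple; a real engine would more likely start from the most selective constraint and use an index.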

  11. Query execution. Query: FIND RELATED (John) [Figure: entity-relation graph built from the sentences below. Nodes include John, Mary, School, Urbana, New York, University of Illinois and Brother, connected through relation words such as go, to, in, live, be, at, of, graduate and from; every edge is labelled 1.] • John goes to school in Urbana • John is a brother of Mary • John lives in Urbana • Mary graduated from University of Illinois at Urbana • Mary lives in New York
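A minimal reading of FIND RELATED is "the direct neighbors of an entity." The sketch below assumes that reading and uses a simplified, hand-made encoding of the five sentences as (head, relation, tail) triples. The slide's graph routes relations through word nodes (go, to, in, ...), so this collapsed encoding and the names EXAMPLE_TRIPLES and find_related are hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-made encoding of the five example sentences; each
# relation phrase is collapsed into a single labelled edge.
EXAMPLE_TRIPLES = [
    ("John", "goes to school in", "Urbana"),
    ("John", "brother of", "Mary"),
    ("John", "lives in", "Urbana"),
    ("Mary", "graduated from", "University of Illinois"),
    ("University of Illinois", "at", "Urbana"),
    ("Mary", "lives in", "New York"),
]


def find_related(entity, triples):
    """Sketch of FIND RELATED (entity): the entity's direct neighbors,
    following relations in either direction."""
    neighbors = defaultdict(set)
    for head, _relation, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    return sorted(neighbors[entity])


print(find_related("John", EXAMPLE_TRIPLES))  # ['Mary', 'Urbana']
```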

  12. Query execution. Query: FIND CONNECTION (John, New York) [Figure: the same entity-relation graph as in slide 11.]
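FIND CONNECTION returns a path between two entities. If the 1s on every edge in the figure are edge weights, all edges cost the same, and breadth-first search finds a shortest path. The sketch below assumes that, ignores edge direction, and uses a hypothetical triple encoding of the slide 11 sentences. The function name find_connection is also hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical hand-made encoding of the slide 11 sentences as triples.
EXAMPLE_TRIPLES = [
    ("John", "goes to school in", "Urbana"),
    ("John", "brother of", "Mary"),
    ("John", "lives in", "Urbana"),
    ("Mary", "graduated from", "University of Illinois"),
    ("University of Illinois", "at", "Urbana"),
    ("Mary", "lives in", "New York"),
]


def find_connection(start, goal, triples):
    """Sketch of FIND CONNECTION (start, goal): a shortest path as a list of
    (from, relation, to) steps; [] if start == goal, None if unconnected."""
    adj = defaultdict(list)
    for head, rel, tail in triples:
        adj[head].append((rel, tail))   # edges usable in both directions
        adj[tail].append((rel, head))
    parent = {start: None}              # node -> (previous node, relation)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while parent[node] is not None:
                prev, rel = parent[node]
                path.append((prev, rel, node))
                node = prev
            return path[::-1]
        for rel, nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = (node, rel)
                queue.append(nxt)
    return None


print(find_connection("John", "New York", EXAMPLE_TRIPLES))
# [('John', 'brother of', 'Mary'), ('Mary', 'lives in', 'New York')]
```

Returning every path, as slide 7 allows ("a path or a set of paths"), would need a different traversal; BFS returns only one shortest path.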

  13. Query execution. Query: FIND SIMILAR (John) [Figure: the same entity-relation graph as in slide 11.]
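Slide 7 describes similar entities as "entities related to the same entities." The sketch below takes that literally and ranks every other entity by how many neighbors it shares with the query entity. Shared-neighbor counting is an assumed scoring choice, and the triple encoding and the name find_similar are hypothetical.

```python
from collections import defaultdict

# Hypothetical hand-made encoding of the slide 11 sentences as triples.
EXAMPLE_TRIPLES = [
    ("John", "goes to school in", "Urbana"),
    ("John", "brother of", "Mary"),
    ("John", "lives in", "Urbana"),
    ("Mary", "graduated from", "University of Illinois"),
    ("University of Illinois", "at", "Urbana"),
    ("Mary", "lives in", "New York"),
]


def find_similar(entity, triples):
    """Sketch of FIND SIMILAR (entity): (other, shared-neighbor count) pairs,
    most similar first; entities with no shared neighbor are dropped."""
    neighbors = defaultdict(set)
    for head, _rel, tail in triples:
        neighbors[head].add(tail)
        neighbors[tail].add(head)
    target = neighbors[entity]
    scores = []
    for other, theirs in neighbors.items():
        if other == entity:
            continue
        shared = len(target & theirs)
        if shared:
            scores.append((other, shared))
    # Highest count first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


print(find_similar("John", EXAMPLE_TRIPLES))
# [('University of Illinois', 2), ('New York', 1)]
```

Nothing here restricts matches to the query entity's type, so in this toy graph a university ranks as most similar to a person; filtering by the declared types would be a natural refinement.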

  14. Technical challenges • A flexible parser; • The entity-relation graph is a complex data structure with a significant level of redundancy (hashing and complex indexing are needed to reduce space); • Maintaining type information and consistency between typed entities; • Implementing efficient query execution strategies.
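One common way to cut the redundancy mentioned above is string interning: each distinct entity or relation string is stored once in a hash table and referred to by an integer id, duplicate facts collapse into one, and a per-head index supports neighbor lookups. This is a generic technique offered as an illustration, not the authors' design; the class InternedGraph and its methods are hypothetical.

```python
class InternedGraph:
    """Hypothetical compact store: every distinct string is kept once and
    referred to by an integer id, and duplicate facts collapse into one."""

    def __init__(self):
        self.ids = {}          # string -> int id (hash-based interning)
        self.names = []        # int id -> string
        self.triples = set()   # {(head_id, rel_id, tail_id)}
        self.by_head = {}      # index: head_id -> {(rel_id, tail_id)}

    def _intern(self, s):
        if s not in self.ids:
            self.ids[s] = len(self.names)
            self.names.append(s)
        return self.ids[s]

    def add(self, head, relation, tail):
        t = (self._intern(head), self._intern(relation), self._intern(tail))
        if t not in self.triples:      # redundant facts are stored only once
            self.triples.add(t)
            self.by_head.setdefault(t[0], set()).add(t[1:])

    def outgoing(self, head):
        """(relation, tail) pairs leaving head, decoded back to strings."""
        hid = self.ids.get(head)
        if hid is None:
            return []
        return sorted((self.names[r], self.names[tl]) for r, tl in self.by_head.get(hid, ()))


g = InternedGraph()
g.add("John", "lives in", "Urbana")
g.add("John", "lives in", "Urbana")      # the same fact extracted a second time
g.add("Mary", "lives in", "New York")
print(len(g.triples), len(g.names))      # 2 5
print(g.outgoing("John"))                # [('lives in', 'Urbana')]
```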

  15. Future work • Extension of the language by adding new operators (coverage); • Optimization of query execution performance (efficiency); • Automated generation of instances from natural language sentences (usability).
