From sentence to sense level information retrieval

ervin

Presentation Transcript


  1. From sentence to sense level information retrieval: Bridging CONTENT and object with HOLTRAN Technology

  2. Mission Statement • HOLTRAN (Higher Order Logic Translation) Technology fills the gap between two fundamental methods of representing information: unstructured texts and structured data. • HOLTRAN Technology aims to serve as a universal international standard for representing, storing and exchanging information. • Our mission is to become an industry-leading provider of Natural Language Processing (NLP) solutions for consumers and companies. HOLTRAN Technology Ltd.

  3. What do we do? • Next generation knowledge base engine • Extracts information from multilingual natural language texts • Stores the information in a structured form • Keeps both semantic and textual information on an equal footing • Enables knowledge base queries in natural languages

  4. Why? • “Poor classification costs a 10,000 user organization $10M annually.” (usability expert Jakob Nielsen) • “Unstructured information doubles in quantity every three months” (Gartner Group) • “In modern enterprise, 85% of data is unstructured” (Butler Group) • “The challenge … is to find effective solutions to unlock the value from unstructured information sources, and to leverage it in Business Intelligence deployments” (Butler Group)

  5. NEED • A common interface framework for diverse multilingual and multiform sources of information. • OBSTACLE • Unstructured information is human-friendly • BUT meaningless to computers • and heavily dependent on manual processes.

  6. The general need for… • a common interface able to bind and organize diverse multilingual and multiform sources of information …has also recently been recognized by leading European companies and by the ITEA board, • which formulated in its “Technology Roadmap on Software Intensive Systems” the main challenge for semantic data as “the possibility for applications to understand the ‘meaning’ of each others data” and invited HOLTRAN Technology Ltd. to the annual ITEA meeting in Amsterdam (January 2003). • As a result, the company was invited to join a consortium in the 6th call of EUREKA ITEA: • DigiNews (News and Information for mobile e-paper terminals. Leader: Philips Technology) • HOLTRAN Technology addresses this need by providing a framework for building a new generation of semantic, knowledge-based information systems that can extract semantic (structured) information from multilingual textual (unstructured) form and, vice versa, express the stored structured information in textual form.

  7. Novelty • Use of a knowledge model based on our own improved multiple-sorted type theory. • A HOLTRAN-based system is able to • store relationships of arbitrary order between semantic entities (including incomplete and contradictory information) • store extensible language definitions of any type and complexity • efficiently evaluate full responses to any queries related to the stored information

  8. Novelty in Question Answering • Unlike search engines, the HOLTRAN Question Answering system aims to supply users with the essence of "just the right information" instead of merely providing a list of hits. • Current question answering systems are based on sentence level information retrieval. • The rate of correct answers in such systems is about 70%. • The HOLTRAN Question Answering system provides a revolutionary leap from sentence level retrieval to sense level information retrieval and answer formation.

  9. Why are we better than other QA systems? • State-of-the-art question answering systems use a variety of linguistic resources to understand users’ queries and match sections of documents. • The most common linguistic resources include: • part-of-speech tagging • parsing • named entity extraction • semantic relations • dictionaries, WordNet, etc. • HOLTRAN Technology provides a unique framework that embraces these resources inside the system and thus covers complex lexical, syntactic and semantic relationships between question and answer strings.

  10. The social impact of HOLTRAN in 2020 will cover the following situations: • Two or more people talk over the "phone", each speaking his native language and receiving the answer in it. • While traveling in a car, one can ask "it" any relevant question and receive the system response in his native language. • One can receive ANY "newspaper" in his native language (the first step is to be made through our participation in the DigiNews ITEA project). • Fully automated call centers: each user can receive the answer in his native language. • Translation from one artificial language to another, providing full database compatibility without additional software development (a solution to the PDM-ERP compatibility problem). • No manual transaction treatment: all e-mails are read and treated automatically (spam as well).

  11. Typical modern information system architecture • The system repository contains a hybrid of • a relational database of some related meta information • a vault of textual documents stored as unstructured "black boxes" • The System Server • executes queries against the relational DB and provides access to the textual documents • communicates with the system clients, usually via a GUI for the end users • [Diagram: User, System Client, System Server, Relational DBMS, Unstructured documents, Local documents; flows labelled Meta data + Documents (GUI), Meta data (XML), Meta data (SQL), Documents (text stream)]

  12. The basic problems of classical architecture: • Redundancy and inconsistency between the contents of primary documents and the meta information on these documents • Limitations on inter-version and inter-application compatibility • HOLTRAN technology solves these fundamental problems through its ability to • extract the semantic content from unstructured documents • extract more semantic information from the same documents when language definitions are extended, without new software development • express the content in a textual form.

  13. HOLTRAN based information system architecture • The HOLTRAN Interpreter translates information coming from users and documents in external languages into the internal KBMS language and vice versa. It allows users to communicate with the system in their native languages. • The HOLTRAN KB stores, on an equal footing, both • application information • definitions of various external languages, i.e. any artificial or natural languages used to exchange information with applications and users. • [Diagram: User, Browser, HOLTRAN Interpreter, HOLTRAN KBMS, Local documents; flows labelled Meta data + Documents (user native), Meta data + Documents (text stream), Content + language definitions (HOLTRAN native)]

  14. How does it work? • Objects are assigned the entity type “e”; “t” stands for the truth type • Example sentence: John plays table tennis well • Interior axiom representation: [formula image]

  15. How does it work? (2) • Querying in HOLTRAN: • Who plays table tennis well ? • Interior axiom representation: [formula image] • The inference procedure consists of finding all consistent substitutions for the free variables (x’s) in the tested formula under which the formula is provable.
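The substitution search described on this slide can be sketched in Python. This is only an illustration under strong assumptions: facts are flat tuples rather than typed higher order formulas, "provable" is reduced to "equal to a stored fact", and the names `KB`, `match` and `query` are hypothetical and not part of HOLTRAN.

```python
from typing import Dict, List, Optional, Tuple

Fact = Tuple[str, ...]

# Hypothetical flat encoding of two sentences that appear in the slides.
KB: List[Fact] = [
    ("plays", "John", "table tennis", "well"),
    ("plays", "Mary", "tennis", "well"),
]


def is_variable(term: str) -> bool:
    """Free variables are written with a leading '?' (an assumption of this sketch)."""
    return term.startswith("?")


def match(pattern: Fact, fact: Fact) -> Optional[Dict[str, str]]:
    """Return the substitution that turns pattern into fact, or None if there is none."""
    if len(pattern) != len(fact):
        return None
    binding: Dict[str, str] = {}
    for p, f in zip(pattern, fact):
        if is_variable(p):
            # A variable used twice must be bound to the same value both times.
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding


def query(kb: List[Fact], pattern: Fact) -> List[Dict[str, str]]:
    """Collect every consistent substitution of the free variables in pattern."""
    results = []
    for fact in kb:
        binding = match(pattern, fact)
        if binding is not None:
            results.append(binding)
    return results


if __name__ == "__main__":
    # "Who plays table tennis well ?"
    print(query(KB, ("plays", "?x", "table tennis", "well")))
```

A real inference procedure would have to prove the formula from axioms rather than look it up; the sketch only shows the "find all consistent substitutions" part.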

  16. What is higher order logic? • First Order Logic: Mary and Cathy play tennis. • First order part: Mary or Cathy; Not Mary and Cathy • First order logic expressions are the kind written in SQL; they contain constants and variables only of simple types e, t, …

  17. What is higher order logic? (2) • First Order Query (queries a variable of the first order): Who plays table tennis well ? • Second Order Query (queries a variable of the second order): What does John do ? • 4th order query: How does Mary play tennis ? • Higher order logic expressions are those containing constants and variables of variable order types ee, et, eet, …
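The compact type names used in these slides (et, eet, (et)et) can be unpacked with a small Python sketch. It assumes the usual reading in which a type string lists argument types followed by the result type, so that eet means e -> (e -> t); the slides do not state this convention explicitly, and the names `Atom`, `Fn`, `parse_type` and `apply_type` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Atom:
    name: str  # "e" (entity) or "t" (truth value)


@dataclass(frozen=True)
class Fn:
    arg: "Type"
    res: "Type"


Type = Union[Atom, Fn]


def parse_type(s: str) -> Type:
    """Parse a compact type string such as 'et', 'eet' or '(et)et'."""
    parts: List[Type] = []
    i = 0
    while i < len(s):
        if s[i] == "(":
            # Find the matching closing parenthesis and parse the group recursively.
            depth, j = 1, i + 1
            while depth:
                if s[j] == "(":
                    depth += 1
                elif s[j] == ")":
                    depth -= 1
                j += 1
            parts.append(parse_type(s[i + 1 : j - 1]))
            i = j
        else:
            parts.append(Atom(s[i]))
            i += 1
    # Fold to the right: a1 a2 ... an  becomes  a1 -> (a2 -> (... -> an)).
    result = parts[-1]
    for part in reversed(parts[:-1]):
        result = Fn(part, result)
    return result


def apply_type(fn: Type, arg: Type) -> Type:
    """Type of applying fn to arg; raises TypeError when the types do not fit."""
    if not isinstance(fn, Fn) or fn.arg != arg:
        raise TypeError(f"cannot apply {fn} to {arg}")
    return fn.res


if __name__ == "__main__":
    how = parse_type("(et)et")      # the queried variable in "How does Mary play tennis ?"
    verb_phrase = parse_type("et")  # "play tennis"
    mary = parse_type("e")
    print(apply_type(apply_type(how, verb_phrase), mary))  # a truth-typed term
```

Under this reading, applying a variable of type (et)et to a verb phrase of type et and then to an entity yields a term of truth type, which is what makes the question a well-typed formula to test.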

  18. How do we do it in the HOLTRAN Native programming language? • This is what a piece of HOLTRAN Native code looks like for handling questions of the sort “How does Mary play tennis ?”:
      (("How does"=) ##& NounPhrase ##> VerbPhrase ##& ("?"=) =>> \np:e\vp:et((x:(et)et vp:et) np:e))
    • This is how the Interpreter translates from English to HOLTRAN Native and back to English:
      < How does Mary play tennis ?
      > Test (x:(et)et (_3:eet (COM _5:et)) (ID _2:e));
      > Assert (_1:(et)et (_3:eet (COM _5:et)) (ID _2:e));
      > Mary plays tennis well.
    • The HOLTRAN Native programming language is specially designed to express any notions and ideas perceived by humans, including definitions of natural languages.
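The shape of the rule on slide 18 (a fixed "How does" prefix, a noun phrase, a verb phrase, a closing "?", and a resulting term with a free variable x of type (et)et) can be imitated in Python. This is a loose sketch and not HOLTRAN Native: it assumes a one-word noun phrase, treats the rest of the sentence as the verb phrase, and returns a nested tuple instead of a typed lambda term. The names `HOW_QUESTION` and `translate_how_question` are hypothetical.

```python
import re
from typing import Optional, Tuple

# ((free variable, verb phrase), noun phrase), mirroring ((x vp) np) in the rule.
Term = Tuple[Tuple[str, str], str]

# Assumption: the noun phrase is a single word; everything up to "?" is the verb phrase.
HOW_QUESTION = re.compile(r"^How does\s+(?P<np>\w+)\s+(?P<vp>.+?)\s*\?$")


def translate_how_question(sentence: str) -> Optional[Term]:
    """Translate 'How does <NP> <VP> ?' into a term with a free variable x."""
    m = HOW_QUESTION.match(sentence.strip())
    if m is None:
        return None  # the rule does not apply to this sentence
    np, vp = m.group("np"), m.group("vp")
    return (("x:(et)et", vp + ":et"), np + ":e")


if __name__ == "__main__":
    print(translate_how_question("How does Mary play tennis ?"))
    print(translate_how_question("What does John do ?"))  # None: a different rule is needed
```

Answering the question would then amount to finding a value for x that makes the term provable from the knowledge base.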

  19. Product Overview • Our core product, the HOLTRAN Semantic Platform, is a suite of software components and tools that serves as middleware for building customizable and extensible applications for semantic processing of textual information in multiple artificial and natural languages.

  20. HOLTRAN Semantic Platform comprises: • HOLTRAN Engine - the heart of the product, which reads input information, translates it into internal instructions and executes them by directly accessing HOLTRAN Knowledge Bases. The HOLTRAN Engine includes, built in, • an interpreter of the HOLTRAN Native language • a set of HOLTRAN Knowledge Bases • HOLTRAN Knowledge Connector - a package of software components forming a C++ and Java API (SDK) that provides local or network access to HOLTRAN Knowledge Bases.

  21. HOLTRAN Semantic Platform also comprises: • A configurable set of HOLTRAN Standard Applications including • HOLTRAN Desktop Assistant - a standard extension to MS Windows Explorer • HOLTRAN Network Assistant, providing instant information search, categorization and summarization across an enterprise network • HOLTRAN Web Server, providing access to HOLTRAN Knowledge Bases via the HTTP protocol and publishing documents to the linked HOLTRAN Knowledge Bases

  22. HOLTRAN Question & Answering: Current Status • Consider the following short input: • Mark is a doctor. He has a clinic. Mark works well. Bill and Cathy love Mark. • John is a programmer. He is a good programmer. John builds a new program. He works well. • Alex is a driver. Alex builds a new house. He works hard. Olga helps him, she works hard too.

  23. HOLTRAN Question & Answering: Current Status • Now the following dialogue takes place between the user (<) and the program (>). Note the dialogue management capabilities:

  24. HOLTRAN Question & Answering: Current Status • Self-learning capabilities: note that “carnivore”, “lion” and “lab” are absent from the system dictionary and we introduce them in the dialogue for the first time: • < Alex is a carnivore. • < A lion is a carnivore. • < Who is Alex ? • > He is a driver and he is a carnivore. • < Who is a lion ? • > It is a carnivore. • < Mark builds a new lab. • < What does Mark do ? • > He works, builds a new lab and has a clinic.
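The "X is a Y" part of this dialogue can be mimicked with a toy Python sketch in which any previously unseen word is simply stored as a new category. It assumes two fixed sentence patterns and returns a list of categories instead of generating an English answer; the class `TinyDialogue` and its methods are hypothetical and say nothing about how HOLTRAN itself learns words.

```python
import re
from typing import Dict, List, Optional

# Assumed sentence shapes: "<Name> is a <category>." and "Who is <name> ?"
STATEMENT = re.compile(r"^(?:(?:A|An|The)\s+)?(\w+) is an? (\w+)\s*\.$", re.IGNORECASE)
QUESTION = re.compile(r"^Who is (?:(?:a|an|the)\s+)?(\w+)\s*\?$", re.IGNORECASE)


class TinyDialogue:
    def __init__(self) -> None:
        # Maps a lower-cased name to the categories asserted for it, in order.
        self.categories: Dict[str, List[str]] = {}

    def tell(self, sentence: str) -> bool:
        """Store an 'X is a Y.' statement; return False if the pattern is not recognized."""
        m = STATEMENT.match(sentence.strip())
        if m is None:
            return False
        name, category = m.group(1).lower(), m.group(2).lower()
        known = self.categories.setdefault(name, [])
        if category not in known:  # new words need no prior dictionary entry
            known.append(category)
        return True

    def ask(self, question: str) -> Optional[List[str]]:
        """Answer 'Who is X ?' with the stored categories; None if not understood."""
        m = QUESTION.match(question.strip())
        if m is None:
            return None
        return list(self.categories.get(m.group(1).lower(), []))


if __name__ == "__main__":
    dialogue = TinyDialogue()
    for line in ("Alex is a driver.", "Alex is a carnivore.", "A lion is a carnivore."):
        dialogue.tell(line)
    print(dialogue.ask("Who is Alex ?"))
    print(dialogue.ask("Who is a lion ?"))
```

Sentences such as "Mark builds a new lab." fall outside the two patterns and are rejected by `tell`, which marks the gap between this toy and the behaviour shown on the slide.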

  25. HOLTRAN Question & Answering Test Cases - 1 • We have no doubt that no existing product could pass even half of these tests in the foreseeable future. We expect to pass at least the first 7 cases within a year (the last one might require some additional effort). • 1. Negation Accounting • I: The wolf huffed and puffed but he could not blow down that brick house. • Q: Could the wolf blow down the brick house ? • 2. Syllogisms • I: Every human is mortal. Socrates is a human. • Q: Is Socrates mortal ?
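The syllogism test case can be illustrated with a short Python sketch that chains "every A is B" rules from an individual's known categories. The input is assumed to be already parsed into category facts, and the class `CategoryKB` with its methods is hypothetical; it is not a description of the HOLTRAN inference engine.

```python
from typing import Dict, Set


class CategoryKB:
    def __init__(self) -> None:
        self.member_of: Dict[str, Set[str]] = {}    # "Socrates is a human"
        self.subclass_of: Dict[str, Set[str]] = {}  # "Every human is mortal"

    def add_member(self, individual: str, category: str) -> None:
        self.member_of.setdefault(individual, set()).add(category)

    def add_rule(self, category: str, broader: str) -> None:
        self.subclass_of.setdefault(category, set()).add(broader)

    def is_a(self, individual: str, category: str) -> bool:
        """Follow 'every A is B' rules transitively from the individual's categories."""
        known = set(self.member_of.get(individual, ()))
        frontier = list(known)
        while frontier:
            current = frontier.pop()
            for broader in self.subclass_of.get(current, ()):
                if broader not in known:
                    known.add(broader)
                    frontier.append(broader)
        return category in known


if __name__ == "__main__":
    kb = CategoryKB()
    kb.add_rule("human", "mortal")       # Every human is mortal.
    kb.add_member("Socrates", "human")   # Socrates is a human.
    print(kb.is_a("Socrates", "mortal")) # Is Socrates mortal ?
```

A False result here only means "not derivable from what was stored", which is weaker than a proven negation such as the one needed for test case 1.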

  26. HOLTRAN Question & Answering Test Cases - 2 • 3. “Wh” questions • I: After its final passage by both houses, the bill is sent to the president. • Q: To whom is a bill sent after its final passage by both houses ? • Q: When is a bill sent to the president ? • 4. Reference resolution • I: When a senator or a representative introduces a bill, he or she sends it to the clerk of his house, who gives it a number and title. • Q: Who sends a bill to the clerk of his house ? • Q: Who gives a bill a number and title ? • 5. Synonyms/antonyms accounting • I: Diesel engines are heavier than gasoline engines. • Q: Which type of internal-combustion engine is lighter ?

  27. HOLTRAN Question & Answering Test Cases - 3 • 6. Semantic categories accounting • I: The heart employs a separate vascular system to obtain blood for its own nourishment. Two major coronary arteries regulate this blood supply. • Q: What is the function of coronary arteries ? • 7. Ontology accounting • I: Joseph Kennedy devoted the rest of his life to advancing the political careers of his sons, John, Robert and Edward. • Q: Is Robert Kennedy a brother of John Kennedy ?

  28. HOLTRAN Question & Answering Test Cases - 4 • 8. Merging distributed info • I: IBM and Philips announced a joint initiative to collaborate on radio frequency identification (RFID) technology for companies using supply-chain software. • I: Eastman Kodak Co. and IBM on Tuesday announced a joint effort to offer healthcare facilities products that combine Kodak's medical imaging technology and services with IBM's storage devices. • I: GiveMePower Corporation today announced it has partnered with IBM Corporation as one of four business solutions to be showcased in Intel Corporation’s "Inside Your Digital Life: Intel" exhibit at CeBIT 2004. • Q: Which companies does IBM Corporation have joint projects or partnership with ?

  29. Context dependency: contradiction resolution • Absolute truth (currently implemented dialogue) • < John works hard. • > Yes, John works hard. • < John does not work hard. • > No, he does work hard. • < How does John work ? • > He works hard. • Relative truth (future dialogue) • < Bryan says that John works hard. • < Bill says that John does not work hard. • < How does John work ? • > According to Bryan he works hard and according to Bill – not.
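The difference between the two dialogues can be sketched in Python: an absolute statement that contradicts a stored one is rejected, while statements attributed to a source are stored side by side. Claims are assumed to be already normalized to a string plus a polarity flag, and the class `BeliefStore` is hypothetical; it only illustrates the bookkeeping, not HOLTRAN's internal representation.

```python
from typing import Dict, List, Tuple


class BeliefStore:
    def __init__(self) -> None:
        self.absolute: Dict[str, bool] = {}               # claim -> holds?
        self.reported: List[Tuple[str, str, bool]] = []   # (source, claim, holds?)

    def assert_absolute(self, claim: str, holds: bool = True) -> bool:
        """Store a claim as absolute truth; return False if it contradicts a stored one."""
        previous = self.absolute.setdefault(claim, holds)
        return previous == holds

    def assert_reported(self, source: str, claim: str, holds: bool = True) -> None:
        """Store a claim relative to its source; contradictions between sources are allowed."""
        self.reported.append((source, claim, holds))

    def views_on(self, claim: str) -> List[Tuple[str, bool]]:
        """List what each source says about the claim, in the order it was reported."""
        return [(source, holds) for source, c, holds in self.reported if c == claim]


if __name__ == "__main__":
    store = BeliefStore()
    print(store.assert_absolute("John works hard"))         # accepted
    print(store.assert_absolute("John works hard", False))  # rejected as contradictory
    store.assert_reported("Bryan", "John works hard", True)
    store.assert_reported("Bill", "John works hard", False)
    print(store.views_on("John works hard"))
```

Keeping the source next to each claim is what lets the "future dialogue" answer with "according to Bryan ... and according to Bill ..." instead of refusing the second statement.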

  30. Key Persons: • Dr. Alexander Brenner, President and CEO. • Ph.D. in Mathematics from the Technion - Israel Institute of Technology and M.Sc. from Moscow State University. • Previously: leader of the Image Processing and Algorithms department at Imaginarix Ltd. and lecturer at the Technion - Israel Institute of Technology. • Professional experience: • Pure and applied mathematics, image and signal processing. Software engineering (object oriented design, testing and maintenance). Management of R&D teams. • Applications: • Image processing, call centres, artificial intelligence (pattern recognition, natural language processing), scientific programming, industrial applications, mathematical and statistical modelling.

  31. Key Persons: • Dr. Victor Gluzberg, VP R&D. • Ph.D. and M.Sc. in physics and applied mathematics from Novosibirsk State University. • Previously: Software Manager at Parametric Technology, Israel. • Professional experience: • Applied mathematics and computer science, physics. Software engineering (requirements analysis, program specification and design, testing and maintenance). Management of R&D teams. • Applications: • Data processing, system programming, CAD/CAM, artificial intelligence (pattern recognition, inference, natural language processing), scientific programming, industrial applications, mathematical and statistical modelling.
