
Text Analytics on UIMA and UIMA Semantic Search Engine

ISM209 David Lewis Student Project Presentation, 2006-12-05.

Presentation Transcript


  1. Text Analytics on UIMA and UIMA Semantic Search Engine ISM209 David Lewis Student Project Presentation 2006-12-05

  2. What
     • Learn about UIMA
     • UIMA Origins and Applications
     • UIMA Architecture and Components
     • Juru Extended for XML Document Search
     • Demonstration

  3. UIMA Origins and Goals
     • Developed by IBM Research over 4 years
     • Offered by IBM as open source at the end of 2005
        • developerWorks: WebSphere production
        • alphaWorks: early adopters
        • SourceForge: hand-off in progress
     • “Bridge from the unstructured world to the structured world”
     • “UIMA SDK supports development, discovery, composition and deployment of multi-modal analytics for the analysis of unstructured information”

  4. UIMA Applications
     • WebSphere Information Integrator OmniFind Edition (search engine)
     • Lotus Notes search
     • DARPA UIMA Working Group (WWW mining)
     • Unstructured Information Management (UIM) research and instruction
        • CMU, Stanford, UMass Amherst
     • Others
        • SAIC, BBN, Mayo Clinic, MITRE Corp
        • “14 software vendors” (from press coverage of the open-source announcement)

  5. Architecture and Components
     • UIMA Framework: run-time environment
     • UIMA SDK: all-Java implementation of the framework, with Eclipse IDE integration

  6. Components
     • UIMA Framework Core
     • Externalized Framework Plug-ins
     • Common Analysis Structure (CAS)
     • Type System (Person, Organization, Bank, etc.)
     • Document Annotators, Analysis Engines
     • Collection Processing Engine
     • CAS Sources and Sinks
     • Resource and Configuration Manager, Logger, etc.
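To make the CAS and type-system bullets concrete, here is a minimal sketch in plain Java of stand-off annotation: the document text is stored once, and each annotation records only a type name (drawn from a type system such as Person or Organization) plus begin/end character offsets. `MiniCas` and `DictionaryAnnotator` are hypothetical illustrations, not the UIMA SDK API; the real framework generates typed annotation classes from a type-system descriptor, and a real annotator would use far better recognition than exact dictionary lookup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a CAS (not the UIMA API): the text is stored once and
// annotations point into it by character offsets ("stand-off" annotation).
final class MiniCas {
    // One annotation: a type-system type name plus [begin, end) offsets.
    record Span(String type, int begin, int end) {}

    final String text;
    final List<Span> annotations = new ArrayList<>();

    MiniCas(String text) {
        this.text = text;
    }

    void add(String type, int begin, int end) {
        annotations.add(new Span(type, begin, end));
    }

    String coveredText(Span span) {
        return text.substring(span.begin(), span.end());
    }
}

// Hypothetical annotator: tags every exact occurrence of a dictionary entry
// with the type name the dictionary assigns to it.
final class DictionaryAnnotator {
    private final Map<String, String> dictionary; // surface form -> type name

    DictionaryAnnotator(Map<String, String> dictionary) {
        this.dictionary = dictionary;
    }

    void process(MiniCas cas) {
        for (Map.Entry<String, String> entry : dictionary.entrySet()) {
            String form = entry.getKey();
            if (form.isEmpty()) continue; // an empty entry would match everywhere
            int from = 0;
            int idx;
            while ((idx = cas.text.indexOf(form, from)) >= 0) {
                cas.add(entry.getValue(), idx, idx + form.length());
                from = idx + form.length();
            }
        }
        // Keep annotations in document order regardless of dictionary iteration order.
        cas.annotations.sort(Comparator.comparingInt(MiniCas.Span::begin));
    }
}
```

Because annotations never copy or modify the text, several annotators can add overlapping layers (tokens, entities, relations) over the same document.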

  7. Processing Engine Configurator

  8. Processing Engine Configurator

  9. Aggregate Analysis Engines
     • Analysis engines may be composed into aggregate engines
     • Analysis Engine Assembler
     • Distributed execution support
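Aggregation can be sketched as composition: an aggregate engine is itself an analysis engine whose processing step runs its delegates in a fixed order over one shared analysis structure, so later engines can read what earlier ones added. The `Annotator` interface and the classes below are hypothetical stand-ins, not the UIMA SDK's classes, and the sketch runs everything in a single process rather than distributing it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified shared analysis structure (not the UIMA CAS API).
final class Cas {
    final String text;
    final List<int[]> tokens = new ArrayList<>();       // {begin, end} offsets
    final List<String> capitalized = new ArrayList<>(); // covered text of capitalized tokens

    Cas(String text) {
        this.text = text;
    }
}

interface Annotator {
    void process(Cas cas);
}

// Primitive engine 1: splits the text on whitespace.
final class Tokenizer implements Annotator {
    public void process(Cas cas) {
        int i = 0;
        int n = cas.text.length();
        while (i < n) {
            while (i < n && Character.isWhitespace(cas.text.charAt(i))) i++;
            int start = i;
            while (i < n && !Character.isWhitespace(cas.text.charAt(i))) i++;
            if (i > start) cas.tokens.add(new int[] {start, i});
        }
    }
}

// Primitive engine 2: depends on the tokens produced upstream.
final class CapitalizedTokenAnnotator implements Annotator {
    public void process(Cas cas) {
        for (int[] t : cas.tokens) {
            if (Character.isUpperCase(cas.text.charAt(t[0]))) {
                cas.capitalized.add(cas.text.substring(t[0], t[1]));
            }
        }
    }
}

// Aggregate engine: also an Annotator, so aggregates can be nested inside aggregates.
final class AggregateAnnotator implements Annotator {
    private final List<Annotator> delegates;

    AggregateAnnotator(List<Annotator> delegates) {
        this.delegates = List.copyOf(delegates);
    }

    public void process(Cas cas) {
        for (Annotator a : delegates) a.process(cas); // fixed flow order
    }
}
```

Making the aggregate implement the same interface as its parts is what lets callers treat a composed pipeline exactly like a single engine.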

  10. UIMA Tools and Utilities
     • CAS Save/Restore
     • Configuration Editors
     • Annotation Viewer
     • CAS Visual Debugger
     • Document Analyzer: graphical tool for applying analysis engines and viewing results
     • Juru-based Semantic Search Engine

  11. Exploiting Analysis Results
     • Semantic Search
        • Contribute analysis results (CASs) to the “Juru” XML search engine indexer
        • Typed-entity recognizers (e.g., named-entity)
        • XML Fragments query language
     • Database Insert/Update Stream
        • Contribute analysis results to a database
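One way to hand stand-off analysis results to an XML-aware indexer is to serialize each typed annotation as an inline element around the text it covers, so a span recognized as an Organization becomes `<Organization>IBM Research</Organization>` and an XML Fragments query can then target that element. `InlineXml` is a hypothetical sketch of that conversion, assuming annotations never overlap and type names are valid XML element names; it is not the actual UIMA-to-Juru integration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical converter from stand-off annotations to inline XML for indexing.
final class InlineXml {
    record Annotation(String type, int begin, int end) {}

    // Escape the characters that would otherwise break XML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Assumes non-overlapping annotations; nested or overlapping spans need a real serializer.
    static String toXml(String text, List<Annotation> annotations) {
        List<Annotation> sorted = new ArrayList<>(annotations);
        sorted.sort(Comparator.comparingInt(Annotation::begin));
        StringBuilder out = new StringBuilder("<doc>");
        int pos = 0;
        for (Annotation a : sorted) {
            if (a.begin() < pos) throw new IllegalArgumentException("overlapping annotation: " + a);
            out.append(escape(text.substring(pos, a.begin())));   // untagged text before the span
            out.append('<').append(a.type()).append('>')
               .append(escape(text.substring(a.begin(), a.end())))
               .append("</").append(a.type()).append('>');
            pos = a.end();
        }
        out.append(escape(text.substring(pos))).append("</doc>"); // trailing untagged text
        return out.toString();
    }
}
```

For example, annotating characters 0 to 12 of "IBM Research developed UIMA." as Organization yields `<doc><Organization>IBM Research</Organization> developed UIMA.</doc>`.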

  12. Juru Search Engine Extensions for XML
     • Extended Vector Space Model
        • Compound index items: (context, word)
        • Cosine similarity measure extended with context
        • Relaxed match on context (context resemblance measure)
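The extended vector space model can be sketched as follows: each index item pairs a term with its context (here, the root-to-element tag path), terms must match exactly, and each matching pair of query and document items contributes its weight product scaled by a context-resemblance factor cr(c_q, c_d), with the sum normalized by the two vector norms as in ordinary cosine similarity. The resemblance function below, (1 + |c_q|) / (1 + |c_d|) when the query path is an ordered subsequence of the document path and 0 otherwise, is an assumption chosen to illustrate relaxed matching; see the Carmel et al. paper in the references for the measure Juru actually uses. Weights are supplied by the caller rather than computed with tf-idf.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a context-aware vector space score (not Juru's code).
final class ExtendedVsm {
    // An index item pairs a term with its XML context (root-to-element tag path).
    record Item(List<String> context, String term) {}

    // True if every tag of q appears in d in the same order.
    static boolean isSubsequence(List<String> q, List<String> d) {
        int j = 0;
        for (String tag : d) {
            if (j < q.size() && q.get(j).equals(tag)) j++;
        }
        return j == q.size();
    }

    // Assumed resemblance: 1.0 for identical paths, smaller as the document path
    // adds tags the query did not mention, 0 when the query path cannot be matched.
    static double resemblance(List<String> q, List<String> d) {
        if (!isSubsequence(q, d)) return 0.0;
        return (1.0 + q.size()) / (1.0 + d.size());
    }

    static double norm(Map<Item, Double> v) {
        double s = 0;
        for (double w : v.values()) s += w * w;
        return Math.sqrt(s);
    }

    // Cosine-style score with relaxed context matching.
    static double score(Map<Item, Double> query, Map<Item, Double> doc) {
        double dot = 0;
        for (Map.Entry<Item, Double> q : query.entrySet()) {
            for (Map.Entry<Item, Double> d : doc.entrySet()) {
                if (q.getKey().term().equals(d.getKey().term())) {
                    dot += q.getValue() * d.getValue()
                         * resemblance(q.getKey().context(), d.getKey().context());
                }
            }
        }
        double denom = norm(query) * norm(doc);
        return denom == 0 ? 0 : dot / denom;
    }
}
```

With this choice, a query for "lewis" in context (doc, Person) scores 1.0 against an identical item, 0.75 against (doc, Speaker, Person), and 0 against (doc, Organization), which is the graded behavior "relaxed match on context" describes.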

  13. Demonstrations
     • Running an Analysis Engine
     • Building a Collection Processing Engine
     • Running Semantic Search
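The collection-processing demonstration follows a source, analysis, sink flow: a CAS source yields documents one at a time, an analysis engine processes each one, and a CAS sink such as an indexer or database writer receives the results. `MiniCpe` is a hypothetical generic sketch of that loop, not the UIMA Collection Processing Engine API, which also handles configuration, error handling, and deployment concerns.

```java
import java.util.Iterator;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical sketch of a collection processing loop (not the UIMA CPE API).
final class MiniCpe<D, R> {
    private final Iterator<D> reader;    // CAS source: yields one document at a time
    private final Function<D, R> engine; // analysis engine applied to each document
    private final Consumer<R> consumer;  // CAS sink: e.g. an indexer or database writer

    MiniCpe(Iterator<D> reader, Function<D, R> engine, Consumer<R> consumer) {
        this.reader = reader;
        this.engine = engine;
        this.consumer = consumer;
    }

    // Pulls every document through analysis into the sink; returns how many were processed.
    int run() {
        int processed = 0;
        while (reader.hasNext()) {
            consumer.accept(engine.apply(reader.next()));
            processed++;
        }
        return processed;
    }
}
```

Keeping source, engine, and sink as separate pluggable parts means the same analysis engine can feed a search index in one run and a database in another.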

  14. References
     • UIMA SDK User's Guide and Reference
       http://dl.alphaworks.ibm.com/technologies/uima/UIMA_SDK_Users_Guide_Reference.pdf
     • An Extension of the Vector Space Model for Querying XML Documents via XML Fragments
       http://xml.coverpages.org/CarmelFragments.pdf
