Enhance Semantic Metadata Creation with Ontea: A Pattern-Based Annotation Platform
Ontea is a powerful pattern-based annotation platform designed to create semantic metadata from unstructured texts or documents. By leveraging patterns within text, Ontea extracts key-value pairs that can be transformed into ontology individuals. With its customizable and scalable architecture based on key-value pair detection, Ontea operates independently of document structure, allowing for large-scale semantic annotation. This platform features compatibility with multiple languages and integrations with various tools, making it a versatile solution for information retrieval and ontology population.
Enhance Semantic Metadata Creation with Ontea: A Pattern-Based Annotation Platform
E N D
Presentation Transcript
Ontea: Pattern based Annotation Platform Michal Laclavík
Ontea Method • Motivation • To create semantic meta data from texts or documents • Approach • Even unstructured text contains patterns • Patterns can be used to extract various objects from text • Results are: key - value pairs • Such pairs can be transformed to ontology individuals • Class – individual • Individual – property http://ontea.sourceforge.net
Text Bratislava is the capital of Slovakia. Slovakia is in Europe. Pattern: “(in|by) + (the)? *([A-Z][a-z]+)” for Location Ontea discovers key – value pair: Location – Europe By transformation to ontology knowledge base - it finds Europe as continent using inference (sub-class of Location) Continent – Europe More Examples are in the table: Result Examples http://ontea.sourceforge.net
Features • Identification of concept instances from the ontology • Automatic population of ontologies with instances • Identifying relevance, when creating instances using information retrieval techniques • Large scale semantic annotation of documents or texts using Google’s MapReduce architecture. http://ontea.sourceforge.net
Advantages • Simple, customizable method • Not tied to document structure • Architecture build on detection of key-value pairs and its various transformation. For example: • Text: “Slovensko je v Európe“=> • Extraction: Location – Európe => • Transformation, Lemmatization: Location – Európa => • Transformation, Ontology: Continent – Europe • Scalable method. Ported to Grid and Hadoop. • Applicable on texts in any language • Success rate 60%-90% depending on used patterns, transformers and application http://ontea.sourceforge.net
Integration with other tools URL DocConverter Plain Text Nalit Language Identification Ontea Pattern Matching Morphonary Transformation: Lemmatization Transformation: Individual Search and Creation Transformation: Relevance Identification Lucene Ontology Repository http://ontea.sourceforge.net
Future research & development http://ontea.sourceforge.net/