70 likes | 207 Vues
Ontea: Pattern based Annotation Platform. Michal Laclavík. Ontea Method. Motivation To create semantic meta data from texts or documents Approach Even unstructured text contains patterns Patterns can be used to extract various objects from text Results are: key - value pairs
E N D
Ontea: Pattern based Annotation Platform Michal Laclavík
Ontea Method • Motivation • To create semantic meta data from texts or documents • Approach • Even unstructured text contains patterns • Patterns can be used to extract various objects from text • Results are: key - value pairs • Such pairs can be transformed to ontology individuals • Class – individual • Individual – property http://ontea.sourceforge.net
Text Bratislava is the capital of Slovakia. Slovakia is in Europe. Pattern: “(in|by) + (the)? *([A-Z][a-z]+)” for Location Ontea discovers key – value pair: Location – Europe By transformation to ontology knowledge base - it finds Europe as continent using inference (sub-class of Location) Continent – Europe More Examples are in the table: Result Examples http://ontea.sourceforge.net
Features • Identification of concept instances from the ontology • Automatic population of ontologies with instances • Identifying relevance, when creating instances using information retrieval techniques • Large scale semantic annotation of documents or texts using Google’s MapReduce architecture. http://ontea.sourceforge.net
Advantages • Simple, customizable method • Not tied to document structure • Architecture build on detection of key-value pairs and its various transformation. For example: • Text: “Slovensko je v Európe“=> • Extraction: Location – Európe => • Transformation, Lemmatization: Location – Európa => • Transformation, Ontology: Continent – Europe • Scalable method. Ported to Grid and Hadoop. • Applicable on texts in any language • Success rate 60%-90% depending on used patterns, transformers and application http://ontea.sourceforge.net
Integration with other tools URL DocConverter Plain Text Nalit Language Identification Ontea Pattern Matching Morphonary Transformation: Lemmatization Transformation: Individual Search and Creation Transformation: Relevance Identification Lucene Ontology Repository http://ontea.sourceforge.net
Future research & development http://ontea.sourceforge.net/