Artificial Intelligence Research Centre Program Systems Institute Russian Academy of Science


INEX: Tools for Information Extraction. Artificial Intelligence Research Centre, Program Systems Institute, Russian Academy of Science, 152020 Pereslavl-Zalessky, Russia. Tel.: +7 48535 98065. E-mail: inex@epk.botik.ru

Information extraction Objective: • extract meaningful information of a pre-specified type from (typically large amounts of) texts for further analytical purposes Output: • data structures of a pre-specified format (filled scenario templates)

Examples • Sports report: <winner>, <loser>, <score>, <location>, <date>… • Database of rental accommodation offers: <location>, <rental price>, <number of bedrooms>, <phone number>…

Possible IE application scenarios: • inference of new information (knowledge acquisition) • query formulation and answering in human-computer systems • automatic generation of abstracts and summaries • visualization of document content, etc.

The ‘Newsmaking’ task • <newsmaker> • <type of newsmaker> (person or organization) • <message> • <type of message> (original, cited, a reference to another newsmaker)
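To make the template concrete, here is a minimal Python sketch of the four ‘Newsmaking’ slots as a data structure. The class and field names (`NewsmakingTemplate`, `NewsmakerType`, `MessageType`, `is_complete`) are illustrative assumptions, not part of INEX, and the filled example uses invented data.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NewsmakerType(Enum):
    PERSON = "person"
    ORGANIZATION = "organization"


class MessageType(Enum):
    ORIGINAL = "original"
    CITED = "cited"
    REFERENCE = "reference to another newsmaker"


@dataclass
class NewsmakingTemplate:
    """One scenario template; a slot stays None until something fills it."""
    newsmaker: Optional[str] = None
    newsmaker_type: Optional[NewsmakerType] = None
    message: Optional[str] = None
    message_type: Optional[MessageType] = None

    def is_complete(self) -> bool:
        # The template is "filled" once no slot is left empty.
        slots = (self.newsmaker, self.newsmaker_type,
                 self.message, self.message_type)
        return None not in slots


# Invented example, for illustration only.
filled = NewsmakingTemplate(
    newsmaker="The Ministry of Finance",
    newsmaker_type=NewsmakerType.ORGANIZATION,
    message="the budget will be revised",
    message_type=MessageType.ORIGINAL,
)
```

Keeping empty slots as `None` makes it easy to tell a partially filled template from a complete one.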

IE system architecture

Tokenisation & sentence segmentation • Tokenisation: identification of words, punctuation marks, delimiters, special characters • Sentence segmentation: recognizing sentence boundaries
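Both steps can be approximated with regular expressions. The sketch below is a simplification under stated assumptions: a token is either a run of word characters or a single punctuation character, and a sentence ends at `.`, `!` or `?` followed by whitespace and an uppercase Latin letter. The function names are hypothetical, and abbreviations such as "Dr. Smith" would be split incorrectly.

```python
import re

# A token is a run of word characters or one non-space, non-word character.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

# A boundary is whitespace preceded by ., ! or ? and followed by a capital.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def tokenize(text):
    """Return words and punctuation marks as separate tokens."""
    return TOKEN_RE.findall(text)


def split_sentences(text):
    """Split text at the (naively detected) sentence boundaries."""
    return [s for s in SENTENCE_END_RE.split(text.strip()) if s]


sentences = split_sentences("The match ended 2:1. Fans left early.")
tokens = tokenize("Score: 2:1.")
```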

Morphological analysis • maps every word-form of the input text to (a) canonical form(s) • recognizes the word's morphological properties Results are typically ambiguous.
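The ambiguity is easy to see in a lexicon lookup. The sketch below assumes a tiny hand-written English lexicon (`TOY_LEXICON`, invented for illustration); a real analyser would rely on a full morphological dictionary or rules.

```python
from typing import List, NamedTuple


class Analysis(NamedTuple):
    lemma: str      # canonical form
    pos: str        # part of speech
    features: str   # remaining morphological properties


# Invented toy lexicon: each word-form maps to all of its readings.
TOY_LEXICON = {
    "saw": [Analysis("see", "VERB", "past"),
            Analysis("saw", "NOUN", "singular")],
    "reports": [Analysis("report", "NOUN", "plural"),
                Analysis("report", "VERB", "present, 3rd person singular")],
    "minister": [Analysis("minister", "NOUN", "singular")],
}


def analyse(word_form: str) -> List[Analysis]:
    """Return every reading of a word-form; unknown words get a placeholder."""
    key = word_form.lower()
    return TOY_LEXICON.get(key, [Analysis(key, "UNKNOWN", "")])


readings = analyse("saw")  # two readings: the verb "see" and the noun "saw"
```

Returning a list rather than a single reading leaves the choice to a later disambiguation step.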

Filtering • reduces the text to be subjected to further processing to potentially relevant portions
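One simple way to filter is to keep only sentences that contain a trigger word for the target scenario. The sketch below assumes a small, invented trigger list for the ‘Newsmaking’ task; how INEX actually selects relevant portions is not described here.

```python
import re

# Hypothetical trigger words that signal a reported statement.
TRIGGERS = {"said", "stated", "announced", "declared"}


def is_relevant(sentence):
    """A sentence is kept if it contains at least one trigger word."""
    words = set(re.findall(r"\w+", sentence.lower()))
    return not TRIGGERS.isdisjoint(words)


def filter_sentences(sentences):
    """Drop sentences that cannot contribute to the template."""
    return [s for s in sentences if is_relevant(s)]


kept = filter_sentences([
    "The minister said that taxes will fall.",
    "It rained all day.",
])
```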

Disambiguation can be performed as: • a side effect of other processes (e.g., microsyntactic analysis) • a stand-alone stage

Microsyntactic analysis • identifies noun phrases (NP) • identifies some regularly formed constructions (numbers, dates, personal proper names)
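Regularly formed constructions such as dates and numbers lend themselves to pattern matching. The sketch below assumes English dates of the form "12 March 2004" and plain decimal numbers; the labels and the function name are illustrative. More specific patterns are tried first, so the digits inside a date are not reported again as numbers.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")

# Ordered from most to least specific.
CONSTRUCTIONS = [
    ("DATE", re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")),
    ("NUMBER", re.compile(r"\b\d+(?:\.\d+)?\b")),
]


def find_constructions(text):
    """Return (label, matched text, start offset) in text order."""
    found = []
    taken = []  # character spans already claimed by an earlier pattern
    for label, pattern in CONSTRUCTIONS:
        for m in pattern.finditer(text):
            overlaps = any(m.start() < end and start < m.end()
                           for start, end in taken)
            if overlaps:
                continue
            taken.append(m.span())
            found.append((label, m.group(), m.start()))
    return sorted(found, key=lambda item: item[2])


result = find_constructions("On 12 March 2004 the firm hired 250 people.")
```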

Macrosyntactic analysis • identifies clause boundaries • constructs clause hierarchy within a sentence

Named entity recognizer • identifies proper names • assigns semantic features to certain items

Information extraction rules • a domain knowledge representation formalism (scenario templates) • a set of patterns to identify template elements in a text (covering the many possible ways to talk about the target event elements)

An IE pattern includes: • a set of rules that define how to find this pattern in a text • a set of constraints that textual elements must satisfy to fit into a particular slot of the target template
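The two parts of a pattern, a retrieval rule and slot constraints, can be sketched as follows. This is an assumption-laden illustration, not the INEX rule formalism: the rule is a single regular expression with named groups, and the only constraint rejects a bare pronoun in the newsmaker slot.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class IEPattern:
    rule: re.Pattern                               # how to find the pattern
    constraints: Dict[str, Callable[[str], bool]]  # per-slot checks

    def apply(self, sentence: str) -> Optional[Dict[str, str]]:
        """Return the slot fillers, or None if the rule or a constraint fails."""
        match = self.rule.search(sentence)
        if match is None:
            return None
        slots = match.groupdict()
        for slot, check in self.constraints.items():
            if not check(slots[slot]):
                return None
        return slots


PRONOUNS = {"he", "she", "it", "they"}

# Hypothetical pattern: "<newsmaker> said/stated/announced that <message>."
SAID_THAT = IEPattern(
    rule=re.compile(r"^(?P<newsmaker>.+?) (?:said|stated|announced) "
                    r"that (?P<message>.+?)\.?$"),
    constraints={"newsmaker": lambda text: text.lower() not in PRONOUNS},
)

hit = SAID_THAT.apply("The minister said that taxes will fall.")
miss = SAID_THAT.apply("He said that taxes will fall.")
```

Covering the many ways of reporting a statement would require a set of such patterns rather than one.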

Coreference Resolver • recognizes different occurrences of the same entity in a text
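As a rough illustration, mentions of people can be grouped by a shared last token, so that "Ivan Petrov" and "Petrov" end up in one chain. This surname heuristic is a deliberately naive assumption (it ignores pronouns, titles placed after the name, and namesakes); the names are invented and each mention is assumed to be a non-empty string.

```python
def resolve_coreference(mentions):
    """Group mentions into chains that share the same last token."""
    chains = []
    for mention in mentions:
        key = mention.split()[-1].lower()
        for chain in chains:
            # Compare against the first mention of each existing chain.
            if chain[0].split()[-1].lower() == key:
                chain.append(mention)
                break
        else:
            # No chain matched: this mention starts a new entity.
            chains.append([mention])
    return chains


chains = resolve_coreference(
    ["Ivan Petrov", "Anna Sidorova", "Petrov", "Mr. Petrov"]
)
```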

Merging partial results • merging partially filled templates to produce a final, maximally filled template
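Merging can be sketched as filling each empty slot from the first partial template that has a value for it. The sketch assumes that all partial templates describe the same event and does not try to resolve conflicting values; the names are illustrative.

```python
from typing import Dict, List, Optional

Template = Dict[str, Optional[str]]


def merge_templates(partials: List[Template]) -> Template:
    """Combine partial templates; the first non-empty value per slot wins."""
    merged: Template = {}
    for partial in partials:
        for slot, value in partial.items():
            if merged.get(slot) is None:
                merged[slot] = value
    return merged


merged = merge_templates([
    {"newsmaker": "The minister", "message": None},
    {"newsmaker": None, "message": "taxes will fall"},
])
```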
