
Automatic Report Generation from Ontologies: the MIAKT Approach

Kalina Bontcheva, Yorick Wilks. Department of Computer Science, University of Sheffield.


Presentation Transcript


  1. Automatic Report Generation from Ontologies: the MIAKT Approach Kalina Bontcheva, Yorick Wilks Department of Computer Science University of Sheffield

  2. Rationale • NLG takes as input structured data in a knowledge base or ontology and produces natural language text • Applied to provide automatic documentation of ontologies or generate textual reports from formal knowledge • Keeps texts constantly up-to-date so they reflect changes in the ontology

  3. The MIAKT project • Medical Imaging and Advanced Knowledge Technologies • Breast cancer • Triple assessment process • Oncologist – clinical assessment • Histopathologist – cytology • One or more radiologists – X-ray mammograms, MRI scans • Surgeon • Sometimes radiographer • Types of images • Mammograms, MRI scans, ultrasound…

  4. The MIAKT Demonstrator

  5. Semantic Image Annotation

  6. The Domain Ontology

  7. Generation Service Input

  8. Generation Service Output

  9. Generation Architecture

  10. Removing Repeating Triples • Based on the ontology – inverse properties • <daml:ObjectProperty rdf:about= "file:/...#involved_in_ta"> <daml:inverseOf rdf:resource= "file:/...#involve_patient"/> … • involved_in_ta(01401_patient, ta-soton-1069) and involve_patient(ta-soton-1069, 01401_patient) state the same fact, so only one is kept • More complex reasoning will be required to detect facts entailed by facts that have already been stated
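
The inverse-property filter on slide 10 can be sketched as follows. This is a minimal illustration, not the MIAKT implementation: the triple layout `(property, subject, object)`, the `INVERSE_OF` table, and the function name `remove_inverse_duplicates` are all assumptions made for the example. Only the property names and instance identifiers come from the slide.

```python
# Hypothetical table built from daml:inverseOf declarations in the ontology.
INVERSE_OF = {"involved_in_ta": "involve_patient"}


def remove_inverse_duplicates(triples, inverse_of):
    """Drop a triple if its inverse (or an exact copy) was already kept."""
    # inverseOf is symmetric, so make the lookup work in both directions.
    inverses = dict(inverse_of)
    inverses.update({v: k for k, v in inverse_of.items()})

    already_said = set()
    kept = []
    for prop, subj, obj in triples:
        if (prop, subj, obj) in already_said:
            continue
        kept.append((prop, subj, obj))
        already_said.add((prop, subj, obj))
        inverse = inverses.get(prop)
        if inverse is not None:
            # The inverse statement carries no new information.
            already_said.add((inverse, obj, subj))
    return kept


triples = [
    ("involved_in_ta", "01401_patient", "ta-soton-1069"),
    ("involve_patient", "ta-soton-1069", "01401_patient"),
]
filtered = remove_inverse_duplicates(triples, INVERSE_OF)
```

Only the first of the two triples survives. As the slide notes, a lookup of this kind does not catch facts that are merely entailed by earlier ones.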

  11. Discourse Planning • Schemas – capture regular patterns in the domain; can be applied recursively • Describe-Patient -> Patient-Attributes, Describe-Procedures • Patient-Attributes -> [attribute(Patient, Attribute)], Patient-Attributes *
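
A rough sketch of how the two schemas on slide 11 might be expanded recursively. Everything beyond the schema names is an assumption: the fact layout `(kind, subject, value)`, the treatment of Describe-Procedures (the slide does not expand it), and the sample facts, which are invented for illustration apart from the identifiers `01401_patient` and `ta-soton-1069`.

```python
def patient_attributes(patient, facts):
    """Patient-Attributes -> [attribute(Patient, Attribute)], Patient-Attributes *"""
    for i, (kind, subject, value) in enumerate(facts):
        if kind == "attribute" and subject == patient:
            remaining = facts[:i] + facts[i + 1:]
            # Emit one attribute fact, then apply the same schema again.
            return [(kind, subject, value)] + patient_attributes(patient, remaining)
    return []  # no attribute facts left: the recursion stops


def describe_procedures(patient, facts):
    # Assumption: this schema simply selects the patient's procedure facts.
    return [f for f in facts if f[0] == "procedure" and f[1] == patient]


def describe_patient(patient, facts):
    """Describe-Patient -> Patient-Attributes, Describe-Procedures"""
    return patient_attributes(patient, facts) + describe_procedures(patient, facts)


# Invented input; "has-gender" is a hypothetical property.
facts = [
    ("procedure", "01401_patient", "ta-soton-1069"),
    ("attribute", "01401_patient", "has-age"),
    ("attribute", "01401_patient", "has-gender"),
]
plan = describe_patient("01401_patient", facts)
```

The resulting plan lists the attribute facts first and the procedure fact last, regardless of the order in which the facts arrived.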

  12. The Property Hierarchy • Special linguistically-motivated properties were introduced to make the NLG modules more generic: • active-action (e.g. involve_patient) • passive-action (e.g. involved_in_ta) • attribute (e.g. has-age, has-size) • part-whole (e.g. consists-of) • All properties from the ontology were made sub-properties of one of these 4 • A more lightweight approach than having a complete linguistic ontology like GUM (Generalised Upper Model)

  13. Ontology-Based Aggregation • Joining attribute and part-whole properties with the same first argument to have more coherent sentences • ATTR(Abnormality: 01401, Mass: 01401_mass) ATTR(Abnormality: 01401, Margin: i_m_microlob) ATTR(Abnormality: 01401, Shape: i_shape_round) ATTR(Abnormality: 01401, Diagnose: i_pr_malig) • Without aggregation: The abnormality has a mass. The abnormality has a microlobulated margin. The abnormality has a round shape. The abnormality has a probably malignant assessment. • With aggregation: The abnormality has a mass, a microlobulated margin, a round shape, and a probably …
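
The grouping step on slide 13 could look roughly like this. It is a sketch under assumptions: lexicalisation is taken as already done, so each fact arrives as a `(subject, noun phrase)` pair with phrases copied from the "without aggregation" sentences, and the helper names `join_phrases` and `aggregate` are invented here.

```python
def join_phrases(phrases):
    """Join noun phrases into a coordinated list with a final 'and'."""
    if len(phrases) == 1:
        return phrases[0]
    if len(phrases) == 2:
        return f"{phrases[0]} and {phrases[1]}"
    return ", ".join(phrases[:-1]) + ", and " + phrases[-1]


def aggregate(facts):
    """Merge facts that share the same first argument into one sentence."""
    grouped = {}  # dicts keep insertion order, so fact order is preserved
    for subject, phrase in facts:
        grouped.setdefault(subject, []).append(phrase)
    return [
        f"The {subject} has {join_phrases(phrases)}."
        for subject, phrases in grouped.items()
    ]


facts = [
    ("abnormality", "a mass"),
    ("abnormality", "a microlobulated margin"),
    ("abnormality", "a round shape"),
    ("abnormality", "a probably malignant assessment"),
]
sentences = aggregate(facts)
```

Four single-fact sentences collapse into one sentence per subject. Facts about a different subject would start a new sentence rather than being merged.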

  14. Surface Realisation • The input is an RDF statement and the concept which is going to be the subject of the sentence: ATTR(Abnormality: 01401, Mass: 01401_mass) + Abnormality: 01401 • ATTR and PART_OF relations are already handled by an existing realiser (HYLITE), which treats the RDF as a graph and finds a path through it, starting from the focused concept • Active and passive action properties are mapped to semantic roles like OBJ, PTNT, AGNT • AGNT(Mammography: 01402, PRODUCE_RESULT) OBJ(PRODUCE_RESULT, Med_Image: 01402_left_cc)
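
The role mapping in the last bullet of slide 14 can be illustrated for the active case. This is only a guess at the shape of the mapping: the function name, the tuple layout, and the property name `produce_result` are assumptions (the slide shows only the resulting PRODUCE_RESULT concept). The passive case is left out because the slide does not show its output.

```python
def active_action_to_roles(prop, subject, obj):
    """Map an active-action statement prop(subject, obj) to role relations."""
    action = prop.upper()  # e.g. "produce_result" -> "PRODUCE_RESULT"
    return [
        ("AGNT", subject, action),  # the subject is the agent of the action
        ("OBJ", action, obj),       # the object is what the action yields
    ]


roles = active_action_to_roles(
    "produce_result",            # assumed property name
    "Mammography: 01402",
    "Med_Image: 01402_left_cc",
)
```

The two tuples correspond to AGNT(Mammography: 01402, PRODUCE_RESULT) and OBJ(PRODUCE_RESULT, Med_Image: 01402_left_cc) from the slide.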

  15. Domain Portability • Availability of lexical resources for the domain, e.g. UMLS and SPECIALIST or a lexicalised ontology • The classification of the properties into the 4 linguistic ones – possible to do semi-automatically if there are good naming conventions • The 4 linguistic properties may have to be extended to include others if the domain requires it • The main effort will be in the text structuring patterns, which require significant understanding of the system in order to modify them • Machine learning to induce text patterns from labelled examples
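
To make the second bullet of slide 15 concrete, here is one way a naming-convention heuristic might suggest a linguistic super-property for each ontology property. The rules are invented for illustration and are not taken from MIAKT; they happen to agree with the examples on slide 12, and a person would still need to review the suggestions.

```python
import re


def classify_property(name):
    """Suggest one of the four linguistic properties from a property name."""
    tokens = re.split(r"[-_]", name.lower())
    if tokens[0] == "has":
        return "attribute"           # has-age, has-size
    if "consists" in tokens or "part" in tokens:
        return "part-whole"          # consists-of
    if tokens[0].endswith("ed"):
        return "passive-action"      # involved_in_ta
    return "active-action"           # fallback, e.g. involve_patient


examples = ["involve_patient", "involved_in_ta", "has-age", "has-size", "consists-of"]
suggested = {name: classify_property(name) for name in examples}
```

The fallback to active-action is the weakest rule, so properties that reach it are the ones most worth checking by hand.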

  16. Conclusion • Presented an approach for automatic generation of texts from ontologies • MIAKT exploits information from the ontology in order to filter out repetitive information and group together similar facts • Main contribution is in showing how NLG tools can be designed to be easily customisable by non-specialists (through GUI tools) • New application: sekt.semanticweb.org

  17. Further Info • http://www.aktors.org/miakt/ • http://www.dcs.shef.ac.uk/~kalina/papers.html • http://sekt.semanticweb.org

  18. The MIAKT lexicon • Currently contains 320+ terms lexicalising: • 76 concepts • 153 instances in the MIAKT ontology • Created manually from: • BI-RADS and NHS documents • Online papers and Medline abstracts to verify and enrich the term entries with synonyms
