NLP Interchange Format

NLP Interchange Format. José M. García. Outline: What is NIF? Design requirements. URI schemes. NIF ontologies. Use cases. Relationship with ELRA. Roadmap for NIF 2.0. Conclusions. What is NIF? Natural Language Processing Interchange Format

Presentation Transcript


  1. NLP Interchange Format José M. García

  2. Outline • What is NIF? • Design requirements • URI schemes • NIF ontologies • Use cases • Relationship with ELRA • Roadmap for NIF 2.0 • Conclusions

  3. What is NIF? • Natural Language Processing Interchange Format • NIF is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. • Building blocks • URI scheme for identifying elements in texts • Ontology for describing common NLP terms • Created and maintained by the AKSW group at the University of Leipzig as part of the LOD2 EU project. • Community project: http://persistence.uni-leipzig.org/nlp2rdf/

  4. NIF design requirements

  5. URI schemes • Text needs to be referenceable by URIs • With URI references, text can be used as a resource in RDF statements • NIF distinguishes: • Documents • Text of the document • Substrings of the text • A URI scheme is an algorithm to create IDs for text and substrings • URI elements • Document URI • Separator • Character indices
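
The three URI elements listed on this slide can be combined mechanically. A minimal sketch, assuming a "#char=" separator and comma-separated begin and end indices; the function name `make_string_uri` and the example.org document URI are hypothetical and not part of the presentation:

```python
def make_string_uri(document_uri: str, begin: int, end: int,
                    separator: str = "#char=") -> str:
    """Build an identifier for the substring text[begin:end] of a document.

    The identifier is: document URI + separator + character indices.
    """
    if begin < 0 or end < begin:
        raise ValueError("indices must satisfy 0 <= begin <= end")
    return f"{document_uri}{separator}{begin},{end}"


text = "NIF makes text referenceable."
doc = "http://example.org/doc.txt"  # hypothetical document URI

# One URI for the whole text of the document, one for a substring.
whole_text_uri = make_string_uri(doc, 0, len(text))
word_uri = make_string_uri(doc, 0, 3)  # identifies the substring "NIF"
```

Because the identifier is computed from the document URI and the offsets alone, two tools that agree on the text will produce the same URI for the same substring without coordinating.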

  8. RFC 5147 • The canonical URI scheme for NIF is based on RFC 5147 • It standardizes fragment identifiers for the text/plain media type • Examples: • http://www.w3.org/DesignIssues/LinkedData.html • http://www.w3.org/DesignIssues/LinkedData.html#char=0,26610 • http://www.w3.org/DesignIssues/LinkedData.html#char=1206,1218
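
A URI of this form can be split back into its document part and its character range. The sketch below is an illustration only: it handles just the `char=begin,end` form seen in the examples and ignores the other variants RFC 5147 defines (single positions, open-ended ranges, `line=`, and the integrity checks). The function names are hypothetical.

```python
import re
from urllib.parse import urldefrag

# Only the "char=<begin>,<end>" form is recognised in this sketch.
_CHAR_RANGE = re.compile(r"^char=(\d+),(\d+)$")


def split_char_uri(uri: str) -> tuple[str, int, int]:
    """Return (document URI, begin, end) for a '#char=begin,end' URI."""
    document_uri, fragment = urldefrag(uri)
    match = _CHAR_RANGE.match(fragment)
    if match is None:
        raise ValueError(f"not a char=begin,end fragment: {fragment!r}")
    begin, end = int(match.group(1)), int(match.group(2))
    if end < begin:
        raise ValueError("end index precedes begin index")
    return document_uri, begin, end


def resolve(uri: str, text: str) -> str:
    """Return the substring of `text` that the URI points at."""
    _, begin, end = split_char_uri(uri)
    if end > len(text):
        raise ValueError("range exceeds the length of the text")
    # Positions count the gaps between characters, starting at 0,
    # so the range maps directly onto a Python slice.
    return text[begin:end]


parts = split_char_uri(
    "http://www.w3.org/DesignIssues/LinkedData.html#char=1206,1218")
word = resolve("http://example.org/doc.txt#char=4,9",
               "NIF makes text referenceable.")
```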

  9. NIF Core Ontology • Classes and properties to describe relation between • Documents • Text • Substrings • Corresponding URI schemes
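
The relation between a document's text and one of its substrings can be written down as plain triples. This is a sketch under assumptions: the namespace and the names `Context`, `RFC5147String`, `isString`, `referenceContext`, `beginIndex`, `endIndex` and `anchorOf` are recalled from the NIF 2.0 Core ontology and do not appear on the slide, so they should be checked against the published ontology. Triples are modelled as Python tuples rather than with an RDF library.

```python
# Assumed namespace of the NIF Core ontology (not stated on the slide).
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe(document_uri: str, text: str, begin: int, end: int) -> list[tuple]:
    """Describe the whole text and one substring as (subject, predicate, object)."""
    context = f"{document_uri}#char=0,{len(text)}"
    substring = f"{document_uri}#char={begin},{end}"
    return [
        # The text of the document as a whole.
        (context, RDF_TYPE, NIF + "Context"),
        (context, NIF + "isString", text),
        # The substring, tied back to the text it was taken from.
        (substring, RDF_TYPE, NIF + "RFC5147String"),
        (substring, NIF + "referenceContext", context),
        (substring, NIF + "beginIndex", begin),
        (substring, NIF + "endIndex", end),
        (substring, NIF + "anchorOf", text[begin:end]),
    ]


doc = "http://example.org/doc.txt"  # hypothetical document URI
text = "NIF makes text referenceable."
triples = describe(doc, text, 4, 9)
```

Objects are kept as bare Python values here; a real serialisation would distinguish URIs from typed literals.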

  10. NIF Core Ontology • Additional classes and properties (unstable/testing) • More URI schemes • Text structure (words, sentences, paragraphs…) • Part of Speech (POS) • Annotations with Stanbol • Confidence

  11. Workflows, Modularity and Extensibility of NIF • Workflows for NLP integration • Normalization • Tokenization • Merge RDF annotations
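
The three workflow steps can be sketched end to end. Everything here is an assumption made for illustration: normalization is taken to mean Unicode NFC, tokenization is a whitespace split, the two "tools" are stand-ins that attach hard-coded labels, and the `ex:` predicates are made up. The point is only that annotations keyed by the same string URIs merge by set union.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Normalize so that every tool counts characters the same way (NFC assumed)."""
    return unicodedata.normalize("NFC", text)


def tokenize(text: str) -> list[tuple[int, int]]:
    """Return (begin, end) offsets of whitespace-separated tokens."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def tool_output(doc: str, text: str, predicate: str, labels: list) -> set:
    """Hypothetical tool: one label per token, keyed by the token's URI."""
    return {
        (f"{doc}#char={begin},{end}", predicate, label)
        for (begin, end), label in zip(tokenize(text), labels)
        if label is not None
    }


def merge(*graphs: set) -> set:
    """Merging RDF annotations is a union of triple sets."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


doc = "http://example.org/doc.txt"  # hypothetical document URI
text = normalize("Leipzig hosts AKSW")
pos = tool_output(doc, text, "ex:pos", ["NNP", "VBZ", "NNP"])
ner = tool_output(doc, text, "ex:entity", ["Place", None, "Organisation"])
merged = merge(pos, ner)

# Group by subject to see both tools' annotations on the same string.
by_subject = {}
for subject, predicate, obj in sorted(merged):
    by_subject.setdefault(subject, {})[predicate] = obj
```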

  12. Workflows, Modularity and Extensibility of NIF • NIF ontology logical modules • Terminological model • Inference model • Validation model • Vocabulary modules • FISE • ITS • OLiA • NERD • …

  13. Workflows, Modularity and Extensibility of NIF • Granularity profiles

  14. ITS Use Case • The Internationalization Tag Set (ITS) 2.0 is a W3C working draft that is becoming a Recommendation. • ITS standardizes HTML and XML attributes which can be used to annotate nodes with processing information for language service providers (i18n, l10n) • The ITS 2.0 RDF ontology was developed using NIF, including a round-trip conversion algorithm from ITS to NIF. • NIF is expected to receive wide adoption by translation & language service providers • The ITS 2.0 RDF ontology offers properties that can serve as best practices for NLP annotations.

  15. OLiA Use Case • The Ontologies of Linguistic Annotation (OLiA) provide stable identifiers for morphosyntactic annotation tag sets, so that NLP tools can use these IDs for better interoperability. • OLiA provides Annotation Models and a Reference Model, comprising more than 110 OWL ontologies for over 34 tag sets in 69 languages • Features • Documentation • Flexible Granularity • Language Independence • NIF provides two properties • nif:oliaIndividual (links a nif:String to an OLiA Annotation Model) • nif:oliaCategory (links to the Reference Model)
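
The two properties can be illustrated with a token tagged by a Penn-style tagger. This is a sketch under stated assumptions: the property names are used as written on the slide and may differ in the published ontology; the OLiA namespaces and the tag-to-category table are illustrative guesses, where a real system would derive the categories from the OLiA models.

```python
# Assumed namespaces (not stated on the slide).
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
PENN = "http://purl.org/olia/penn.owl#"   # an OLiA Annotation Model
OLIA = "http://purl.org/olia/olia.owl#"   # the OLiA Reference Model

# Hypothetical lookup standing in for the Annotation-to-Reference-Model mapping.
TAG_TO_CATEGORIES = {
    "NNP": ["Noun", "ProperNoun"],
    "VBZ": ["Verb"],
}


def olia_triples(string_uri: str, penn_tag: str) -> list[tuple[str, str, str]]:
    """Link a string to its tag-set individual and to coarse-grained categories."""
    # Fine-grained link into the tool-specific Annotation Model.
    triples = [(string_uri, NIF + "oliaIndividual", PENN + penn_tag)]
    # Redundant coarse-grained links into the shared Reference Model.
    for category in TAG_TO_CATEGORIES.get(penn_tag, []):
        triples.append((string_uri, NIF + "oliaCategory", OLIA + category))
    return triples


token_uri = "http://example.org/doc.txt#char=0,7"  # hypothetical string URI
annotations = olia_triples(token_uri, "NNP")
```

A consumer that only understands the Reference Model can read the `oliaCategory` links and ignore which tag set the tool used.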

  16. RDFaCE Use Case • The RDFa Content Editor (RDFaCE) is a rich text editor that supports WYSIWYM authoring, including various views of the semantically enriched textual content. • It combines the results of different NLP APIs for automatic content annotation • Problem: the APIs are heterogeneous in access, URI generation and output data structure • Previous solution: a server-side proxy with hard-coded input and connection for each API • NIF simplified the integration by adding an interoperability layer

  17. What is ELRA? • European Language Resources Association • http://www.elra.info • Effort to make Language Resources (LRs) available for language engineering and to evaluate language engineering technologies. • LR marketplace • Related organizations • ELDA (ELRA’s operational body) • LREC conferences

  18. What is ELRA?

  19. Relationship with NIF • Different objectives • Written LRs (esp. corpora) can be annotated with NIF for further interoperability and integration with NLP tools • ADVANTAGE: Large test data collection to evaluate NLP tools • DISADVANTAGE: Cost of LRs (though there are free ones)

  20. Roadmap for NIF 2.0 • Release of NIF 1.0 • DONE (Nov 2011) • Release of NIF 2.0 Draft • CURRENT effort on solving pending issues • Adoption in the ITS 2.0 W3C (soon-to-be) Recommendation • NIF-Core ontology is becoming stable • RLOG - an RDF Logging Ontology • NIF Validator software available • Release of NIF 2.0 Core • Release of NIF 2.0 Extensions • ITS ontology, PROV ontology, Lemon Ontology, NERD, UIMA, MARL opinion ontology…

  21. Conclusions • NIF allows NLP tools to be integrated using Linked Data • Ongoing effort • Many adopters and supporters • LOD2 EU project • Several W3C working groups • Named Entity Recognition and Disambiguation (NERD) • Ontologies of Linguistic Annotation (OLiA) • … • 27 different implementations and use cases • Some available at http://persistence.uni-leipzig.org/nlp2rdf/

  22. Thanks for your attention Questions?

  23. References • http://persistence.uni-leipzig.org/nlp2rdf/ • Integrating NLP using Linked Data, by Sebastian Hellmann, Jens Lehmann, Sören Auer, and Martin Brümmer, in 12th International Semantic Web Conference, 21-25 October 2013, Sydney, Australia
