
A case study on interoperability for language resources and applications

Marta Villegas, Núria Bel, Santiago Bel, Víctor Rodríguez


Presentation Transcript


  1. A case study on interoperability for language resources and applications • Marta Villegas, Núria Bel, Santiago Bel, Víctor Rodríguez

  2. Index • Use case • Requirements & problems • Corpus collection / distributed search • Corpus integration • Services interoperability • Common interfaces • Shared type system • Semantic description • Conclusions

  3. Use case • Historical linguistics • Diachronic & comparative study involving different Romance languages

  4. Use case: current scenario • Catalan reference corpus, fully annotated data since 1833: XML database • Old Catalan (annotation in progress): MySQL database • Old Catalan (fully annotated): MySQL database • Historical digital library: Pandora/PANDAS system • Technical corpus (fully annotated): Corpus Workbench (CWB)

  5. Use case: desired scenario (distributed search) • Diagram labels: CLARIN, ISO metadata registry, request, response

  6. Use case: Requirements • Data collection (distributed search) • Data integration • Interoperability of services

  7. Corpus collection / Distributed search • Metadata interoperability • Language (català, cat, catalan) → ISO • Date (s XV, 1400-1499, 1400/) → ISO • Genre (!!) • Common search protocol
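The metadata normalization on slide 7 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the alias table, the function names, and the choice to return year ranges as tuples are not from the slides. Century labels follow the slide's own convention, where "s XV" corresponds to 1400-1499.

```python
import re

# Hypothetical alias table: local spellings mapped to ISO 639-3 codes.
LANGUAGE_ALIASES = {
    "català": "cat",
    "catalan": "cat",
    "cat": "cat",
}

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100}


def roman_to_int(numeral):
    """Convert a Roman numeral such as 'XV' to an integer."""
    values = [ROMAN_VALUES[ch] for ch in numeral.upper()]
    total = 0
    for i, value in enumerate(values):
        # A smaller value before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(values) and value < values[i + 1]:
            total -= value
        else:
            total += value
    return total


def normalize_language(value):
    """Return the ISO 639-3 code for a known alias, or None."""
    return LANGUAGE_ALIASES.get(value.strip().lower())


def normalize_date(value):
    """Return (start_year, end_year); end_year is None for an open range."""
    value = value.strip()
    match = re.fullmatch(r"s\.?\s*([IVXLC]+)", value, flags=re.IGNORECASE)
    if match:  # century label, e.g. "s XV"
        start = (roman_to_int(match.group(1)) - 1) * 100
        return (start, start + 99)
    match = re.fullmatch(r"(\d{4})-(\d{4})", value)
    if match:  # closed range, e.g. "1400-1499"
        return (int(match.group(1)), int(match.group(2)))
    match = re.fullmatch(r"(\d{4})/", value)
    if match:  # open-ended range, e.g. "1400/"
        return (int(match.group(1)), None)
    return None
```

With this sketch, the three language spellings on the slide all map to one code, and "s XV" and "1400-1499" map to the same range, which is the point of normalizing before a distributed search.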

  8. Corpus collection / Distributed search • SRU: Web Service-based protocol for querying Internet indexes or databases • Syntax: CQL • Semantics: Context Sets & Profiles • Diagram labels: client, server (×4), request, response
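As a rough sketch of the client side of slide 8, the same CQL query can be sent to several SRU servers. The endpoint URLs and function names are hypothetical, and the example query assumes the servers expose the Dublin Core context set; `operation`, `version`, `query` and `maximumRecords` are standard SRU searchRetrieve parameters. The sketch only builds the request URLs and does not perform any network call.

```python
from urllib.parse import urlencode

# Hypothetical SRU endpoints, one per corpus server.
SERVERS = [
    "http://corpus-a.example.org/sru",
    "http://corpus-b.example.org/sru",
]


def build_sru_url(base_url, cql_query, max_records=10):
    """Build an SRU searchRetrieve URL carrying a CQL query."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,
        "maximumRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)


def fan_out(cql_query):
    """Return one request URL per server for the same CQL query."""
    return [build_sru_url(server, cql_query) for server in SERVERS]
```

Calling `fan_out('dc.language = "cat"')` yields one URL per server; a real client would fetch each URL and merge the returned records.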

  9. Corpus integration • Diagram labels: DATA (×4), MAF Annotated Data, MAF2CQP, CWB

  10. Corpus integration • Diagram labels: DATA (×4), Format, Wrappers, MAF2CQP, Tags, CWB

  11. Corpus integration • Diagram labels: DATA (×4), Annotated? (yes / no), Format, PoS tagger, PoS tagger wrappers, PoS tags, Freeling, Apertium, N-grams, CWB

  12. Services interoperability • Deployment of NLP tools as SOAP Web Services: • Definition of common interfaces • Definition of shared types to model standard request & response messages • Explore the semantic description of WS, not only for discovery purposes but also for invoking them

  13. Services interoperability: command line vs. WSDL message (SOAP WS) • Command line: $ TagText -text -numlines -tagonly -prepronly -tagblanks -notagurl -notagemail -notagip -notagdns -encoding -errors • WSDL message: name="TagText" • part name="numlines" • part name="Tagonly" • part name="Prepronly" • part name="Tagblanks" • part name="notagurl" • part name="Notagemail" • part name="Notagip" • part name="Notagdns" • part name="Encoding" • part name="Errors"

  14. Services interoperability • <wsdl:message name="CommandLineRequest"> • <wsdl:part name="numlines" element="numlines"></wsdl:part> • <wsdl:part name="Tagonly" element="Tagonly"></wsdl:part> • <wsdl:part name="Prepronly" element="Prepronly"></wsdl:part> • <wsdl:part name="Tagblanks" element="Tagblanks"></wsdl:part> • <wsdl:part name="Notagurl" element="Notagurl"></wsdl:part> • <wsdl:part name="Notagemail" element="Notagemail"></wsdl:part> • <wsdl:part name="Notagip" element="Notagip"></wsdl:part> • <wsdl:part name="Notagdns" element="Notagdns"></wsdl:part> • <wsdl:part name="Encoding" element="Encoding"></wsdl:part> • <wsdl:part name="Errors" element="Errors"></wsdl:part> • </wsdl:message>

  15. Services interoperability / common interfaces • Interoperability is achieved by separating interfaces from implementations • Common interfaces need: • An agreed set of operations • Compatibility of elements in I/O messages • Compatibility of schema structures in message elements

  16. Services interoperability (wrapped document style) • <wsdl:types> (Shared !!) type declaration </wsdl:types> • <wsdl:message name="CommandLineRequest"> <wsdl:part name="parameters" element="parameters"> </wsdl:part> </wsdl:message> • Type sharing • Type reuse • Type extension

  17. Services interoperability • <wsdl:message name="POSTaggerRequest"> <wsdl:part name="POSTaggerParams" element="POSTaggerParams"></wsdl:part> </wsdl:message> • VALID SOAP MESSAGE: <POSTaggerParams> <MainParams> <language>spa</language> <text> <file>http://somewhere/somefile</file> </text> </MainParams> <optParams></optParams> </POSTaggerParams>
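The message body on slide 17 can be assembled programmatically. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the function names are hypothetical, and the sketch builds only the payload element, not the surrounding SOAP envelope or any schema validation.

```python
import xml.etree.ElementTree as ET


def build_pos_tagger_params(language, file_uri):
    """Build the POSTaggerParams payload shown on the slide."""
    root = ET.Element("POSTaggerParams")
    main = ET.SubElement(root, "MainParams")
    ET.SubElement(main, "language").text = language  # e.g. "spa"
    text = ET.SubElement(main, "text")
    ET.SubElement(text, "file").text = file_uri      # input passed by reference
    ET.SubElement(root, "optParams")                 # optional parameters, left empty
    return root


def to_xml(element):
    """Serialize an element tree to a string."""
    return ET.tostring(element, encoding="unicode")
```

Because every tagger service would accept the same element structure, a client can build the payload once and send it to any of them.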

  18–20. Services interoperability • Diagram labels (same diagram on three consecutive slides): Language guesser, IF, POStagger (×3), XProcess
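One reading of the diagram on slides 18 to 20 is a workflow in which a language guesser decides which POS tagger receives the text. The sketch below illustrates that reading only: the word lists, the stub taggers and all names are invented, and real services would be remote calls sharing one request interface rather than local functions.

```python
def guess_language(text):
    """Toy language guesser: counts a few function words per language."""
    words = set(text.lower().split())
    scores = {
        "cat": len(words & {"i", "amb", "és", "una"}),
        "spa": len(words & {"y", "con", "es", "una"}),
        "eng": len(words & {"the", "and", "with", "is", "a"}),
    }
    return max(scores, key=scores.get)


def make_stub_tagger(name):
    """Return a placeholder tagger; a real one would call a web service."""
    def tagger(text):
        return {"tagger": name, "tokens": text.split()}
    return tagger


# One tagger per language, all sharing the same call signature.
TAGGERS = {
    "cat": make_stub_tagger("cat-tagger"),
    "spa": make_stub_tagger("spa-tagger"),
    "eng": make_stub_tagger("eng-tagger"),
}


def run_workflow(text):
    """Guess the language, then route the text to the matching tagger."""
    language = guess_language(text)
    tagger = TAGGERS.get(language)
    if tagger is None:
        raise ValueError("no tagger registered for " + language)
    return language, tagger(text)
```

The routing step stays a plain dictionary lookup precisely because the taggers share an interface; adding a language means registering one more entry.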

  21. Services interoperability • Format of the SOAP message (the message moving around between services), NOT the structure of the message content • VALID SOAP MESSAGE: <POSTaggerParams> <MainParams> <language>spa</language> <text> <file>http://somefile</file> </text> </MainParams> <optParams></optParams> </POSTaggerParams> • $ TagText -lang -file • $ analyzer -f config/en.cfg
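Slide 21 pairs one SOAP message with two different command lines. A hedged sketch of how a service wrapper might translate the shared message into each tool's own invocation follows. The flag names `-lang`, `-file` and `-f` come from the slide; that they take these values, the config-file table, and every function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MESSAGE = """<POSTaggerParams>
  <MainParams>
    <language>spa</language>
    <text><file>http://somefile</file></text>
  </MainParams>
  <optParams></optParams>
</POSTaggerParams>"""

# Hypothetical mapping from ISO 639-3 codes to analyzer config files.
ANALYZER_CONFIGS = {"spa": "config/es.cfg", "eng": "config/en.cfg"}


def read_params(xml_text):
    """Extract the language code and input file URI from the message."""
    root = ET.fromstring(xml_text)
    language = root.findtext("MainParams/language")
    file_uri = root.findtext("MainParams/text/file")
    return language, file_uri


def tagtext_command(xml_text):
    """Wrapper for a tool that takes language and file as flags."""
    language, file_uri = read_params(xml_text)
    return ["TagText", "-lang", language, "-file", file_uri]


def analyzer_command(xml_text):
    """Wrapper for a tool that selects the language via a config file."""
    language, _ = read_params(xml_text)
    return ["analyzer", "-f", ANALYZER_CONFIGS[language]]
```

Both wrappers read the same message, so the differences between the tools stay inside each wrapper and never reach the client.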

  22. Services interoperability • VALID SOAP MESSAGE: <POSTaggerParams> <MainParams> <language>spa</language> <text> <file>http://somefile</file> </text> </MainParams> <optParams></optParams> </POSTaggerParams> • Callouts: ISO-639-3 code, URI, MIME types

  23. Services interoperability • VALID SOAP MESSAGE: <POSTaggerResponse> <MainParams> <POSAnnotatedText> <file>http://somefile</file> </POSAnnotatedText> </MainParams> </POSTaggerResponse> • Annotated Text ?? • NOT everything is XML, so NOT everything has an XSD type

  24. Services interoperability 1- Identification of basic operations & I/O

  25. Services interoperability 2- Taxonomy & Domain elements • Diagram labels: Taxonomy, Domain elements

  26. Services interoperability 3- Web services descriptions (MyGRID) • Service Ontology and Domain Ontology (no invocation details) • The Service Ontology acts as a service model and the Domain Ontology acts as a controlled vocabulary for the model • 'External' XML annotation of services compliant with the model

  27. Conclusions • Metadata & Data interoperability: • Standards are a must, but sometimes they are: • Not well documented • Lacking tools • Too weak • Different approaches/methods (what is a token?) • Network effect will improve the situation

  28. Conclusions • Services interoperability • Common interfaces and shared types • Type sharing, type reuse and type extension make it possible to model messages according to a common schema • This does not mean that all objects (I/O) moving around need to adhere to a schema: • Many I/O objects are not XML objects

  29. Conclusions • In the best case: I/O types come from a common type system (different type systems can coexist) • These types may simply identify the existence of a particular data type or may further describe the internal structure of the data type • In the worst case: types are local and remain 'underspecified' as far as their content is concerned

  30. Thank you
