
PANACEA WP3 The Platform

PANACEA WP3 The Platform. WP participants: UPF, ILC, ILSP, LG, DCU, ELDA. Final Annual Review, 19th February 2013. Marc Poch, UPF (marc.pochriera@upf.edu). Summary: Objectives; Platform components / Demo; Achievements; Functional platform



Presentation Transcript


  1. PANACEA WP3 The Platform WP participants: UPF, ILC, ILSP, LG, DCU, ELDA Final Annual Review 19th February 2013 Marc Poch, UPF (marc.pochriera@upf.edu)

  2. Summary • Objectives • Platform components / Demo • Achievements • Functional platform • Interoperability: Travelling Object, Common Interfaces, format converters, etc. • Scalability • WP7 Evaluation • Conclusions and future work

  3. Objectives • Development of a platform (a space of interoperability defined by standardized protocols and common interfaces) for the easy integration of a variety of software components, tools and methodologies deployed as web services to configure a factory for the automation of acquisition, processing and annotation of language resources. • WP3.1 (T1-T6) Architecture and design of the platform • WP3.2 (T15-T30) Workflow editor and engine • WP3.3 (T7-T30) Common interfaces, middleware and temporal files, journaling, etc. • WP3.4 (T15-T30) The Registry • WP3.5 (T7-T30) Deployment of web services of the components supplied by WP4 to WP6

  4. From local tools to sharing workflows

  5. Platform tools and portals PANACEA Platform: uses, adapts and improves myGrid tools for eScience (used in biology, social science, music, astronomy, multimedia and chemistry). • Share tools (remotely run distributed tools): Web Services, SOAP or REST (Soaplab; JAX-WS, Axis, CXF, etc.) • Share and find Web Services: Registry (BioCatalogue). PANACEA Registry: registry.elda.org • Call / chain Web Services: Workflows (Taverna, www.taverna.org.uk). Clients: Java, Python, Perl, etc. • Share and find workflows: Social Network (myExperiment). PANACEA myExperiment: myexperiment.elda.org

  6. Technological option: Web Services • Easy deployment of command line tools as WS (Java, Python, C++, UIMA, etc.) • Clients: Java, Python, Perl, Taverna, etc. • No coding needed! Only metadata • “Polling” techniques for long-lasting tasks • Web form to run the web services • URL input / output ready • PANACEA improvement for SOAP messaging (network usage and memory) • PANACEA limit for multiple users [Diagram: Soaplab 2 (SOAP) web services; Taverna workflow editor; BioCatalogue registry; myExperiment social network]
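The “polling” technique mentioned on this slide can be sketched roughly as follows. This is a minimal illustration, not the Soaplab client API: `get_status` is a hypothetical callable standing in for whatever status request the service exposes, and the status strings `COMPLETED` and `FAILED` are assumptions.

```python
import time
from typing import Callable


def poll_until_done(
    get_status: Callable[[], str],
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
) -> str:
    """Ask a long-running job for its status until it finishes.

    get_status is a hypothetical stand-in for a service status call;
    the terminal status names below are assumptions for illustration.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        # Each status check is a short request, so no single
        # connection has to stay open for the whole job.
        time.sleep(interval_s)
```

Because the client issues many short requests instead of holding one long connection, a task that runs for hours does not hit a client-side timeout.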

  7. Technological option: Registry • User-friendly GUI • Free, open source, continuously maintained • Search function • User ratings (user feedback) • Service annotations and language categorization (PANACEA) • Monitoring system (web service status and data results): Passed, Warning, Failed, Unchecked [Diagram: Soaplab 2 (SOAP) web services; Taverna workflow editor; BioCatalogue registry; myExperiment social network]

  8. Technological option: Taverna • User-friendly GUI • Free and open source • Continuously maintained (v. 2.4) • SOAP and REST web services • Credentials manager (passwords, certificates, etc.) • Multiple files processing (“lists”) • PANACEA workflows, best practices, videos, etc.: • Parallelization, error recovery (“retries”), polling • PANACEA collaboration: bug fixing and pre-release tests [Diagram: Soaplab 2 (SOAP) web services; Taverna workflow editor; BioCatalogue registry; myExperiment social network]

  9. PANACEA

  10. Demos • Previous Review: • PANACEA Registry / PANACEA myExperiment • Run Web Services and Workflows • Design and merging of workflows in Taverna • Final Review: Specific examples • Creation of a bilingual dictionary • Twitter NLP • Web cleaner and anonymizer • PANACEA Registry / PANACEA myExperiment

  11. Demos I Creation of a bilingual dictionary • http://myexperiment.elda.org/workflows/93 • Input: Pairs of Basic Xces Documents • English: http://nlp.ilsp.gr/panacea/Bilingual/data/20101222/LAB_EN_FR/www.ilo.org/1.xml • French: http://nlp.ilsp.gr/panacea/Bilingual/data/20101222/LAB_EN_FR/www.ilo.org/191.xml • Sentence alignment: Hunalign (3rd party tool)  Interoperability • PoS tagging: Treetagger (3rd party tool)  Interoperability • Build phrase tables: Moses (3rd party tool)  Interoperability • Bilingual dictionary extractor • Video: http://ws02.iula.upf.edu/panacea/examples/videos/Panacea_bilingual_dictionary_extraction_v01.mp4

  12. Demos II Twitter NLP + Registry (3rd party tool)  • This web service is based on the Twitter NLP tool developed by Noah's ARK group. • Noah's ARK group is Noah Smith's research group at the Language Technologies Institute, School of Computer Science, Carnegie Mellon University. • Search the WS in the Registry • Check monitoring system • Use web client with example data

  13. Demos III Web cleaner and anonymizer http://myexperiment.elda.org/workflows/98 • Input: a list of URLs to process • Example: a web article from www.fifa.com • ILSP Web cleaner and text extractor WS • UPF Anonymizer WS • Internally calls Freeling NER WS (3rd party tool) Interoperability  • Video: http://ws02.iula.upf.edu/panacea/examples/videos/Panacea_web_cleaner_and_anonymization_v01.mp4

  14. WP3 Achievements • Functional and Operational Platform  • Multiple tools, webs and features  • Ready to use  • Usability  • Real Users  • Interoperability  • Common Interfaces  • Travelling Object  • 3rd party tools Integration  • Format converters  • Scalability  • Web service scalability: long lasting tasks  • Workflow design optimization: robustness  • Machine resources: handling parallel requests 

  15. Functional and Operational Platform • PANACEA Registry  • 157 web services  PANACEA WS benefits: WS are easy to deploy (low maintenance cost)  • More than 1300 annotations  Usability / Doc. • A cloud of 164 tags  • Monitoring system: WS up and running 94.82% since their deployment (97%)  Availability • PANACEA myExperiment • 74 shared workflows  • Storage System  Usability

  16. Functional and Operational Platform: Tutorials and Documentation • Tutorials • Specific and General tutorials • More than 12 videos  Usability • Frequently Asked Questions • Documentation • Registry annotations, tags and Categories • Common Interfaces documentation: xml, web, etc. • Travelling Objects documentation

  17. Functional and Operational Platform: Users • WP7 Validators • Linguatech (WP8) • Qualia (Business intelligence) • CNGL (Centre for Next Generation Localisation) • INCYTA (Translation) • Master and PhD students make use of the PANACEA platform • http://ws02.iula.upf.edu/panacea/statistics/upf-statistics.html

  18. Interoperability • Three levels of interoperability: • COMMUNICATION PROTOCOLS: SOAP, REST • DATA • PARAMETERS [Figure: Tool A passing data to Tool B, in two panels. Panel captions: Tool B does not “understand” format N! / All tools understand the previous format.]

  19. Common Interface • A Common Interface (CI) defines the mandatory parameters for every functionality: http://panacea-lr.eu/en/info-for-professionals/documents/ http://registry.elda.org
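As a rough illustration of what “mandatory parameters for every functionality” means in practice, the sketch below checks a request against a per-functionality parameter set. The functionality names and parameter names here are hypothetical placeholders, not the actual PANACEA Common Interface definitions (those are in the documents linked above).

```python
# Hypothetical Common Interface table: functionality -> mandatory parameters.
# The real definitions live in the PANACEA CI documentation.
COMMON_INTERFACES = {
    "tokenization": {"input", "language"},
    "pos_tagging": {"input", "language"},
    "sentence_alignment": {"source_input", "target_input"},
}


def missing_parameters(functionality: str, params: dict) -> list:
    """Return the mandatory parameters absent from a request, sorted by name."""
    required = COMMON_INTERFACES[functionality]
    return sorted(required - params.keys())


def is_ci_compliant(functionality: str, params: dict) -> bool:
    """A request complies when no mandatory parameter is missing."""
    return not missing_parameters(functionality, params)
```

Any two services that implement the same functionality then accept the same mandatory parameters, so one can replace the other in a workflow without rewiring its inputs.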

  20. Travelling Object • The Travelling Object (TO) is the common data and metadata format used in PANACEA to make components understand each other. (Interoperability) • TO1 is the minimal common vertical in-line format, based on the XCES standard, used by the deployed tools since the first version of the platform • TO2 uses the GrAF standard: the Graph Annotation Format (Ide and Suderman, 2007) is the XML serialization of LAF (ISO 24612, 2009) • LMF for lexical resources • CoNLL for parsers • Converters and adapted WS outputs

  21. Format Converters: 31 format converters on the PANACEA Registry • Freeling to TO. CNR http://registry.elda.org/services/207 • KAF to TO. CNR http://registry.elda.org/services/208 • Basic Xces to txt. CNR http://registry.elda.org/services/209 • PoS tag. (Freeling / Treetagger) to GrAF. UPF http://registry.elda.org/services/142 • Dependency parsing (Freeling) to GrAF. UPF http://registry.elda.org/services/197 • Dependency CoNLL to GrAF. CNR http://registry.elda.org/services/254 • Word doc to txt. UPF http://registry.elda.org/services/112 • In-house mwe to LMF. CNR http://registry.elda.org/services/296 • Pdf to text. UPF http://registry.elda.org/services/116 • Multi. encodings converter (ISO, UTF, etc.). UPF http://registry.elda.org/services/114 • Aligner to TO. DCU http://registry.elda.org/services/69 • Sentence alignment to TMX. DCU http://registry.elda.org/services/219 • Treetagger to MOSES. DCU http://registry.elda.org/services/275 • UIMA to GrAF. ILSP http://registry.elda.org/services/182 • METASHARE metadata generators http://myexperiment.elda.org/workflows/96

  22. 3rd party tools integration • The PANACEA WS wrapper (Soaplab) and the CI make it easy for WS providers to integrate 3rd party tools. • ILSP tools are UIMA tools: UIMA • Freeling: UPC • Treetagger: University of Stuttgart • Twitter NLP: Carnegie Mellon University • MALT Parser: Uppsala University • DeSR: Università di Pisa • MOSES / Giza++ • DELiC4MT (MT evaluation): DCU • Berkeley tagger, parser, aligner: University of California, Berkeley

  23. Web Services Scalability • Web services are being deployed using Soaplab 2.3.2: • Service providers only need to use metadata (ACD) files  Usability • Web client application to test WSs: Spinet  Usability • PANACEA developers have been in contact with Soaplab developers  Collaboration • SOAP protocol standard  Interoperability • WS can be called from Taverna or other workflow editors • WS can be called from many programming languages: Python, Perl, Ruby, Java, etc. • Soaplab polling to avoid client timeouts  Scalability • PANACEA improvements  Scalability • Parallel request limit system • SOAP messaging optimization
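One common way to build a parallel request limit is a counting semaphore in front of the service. The sketch below shows that general idea only; it is not the PANACEA or Soaplab implementation, and `RequestLimiter`, `fake_service` and `run_batch` are hypothetical names introduced for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RequestLimiter:
    """Allow at most max_parallel requests to run at once (illustrative only)."""

    def __init__(self, max_parallel: int):
        self._slots = threading.BoundedSemaphore(max_parallel)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency actually observed

    def run(self, task, *args):
        with self._slots:  # extra callers block here until a slot frees up
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                return task(*args)
            finally:
                with self._lock:
                    self._active -= 1


def fake_service(n: int) -> int:
    """Hypothetical stand-in for a web service call."""
    time.sleep(0.01)
    return n * 2


def run_batch(limiter: RequestLimiter, inputs, workers: int = 10) -> list:
    """Send many requests from several client threads through one limiter."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda n: limiter.run(fake_service, n), inputs))
```

With a limit of 3 and ten client threads, all requests still complete, but no more than three ever run on the server side at the same moment.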

  24. Workflow design optimization: Robustness • Building workflows with Taverna • Version 2.4.2  Scalability • Polling (Soaplab)  Scalability • Long-lasting web service calls without timeouts • Retries  Scalability • Parallelization  Scalability • Tutorials and videos  Usability
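The “retries” idea can be illustrated in a few lines. Taverna configures retries per workflow step through its own interface; the function below is only a generic sketch of the same behaviour, with `call_with_retries` and its parameters being hypothetical names.

```python
import time


def call_with_retries(call, max_retries: int = 3, delay_s: float = 1.0):
    """Run call(); on failure, wait and try again up to max_retries more times.

    Illustrative sketch only: a real workflow would usually retry on
    specific errors (for example network failures), not every exception.
    """
    attempt = 0
    while True:
        try:
            return call()
        except Exception:
            if attempt >= max_retries:
                raise  # give up and surface the last error
            attempt += 1
            time.sleep(delay_s)
```

In a batch of many documents, a transient failure on one document then costs a short delay instead of aborting the whole run.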

  25. Machine Resources: handling parallel requests Parallelization level 3 (3 parallel requests per service × 2 services = 6 concurrent requests)

  26. Machine Resources: handling parallel requests Parallelization level 10 (10 parallel requests per service × 2 services = 20 concurrent requests)

  27. Machine Resources: handling parallel requests • From 1x to 10x experiment http://ws02.iula.upf.edu/panacea/examples/videos/Panacea_parallelization_scalability_v01.mp4 • Two Taverna instances running at the same time • 100 documents to be processed • 1 workflow with NO parallelization / the other with 10x • The same server: ws04 with 8GB RAM and 4 CPUs • More resources > more parallel requests
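The arithmetic behind the parallelization levels on the preceding slides, and the shape of the 1x versus 10x experiment, can be sketched as follows. This is an assumption-laden illustration: `service` is any callable standing in for a web service call, and a thread pool stands in for Taverna's parallel job setting.

```python
from concurrent.futures import ThreadPoolExecutor


def concurrent_requests(level: int, services: int) -> int:
    """Parallel requests per service times number of services in the workflow."""
    return level * services


def process_documents(documents, service, level: int) -> list:
    """Send documents to a service with at most `level` requests in flight.

    Results come back in input order, whatever order the calls finish in.
    """
    with ThreadPoolExecutor(max_workers=level) as pool:
        return list(pool.map(service, documents))
```

For example, `concurrent_requests(3, 2)` gives 6 and `concurrent_requests(10, 2)` gives 20, matching the two levels shown; `process_documents(docs, service, 1)` corresponds to the workflow with no parallelization and `process_documents(docs, service, 10)` to the 10x one.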

  28. Machine Resources: handling parallel requests • Conclusions: • PANACEA fulfils the large-data scalability goal  Scalability • Requirements: • Robust WS deployment: Soaplab (with PANACEA improvements) or other robust frameworks. • Taverna 2.4 • Workflow design must follow the PANACEA massive data tutorial (retries, polling, etc.) • The architecture is highly scalable: growth is just a matter of resources • EMBL-EBI (European Bioinformatics Institute in Cambridge): • 200 servers • 2000 cores • Server request balancing • Software, etc. • More than 50000 Freeling WS parallel requests • Typical PANACEA server: • 2 - 4 cores • 4 - 8 GB RAM • 30 - 100 GB HDD • 100 Freeling WS parallel requests  Statistics

  29. WP7 Evaluation

  30. Conclusions • Functional platform  • Web services software  • Registry / myExperiment • Usability for users and providers  • Interoperability: • Data formats  • Common Interfaces  • Tutorials and Documentation  • Scalability

  31. The future • Authentication Web Services  Business opportunity • Institutions and companies can sell their services and/or machine resources • Automatically build workflows  Usability and interoperability • Based on input data and user desired output, etc. • Data Visualization tools / Widgets  Usability • Improve total throughput  Scalability • With more machine resources we can achieve faster experiment results • Software optimization: task splitting and parallelization • Publications with experiments  Research • Researchers could link their publications to real experiments (WS, workflows, data, etc.) • Fostering research by making experiments easily replicable • Improved experiments: more data, more machine resources, faster results, etc.

  32. Thank you. Questions?
