
Algoval: Evaluation Server Past, Present and Future

Algoval: Evaluation Server Past, Present and Future. Simon Lucas Computer Science Dept Essex University 25 January, 2002. Architecture Evolution. Version 1: Centralised evaluation of Java submissions (Spring 2000) Version 2: Distributed evaluation using Java RMI (Summer 2001)

Presentation Transcript


  1. Algoval: Evaluation Server Past, Present and Future • Simon Lucas • Computer Science Dept, Essex University • 25 January, 2002

  2. Architecture Evolution • Version 1: Centralised evaluation of Java submissions (Spring 2000) • Version 2: Distributed evaluation using Java RMI (Summer 2001) • Version 3: Distributed evaluation using XML over HTTP (Spring 2002)

  3. Competitions • Post-Office Sponsored OCR Competition (Autumn 2000) • IEEE Congress on Evolutionary Computation 2001 • IEEE WCCI 2002 • ICDAR 2003 • Wide range of contests – OCR, Sequence Recognition, Object Recognition

  4. Sample Results

  5. Statistics

  6. Details

  7. More Details

  8. Parameterised Algorithms • Note that league table entries can include the parameters that were used to configure the algorithm • This allows developers to observe how different parameter settings affect the performance measures • E.g.: problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01
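As a rough illustration of how such an entry could be split into an algorithm name and its parameters, here is a minimal Python sketch. It assumes the entry follows conventional URL query-string syntax, with `?` before the first parameter and `&` between the rest; the function name `parse_entry` is hypothetical and not part of Algoval.

```python
from urllib.parse import parse_qsl

def parse_entry(entry):
    """Split 'ClassName?key=value&key=value' into (name, parameter dict)."""
    # Everything before the first '?' is the algorithm's class name.
    name, _, query = entry.partition("?")
    # parse_qsl returns (key, value) pairs; values are kept as strings.
    return name, dict(parse_qsl(query))

name, params = parse_entry("problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01")
print(name)
print(params)
```

Values are left as strings here, because the slide does not say which parameters are integers and which are floating point.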

  9. Centralised • System restricted submissions to be written in Java – for security reasons • Java programs can be run within a highly restrictive security manager • Does not scale well under heavy load • Many researchers unwilling to convert their algorithm implementations to Java

  10. Centralised II • Can measure every aspect of an algorithm's performance: speed, memory requirements (static, dynamic) • All algorithms compete on a level playing field • Very difficult for an algorithm to cheat
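The kind of measurement a centralised evaluator can make is sketched below. This is a Python stand-in rather than the original Java system: it times one call and records peak traced memory using the standard `time` and `tracemalloc` modules. The helper name `measure` and the use of `sorted` as the "algorithm" are assumptions made purely for illustration.

```python
import time
import tracemalloc

def measure(algorithm, data):
    """Run algorithm(data) once; return (result, seconds, peak_bytes)."""
    tracemalloc.start()                      # begin tracking Python allocations
    start = time.perf_counter()
    result = algorithm(data)
    seconds = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak

# Stand-in "algorithm": sort a reversed list of 10000 integers.
result, seconds, peak = measure(sorted, list(range(10000, 0, -1)))
print(f"{seconds:.6f} s, peak {peak} bytes")
```

Because every submission runs inside the same harness on the same machine, the numbers are directly comparable. Note that memory tracing itself slows the run, so a real evaluator would probably time and trace in separate passes.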

  11. Distributed • Researchers can test their algorithms against others without submitting their code • Results on new datasets can be generated immediately for all clients that are connected to the evaluation server • Results are generated by the same evaluation method • Hence meaningful comparisons can be made between different algorithms

  12. Distributed (RMI) • Based on Java’s Remote Method Invocation (RMI) • Works okay, but client programs still need to access a Java Virtual Machine • BUT: the algorithms can now be implemented in any language • However: there may still be some work converting the Java data structures to the native language

  13. Distributed II • Since most computation is done on the clients' machines, it scales well. • Researchers can implement their algorithms in any language they choose - it just has to talk to the evaluation proxy on their machine. • When submitting an algorithm it is also possible to specify URLs for the author and the algorithm • Visitors to the web-site can view league tables then follow links to the algorithm and its implementer.
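A league table of this kind can be sketched in a few lines of Python. The `Entry` fields, the accuracy measure, and the example.org URLs below are all hypothetical; the point is only that every entry is scored by the same server-side function and carries links back to its algorithm and author.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    algorithm: str        # name shown in the league table
    author_url: str       # link to the implementer (hypothetical URL)
    algorithm_url: str    # link to a description of the algorithm
    predictions: list     # labels returned by the client

def accuracy(predictions, truth):
    """Fraction of predictions that match the ground truth."""
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

def league_table(entries, truth):
    """Score every entry with the same measure, best first."""
    scored = [(accuracy(e.predictions, truth), e) for e in entries]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

truth = ["a", "b", "c", "d"]
entries = [
    Entry("A", "http://example.org/alice", "http://example.org/a", ["a", "b", "x", "x"]),
    Entry("B", "http://example.org/bob", "http://example.org/b", ["a", "b", "c", "x"]),
]
table = league_table(entries, truth)
for score, entry in table:
    print(f"{score:.2f}  {entry.algorithm}  {entry.algorithm_url}  {entry.author_url}")
```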

  14. Distributed (RMI)

  15. UML Sequence

  16. Remote Participation • Developers download a kit • Interface their algorithm to the spec. • Run a command-line batch file to invoke their algorithm on a specified problem

  17. Features of RMI • Handles Object Serialization • Hence: problem specifications can easily include complex data structures • Fragile! – changes to the Java classes may require developers to download a new developer kit • Does not work well through firewalls • HTTP Tunnelling can solve some problems, but has limitations (e.g. no callbacks)

  18. <future>XML Version</future> • While Java RMI is platform independent (any platform with a JVM), XML is language independent • XML version is HTTP based • No known problems with firewalls

  19. XML Version • Each client (algorithm under test): • parses XML objects (e.g. datasets) • sends back XML objects (e.g. pattern classifications) to the server
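The client's side of this exchange might look like the following Python sketch. The XML element and attribute names (`dataset`, `pattern`, `classifications`, `classification`) are invented for illustration, since the slides do not give the actual schema, and `classify` is a trivial placeholder rather than a real recognizer. The HTTP transport is omitted.

```python
import xml.etree.ElementTree as ET

# Hypothetical dataset as the server might send it.
DATASET_XML = """
<dataset name="demo">
  <pattern id="p1">0 0 1 1</pattern>
  <pattern id="p2">1 1 1 0</pattern>
</dataset>
"""

def classify(features):
    """Placeholder classifier: 'dense' if more than half the features are set."""
    return "dense" if sum(features) * 2 > len(features) else "sparse"

def respond(dataset_xml):
    """Parse a dataset, classify each pattern, and build the XML reply."""
    dataset = ET.fromstring(dataset_xml.strip())
    reply = ET.Element("classifications", dataset=dataset.get("name"))
    for pattern in dataset.iter("pattern"):
        features = [int(v) for v in pattern.text.split()]
        ET.SubElement(reply, "classification",
                      id=pattern.get("id"), label=classify(features))
    return ET.tostring(reply, encoding="unicode")

reply_xml = respond(DATASET_XML)
print(reply_xml)
```

Any language with an XML parser and an HTTP client could play the same role, which is the language independence the XML version is aiming for.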

  20. Pattern recognition servers • Reside at particular URLs • Can be trained on specified or supplied datasets • Can respond to recognition requests

  21. Example Request • Recognize this word: • Given the dictionary at: http://ace.essex.ac.uk/viadocs/dic/pygenera.txt • And the OCR training set at: http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml • Respond with your 10 best word hypotheses

  22. Example Response • 1. MELISSOBLAPTES • 2. ENDOMMMASIS • 3. HETEROGRAPHIS • 4. TRICHOBAPTES • 5. HETEROCHROSIS • 6. PHLOEOGRAPTIS • 7. HETEROCNEPHES • 8. DRESCOMPOSIS • 9. MESOGRAPHE • 10. DIPSOCHARES
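One way the example request and response could be carried as XML is sketched below in Python. The element names (`recognize`, `dictionary`, `trainingSet`, `word`, `hypotheses`, `hypothesis`) and the `nbest` and `rank` attributes are assumptions, not the Algoval format. The two URLs are the ones from the example request, and the sample response uses only the first three hypotheses listed above.

```python
import xml.etree.ElementTree as ET

def build_request(dictionary_url, training_url, n_best):
    """Build a recognition request that refers to its resources by URL."""
    request = ET.Element("recognize", nbest=str(n_best))
    ET.SubElement(request, "dictionary", href=dictionary_url)
    ET.SubElement(request, "trainingSet", href=training_url)
    ET.SubElement(request, "word")  # the word image itself is left empty here
    return ET.tostring(request, encoding="unicode")

def parse_response(response_xml):
    """Return the hypotheses as a list ordered by their rank attribute."""
    root = ET.fromstring(response_xml)
    ranked = sorted(root.iter("hypothesis"), key=lambda h: int(h.get("rank")))
    return [h.text for h in ranked]

request_xml = build_request(
    "http://ace.essex.ac.uk/viadocs/dic/pygenera.txt",
    "http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml",
    10,
)

# Hypothetical response, deliberately out of order to show the sorting.
response_xml = (
    "<hypotheses>"
    '<hypothesis rank="2">ENDOMMMASIS</hypothesis>'
    '<hypothesis rank="1">MELISSOBLAPTES</hypothesis>'
    '<hypothesis rank="3">HETEROGRAPHIS</hypothesis>'
    "</hypotheses>"
)
best = parse_response(response_xml)
print(request_xml)
print(best)
```

Passing the dictionary and training set by URL keeps the request small, since the recognition server can fetch and cache them itself.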

  23. Issues • How general to make problem specs • Could set up separate problems for OCR and face recognition, or a single problem called ImageRecognition • How does the software effort scale?

  24. Software Scalability • Suppose we have: • A algorithms implemented in L languages • D datasets • P problems • E algorithm evaluators • How will our software effort scale with respect to these numbers?

  25. Scalability (contd.) • Consider server and clients • More effort at the server can mean less effort for clients • For example, language specific interfaces and wrappers can be defined • This makes participation in a particular language much less effort • This could be done on demand
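A back-of-envelope model can make the trade-off concrete. The counting rules below are assumptions introduced here, not figures from the talk: without shared wrappers, each algorithm author is taken to write one piece of glue code per problem; with server-side wrappers, the server maintains one wrapper per (language, problem) pair and each algorithm needs only one thin adapter.

```python
def glue_without_wrappers(algorithms, problems):
    """Assumed model: every algorithm needs its own glue for every problem."""
    return algorithms * problems

def glue_with_wrappers(algorithms, languages, problems):
    """Assumed model: server writes one wrapper per language and problem;
    each algorithm then needs a single thin adapter."""
    server_side = languages * problems
    client_side = algorithms
    return server_side, client_side

# Hypothetical numbers: A = 20 algorithms, L = 4 languages, P = 5 problems.
A, L, P = 20, 4, 5
print(glue_without_wrappers(A, P))     # client-side pieces of glue code
print(glue_with_wrappers(A, L, P))     # (server-side wrappers, client adapters)
```

Under these assumptions the client-side effort grows with A alone rather than with A times P, at the cost of L times P wrappers maintained once at the server.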

  26. Summary • Independent, automatic algorithm evaluation • Makes sound scientific and economic sense • Existing system works but has some limitations • Future XML-based system will overcome these • We then need to get people using it • Future contests will help • Industry support will benefit both academic research and commercial exploitation
