This presentation by Simon Lucas from Essex University explores the evolution of algorithm evaluation systems for Java submissions, tracing developments from centralized evaluation (Version 1) to distributed systems using Java RMI (Version 2) and eventually XML over HTTP (Version 3). It discusses competitions, algorithm performance metrics, and the importance of scalable systems that allow researchers to evaluate their algorithms in various languages while maintaining security and performance integrity. The future direction emphasizes overcoming existing limitations with an XML-based approach.
Algoval: Evaluation Server – Past, Present and Future. Simon Lucas, Computer Science Dept, Essex University, 25 January 2002
Architecture Evolution • Version 1: Centralised evaluation of Java submissions (Spring 2000) • Version 2: Distributed evaluation using Java RMI (Summer 2001) • Version 3: Distributed evaluation using XML over HTTP (Spring 2002)
Competitions • Post-Office Sponsored OCR Competition (Autumn 2000) • IEEE Congress on Evolutionary Computation 2001 • IEEE WCCI 2002 • ICDAR 2003 • Wide range of contests – OCR, Sequence Recognition, Object Recognition
Parameterised Algorithms • Note that league table entries can include the parameters used to configure the algorithm • This allows developers to observe the effect of different parameter settings on the performance measures • E.g.: problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01
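As a rough illustration, a specification string in this form can be split into a class name plus a map of parameter settings. The sketch below is not the actual Algoval parser; the parsing approach shown here is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: splits a parameterised algorithm specification such as
// "problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01" into the class
// name and a map of parameter names to values. Not the actual Algoval parser.
public class AlgorithmSpec {
    public static void main(String[] args) {
        String spec = "problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01";
        String[] parts = spec.split("\\?", 2);
        String className = parts[0];
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (parts.length > 1) {
            for (String pair : parts[1].split("&")) {
                String[] kv = pair.split("=", 2);
                params.put(kv[0], kv.length > 1 ? kv[1] : "");
            }
        }
        System.out.println(className + " with parameters " + params);
    }
}
```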
Centralised • System restricted submissions to be written in Java – for security reasons • Java programs can be run within a highly restrictive security manager • Does not scale well under heavy load • Many researchers are unwilling to convert their algorithm implementations to Java
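A minimal sketch of the sandboxing idea, assuming a custom SecurityManager that vetoes file writes and network connections; this is an illustration of the approach, not the actual Algoval sandbox.

```java
// Minimal sketch of running submitted code under a restrictive security
// manager; an illustration of the idea, not the actual Algoval sandbox.
public class RestrictedRunner {
    public static void main(String[] args) {
        System.setSecurityManager(new SecurityManager() {
            @Override public void checkWrite(String file) {
                throw new SecurityException("submissions may not write files: " + file);
            }
            @Override public void checkConnect(String host, int port) {
                throw new SecurityException("submissions may not open network connections");
            }
        });
        // A submitted algorithm class would be loaded and invoked here; any
        // attempt to write files or open sockets now throws SecurityException.
    }
}
```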
Centralised II • Can measure every aspect of an algorithm's performance • Speed • Memory requirements (static, dynamic) • All algorithms compete on a level playing field • Very difficult for an algorithm to cheat
Distributed • Researchers can test their algorithms against others without submitting their code • Results on new datasets can be generated immediately for all clients that are connected to the evaluation server • Results are generated by the same evaluation method. • Hence meaningful comparisons can be made between different algorithms.
Distributed (RMI) • Based on Java’s Remote Method Invocation (RMI) • Works okay, but client programs still need to access a Java Virtual Machine • BUT: the algorithms can now be implemented in any language • However: there may still be some work converting the Java data structures to the native language
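For illustration only, a remote evaluation interface in this style might look like the following; the interface and method names (EvaluationService, fetchProblem, submitResults) are assumptions, not the actual Algoval API.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical sketch of an RMI evaluation interface; the names used here
// are assumptions and not the actual Algoval API.
public interface EvaluationService extends Remote {
    // Download the specification and data for a named problem.
    Serializable fetchProblem(String problemName) throws RemoteException;

    // Return the algorithm's outputs so the server can score them.
    double submitResults(String algorithmName, Serializable results) throws RemoteException;
}
```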
Distributed II • Since most computation is done on the clients' machines, it scales well. • Researchers can implement their algorithms in any language they choose - it just has to talk to the evaluation proxy on their machine. • When submitting an algorithm it is also possible to specify URLs for the author and the algorithm • Visitors to the web-site can view league tables then follow links to the algorithm and its implementer.
Remote Participation • Developers download a kit • Interface their algorithm to the spec. • Run a command-line batch file to invoke their algorithm on a specified problem
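For example, the kit could ask participants to implement a small Java interface such as the hypothetical one below; the name Recognizer and its methods are assumptions, not the real kit specification.

```java
// Hypothetical sketch of the kind of interface a developer kit could ask
// participants to implement; the name and method signatures are assumptions.
public interface Recognizer {
    // Train on the dataset identified by the problem specification.
    void train(Object trainingSet);

    // Return the n best hypotheses for a single test pattern.
    String[] recognize(Object pattern, int nBest);
}
```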
Features of RMI • Handles Object Serialization • Hence: problem specifications can easily include complex data structures • Fragile! – changes to the Java classes may require developers to download a new developer kit • Does not work well through firewalls • HTTP Tunnelling can solve some problems, but has limitations (e.g. no callbacks)
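The fragility stems from Java object serialization: if a class sent over RMI evolves incompatibly, clients holding the old class can no longer deserialize instances of it. A sketch, using a hypothetical ProblemSpec class:

```java
import java.io.Serializable;

// Hypothetical problem-specification class sent over RMI. If the class
// evolves incompatibly (e.g. its serialVersionUID changes), clients with the
// old class fail to deserialize it with an InvalidClassException, forcing a
// new developer-kit download.
public class ProblemSpec implements Serializable {
    private static final long serialVersionUID = 1L;

    public final String problemName;
    public final String datasetUrl;

    public ProblemSpec(String problemName, String datasetUrl) {
        this.problemName = problemName;
        this.datasetUrl = datasetUrl;
    }
}
```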
<future>XML Version</future> • While Java RMI is platform independent (any platform with a JVM), XML is language independent • XML version is HTTP based • No known problems with firewalls
XML Version • Each client (algorithm under test) • parses XML objects (e.g. datasets) • sends back XML objects (e.g. pattern classifications) to the server
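A minimal sketch of the client side in Java, using the standard DOM parser; the <dataset> and <pattern> element names are invented for illustration and are not the actual Algoval schema.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Sketch only: parse a small XML dataset and list its patterns. The
// <dataset>/<pattern> element names are invented, not the Algoval schema.
public class XmlClientSketch {
    public static void main(String[] args) throws Exception {
        String xml = "<dataset><pattern id=\"1\"/><pattern id=\"2\"/></dataset>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList patterns = doc.getElementsByTagName("pattern");
        for (int i = 0; i < patterns.getLength(); i++) {
            System.out.println("pattern id = "
                    + patterns.item(i).getAttributes().getNamedItem("id").getNodeValue());
        }
        // The algorithm would run on each pattern, and the client would then
        // build an XML results document to send back to the server.
    }
}
```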
Pattern recognition servers • Reside at particular URLs • Can be trained on specified or supplied datasets • Can respond to recognition requests
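Sending a recognition request to such a server can then be a plain HTTP POST of an XML document. The sketch below uses the standard HttpURLConnection; the URL and XML payload are invented for illustration.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch only: POST an XML recognition request to a pattern recognition
// server. The URL and XML payload are invented for illustration.
public class RecognitionRequestSketch {
    public static void main(String[] args) throws Exception {
        String requestXml = "<recognitionRequest nBest=\"10\">"
                          + "<word image=\"word-001\"/></recognitionRequest>";
        URL url = new URL("http://example.org/recognizer"); // placeholder URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(requestXml.getBytes("UTF-8"));
        out.close();
        System.out.println("Server responded with HTTP " + conn.getResponseCode());
        // The XML response (e.g. a ranked list of word hypotheses) would be
        // read from conn.getInputStream() and parsed as in the previous sketch.
    }
}
```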
Example Request • Recognize this word (word image shown on the original slide) • Given the dictionary at http://ace.essex.ac.uk/viadocs/dic/pygenera.txt • And the OCR training set at http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml • Respond with your 10 best word hypotheses
Example Response • 1. MELISSOBLAPTES 2. ENDOMMMASIS 3. HETEROGRAPHIS 4. TRICHOBAPTES 5. HETEROCHROSIS 6. PHLOEOGRAPTIS 7. HETEROCNEPHES 8. DRESCOMPOSIS 9. MESOGRAPHE 10. DIPSOCHARES
Issues • How general to make problem specs • Could set up separate problems for OCR and face recognition, or a single problem called ImageRecognition • How does the software effort scale?
Software Scalability • Suppose we have: • A algorithms implemented in L languages • D datasets • P problems • E algorithm evaluators • How will our software effort scale with respect to these numbers?
Scalability (contd.) • Consider server and clients • More effort at the server can mean less effort for clients • For example, language-specific interfaces and wrappers can be defined at the server • This makes participation in a particular language much less effort (with, say, 50 algorithms spread over 5 languages, 5 server-side wrappers replace 50 per-algorithm ports) • This could be done on demand
Summary • Independent, automatic algorithm evaluation • Makes sound scientific and economic sense • Existing system works but has some limitations • Future XML-based system will overcome these • The challenge is then to get people using it • Future contests will help • Industry support will benefit both academic research and commercial exploitation