This presentation by Simon Lucas from Essex University explores the evolution of algorithm evaluation systems for Java submissions, tracing developments from centralized evaluation (Version 1) to distributed systems using Java RMI (Version 2) and eventually XML over HTTP (Version 3). It discusses competitions, algorithm performance metrics, and the importance of scalable systems that allow researchers to evaluate their algorithms in various languages while maintaining security and performance integrity. The future direction emphasizes overcoming existing limitations with an XML-based approach.
Algoval: Evaluation Server – Past, Present and Future. Simon Lucas, Computer Science Dept, Essex University, 25 January 2002
Architecture Evolution • Version 1: Centralised evaluation of Java submissions (Spring 2000) • Version 2: Distributed evaluation using Java RMI (Summer 2001) • Version 3: Distributed evaluation using XML over HTTP (Spring 2002)
Competitions • Post-Office Sponsored OCR Competition (Autumn 2000) • IEEE Congress on Evolutionary Computation 2001 • IEEE WCCI 2002 • ICDAR 2003 • Wide range of contests – OCR, Sequence Recognition, Object Recognition
Parameterised Algorithms • Note that league table entries can include the parameters used to configure the algorithm • This allows developers to observe the effect of different parameter settings on the performance measures • E.g.: problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01
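As a rough illustration, a specification string in this form can be split into a class name plus a map of parameter settings. The sketch below is not the actual Algoval parser; the parsing approach shown here is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: splits a parameterised algorithm specification such as
// "problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01" into the class
// name and a map of parameter names to values. Not the actual Algoval parser.
public class AlgorithmSpec {
    public static void main(String[] args) {
        String spec = "problems.seqrec.SNTupleRecognizer?n=4&gap=11&eps=0.01";
        String[] parts = spec.split("\\?", 2);
        String className = parts[0];
        Map<String, String> params = new LinkedHashMap<String, String>();
        if (parts.length > 1) {
            for (String pair : parts[1].split("&")) {
                String[] kv = pair.split("=", 2);
                params.put(kv[0], kv.length > 1 ? kv[1] : "");
            }
        }
        System.out.println(className + " with parameters " + params);
    }
}
```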
Centralised • System restricted submissions to be written in Java – for security reasons • Java programs can be run within a highly restrictive security manager • Does not scale well under heavy load • Many researchers are unwilling to convert their algorithm implementations to Java
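A minimal sketch of the sandboxing idea, assuming a custom SecurityManager that vetoes file writes and network connections; this is an illustration of the approach, not the actual Algoval sandbox.

```java
// Minimal sketch of running submitted code under a restrictive security
// manager; an illustration of the idea, not the actual Algoval sandbox.
public class RestrictedRunner {
    public static void main(String[] args) {
        System.setSecurityManager(new SecurityManager() {
            @Override public void checkWrite(String file) {
                throw new SecurityException("submissions may not write files: " + file);
            }
            @Override public void checkConnect(String host, int port) {
                throw new SecurityException("submissions may not open network connections");
            }
        });
        // A submitted algorithm class would be loaded and invoked here; any
        // attempt to write files or open sockets now throws SecurityException.
    }
}
```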
Centralised II • Can measure every aspect of an algorithm's performance • Speed • Memory requirements (static, dynamic) • All algorithms compete on a level playing field • Very difficult for an algorithm to cheat
Distributed • Researchers can test their algorithms against others without submitting their code • Results on new datasets can be generated immediately for all clients that are connected to the evaluation server • Results are generated by the same evaluation method. • Hence meaningful comparisons can be made between different algorithms.
Distributed (RMI) • Based on Java’s Remote Method Invocation (RMI) • Works okay, but client programs still need to access a Java Virtual Machine • BUT: the algorithms can now be implemented in any language • However: there may still be some work converting the Java data structures to the native language
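For illustration only, a remote evaluation interface in this style might look like the following; the interface and method names (EvaluationService, fetchProblem, submitResults) are assumptions, not the actual Algoval API.

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;

// Hypothetical sketch of an RMI evaluation interface; the names used here
// are assumptions and not the actual Algoval API.
public interface EvaluationService extends Remote {
    // Download the specification and data for a named problem.
    Serializable fetchProblem(String problemName) throws RemoteException;

    // Return the algorithm's outputs so the server can score them.
    double submitResults(String algorithmName, Serializable results) throws RemoteException;
}
```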
Distributed II • Since most computation is done on the clients' machines, it scales well. • Researchers can implement their algorithms in any language they choose - it just has to talk to the evaluation proxy on their machine. • When submitting an algorithm it is also possible to specify URLs for the author and the algorithm • Visitors to the web-site can view league tables then follow links to the algorithm and its implementer.
Remote Participation • Developers download a kit • Interface their algorithm to the spec. • Run a command-line batch file to invoke their algorithm on a specified problem
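For example, the kit could ask participants to implement a small Java interface such as the hypothetical one below; the name Recognizer and its methods are assumptions, not the real kit specification.

```java
// Hypothetical sketch of the kind of interface a developer kit could ask
// participants to implement; the name and method signatures are assumptions.
public interface Recognizer {
    // Train on the dataset identified by the problem specification.
    void train(Object trainingSet);

    // Return the n best hypotheses for a single test pattern.
    String[] recognize(Object pattern, int nBest);
}
```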
Features of RMI • Handles Object Serialization • Hence: problem specifications can easily include complex data structures • Fragile! – changes to the Java classes may require developers to download a new developer kit • Does not work well through firewalls • HTTP Tunnelling can solve some problems, but has limitations (e.g. no callbacks)
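The fragility stems from Java object serialization: if a class sent over RMI evolves incompatibly, clients holding the old class can no longer deserialize instances of it. A sketch, using a hypothetical ProblemSpec class:

```java
import java.io.Serializable;

// Hypothetical problem-specification class sent over RMI. If the class
// evolves incompatibly (e.g. its serialVersionUID changes), clients with the
// old class fail to deserialize it with an InvalidClassException, forcing a
// new developer-kit download.
public class ProblemSpec implements Serializable {
    private static final long serialVersionUID = 1L;

    public final String problemName;
    public final String datasetUrl;

    public ProblemSpec(String problemName, String datasetUrl) {
        this.problemName = problemName;
        this.datasetUrl = datasetUrl;
    }
}
```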
<future>XML Version</future> • While Java RMI is platform independent (any platform with a JVM), XML is language independent • XML version is HTTP based • No known problems with firewalls
XML Version • Each client (algorithm under test) • parses XML objects (e.g. datasets) • sends back XML objects (e.g. pattern classifications) to the server
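A minimal sketch of the client side in Java, using the standard DOM parser; the <dataset> and <pattern> element names are invented for illustration and are not the actual Algoval schema.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Sketch only: parse a small XML dataset and list its patterns. The
// <dataset>/<pattern> element names are invented, not the Algoval schema.
public class XmlClientSketch {
    public static void main(String[] args) throws Exception {
        String xml = "<dataset><pattern id=\"1\"/><pattern id=\"2\"/></dataset>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList patterns = doc.getElementsByTagName("pattern");
        for (int i = 0; i < patterns.getLength(); i++) {
            System.out.println("pattern id = "
                    + patterns.item(i).getAttributes().getNamedItem("id").getNodeValue());
        }
        // The algorithm would run on each pattern, and the client would then
        // build an XML results document to send back to the server.
    }
}
```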
Pattern recognition servers • Reside at particular URLs • Can be trained on specified or supplied datasets • Can respond to recognition requests
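Sending a recognition request to such a server can then be a plain HTTP POST of an XML document. The sketch below uses the standard HttpURLConnection; the URL and XML payload are invented for illustration.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Sketch only: POST an XML recognition request to a pattern recognition
// server. The URL and XML payload are invented for illustration.
public class RecognitionRequestSketch {
    public static void main(String[] args) throws Exception {
        String requestXml = "<recognitionRequest nBest=\"10\">"
                          + "<word image=\"word-001\"/></recognitionRequest>";
        URL url = new URL("http://example.org/recognizer"); // placeholder URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml");
        conn.setDoOutput(true);
        OutputStream out = conn.getOutputStream();
        out.write(requestXml.getBytes("UTF-8"));
        out.close();
        System.out.println("Server responded with HTTP " + conn.getResponseCode());
        // The XML response (e.g. a ranked list of word hypotheses) would be
        // read from conn.getInputStream() and parsed as in the previous sketch.
    }
}
```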
Example Request • Recognize this word (word image shown on the original slide) • Given the dictionary at http://ace.essex.ac.uk/viadocs/dic/pygenera.txt • And the OCR training set at http://ace.essex.ac.uk/algoval/ocr/viadocs1.xml • Respond with your 10 best word hypotheses
Example Response • 1. MELISSOBLAPTES 2. ENDOMMMASIS 3. HETEROGRAPHIS 4. TRICHOBAPTES 5. HETEROCHROSIS 6. PHLOEOGRAPTIS 7. HETEROCNEPHES 8. DRESCOMPOSIS 9. MESOGRAPHE 10. DIPSOCHARES
Issues • How general to make problem specs • Could set up separate problems for OCR and face recognition, or a single problem called ImageRecognition • How does the software effort scale?
Software Scalability • Suppose we have: • A algorithms implemented in L languages • D datasets • P problems • E algorithm evaluators • How will our software effort scale with respect to these numbers?
Scalability (contd.) • Consider server and clients • More effort at the server can mean less effort for clients • For example, language-specific interfaces and wrappers can be defined at the server • This makes participation in a particular language much less effort (with, say, 50 algorithms spread over 5 languages, 5 server-side wrappers replace 50 per-algorithm ports) • This could be done on demand
Summary • Independent, automatic algorithm evaluation • Makes sound scientific and economic sense • Existing system works but has some limitations • Future XML-based system will overcome these • The challenge is then to get people using it • Future contests will help • Industry support will benefit both academic research and commercial exploitation