AutoEval and Missplel: Two Generic Tools for Automatic Evaluation

Johnny Bigert, Linus Ericson, Anton Solis
Nada, KTH, Stockholm, Sweden
Contact: johnny@kth.se
www.nada.kth.se/theory/humanlang/tools.html

Manual evaluation • Time-consuming, tedious, error-prone • Computers are good at repetitive tasks, humans are not • Unavoidable in some situations

Automatic evaluation • Cheap, fast, accurate, easily reproducible • Incorporated in the development of most NLP systems

Automatic evaluation • AutoEval: simplifies the construction of (NLP system) evaluation • Missplel: introduces human-like errors into text

AutoEval • "I write evaluation code myself in all our NLP projects" • "Why would I need AutoEval?"

AutoEval • Our point exactly. Repetition of: • Input and output file handling • XML parsing and XML output • Error handling, malformed input • Data storage, management and processing

AutoEval Features (avoids repetition): • Handles input (XML/structured plain text) and generates output (XML) • Handles data storage and processing • ...and also: • Generic and extendible script language • Efficient

AutoEval Script language: • Simple C-like syntax • Powerful • Modules and macros in repository files • Extendible, add your own functions

AutoEval Example of configuration and script language:
<root>
  <files>
    <file format="plain" type="in" name="datafile">TnT.wt</file>
    <file format="xml" type="out" name="outfile">out.xml</file>
  </files>
  <process>
    field(file("datafile"), "\t", "\n", var("word"), var("tag"));
    inc(cnt("tot"));
    inc(cnt(lookup("tag")));
  </process>
  <processonce>
    outputintcon(out("outfile"), cntmap("global"), "global");
  </processonce>
</root>

AutoEval The result:
<evaloutput date="Mon May 26 12:37:39 2003">
  <global>
    <var name="tot">14119</var>
    <var name="ab">714</var>
    <var name="ab.kom">44</var>
    <var name="ab.pos">149</var>
    <var name="ab.suv">24</var>
    ...
    <var name="vb.sup.akt">117</var>
    <var name="vb.sup.sfo">35</var>
  </global>
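For readers unfamiliar with the script language, the counting done by the script above can be mirrored in a few lines of Python. This is only a sketch of the same idea, not AutoEval's implementation: the function name `count_tags` and the sample lines are hypothetical, and the input is assumed to hold one word and one tag per line, separated by a tab.

```python
from collections import Counter


def count_tags(lines):
    """Return (total, per_tag) for lines holding a word and a tag separated by a tab."""
    total = 0
    per_tag = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        word, tag = line.split("\t", 1)
        total += 1         # plays the role of inc(cnt("tot"))
        per_tag[tag] += 1  # plays the role of inc(cnt(lookup("tag")))
    return total, per_tag


# Hypothetical sample in the assumed word<TAB>tag layout
sample = ["inte\tab\n", "mer\tab.kom\n", "sprungit\tvb.sup.akt\n", "aldrig\tab\n"]
total, per_tag = count_tags(sample)
```

The per-tag counter corresponds to the `<var>` entries in the result, with the total kept separately so that it cannot collide with a tag name.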

Missplel • Missplel is a highly configurable tool to introduce human-like spelling errors • Language, PoS tag set, character set and keyboard layout independent • All you need is a word/tag/lemma dictionary

Missplel Performance errors – Damerau: • Keyboard mistypes (Damerau, 1964): insertion, deletion, substitution, transposition of letters • wellcvome, wellcme, wellcpme, wellcmoe • Result: • a new existing/non-existing word • word class (PoS tag) change or not
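The four Damerau operations are easy to state as string functions. The sketch below is illustrative only and does not reproduce Missplel's code: the function names are hypothetical, and the positions and replacement letters are picked by hand rather than drawn from a keyboard confusion matrix.

```python
def insert(word, pos, char):
    """Insert char before position pos."""
    return word[:pos] + char + word[pos:]


def delete(word, pos):
    """Remove the letter at position pos."""
    return word[:pos] + word[pos + 1:]


def substitute(word, pos, char):
    """Replace the letter at position pos with char."""
    return word[:pos] + char + word[pos + 1:]


def transpose(word, pos):
    """Swap the letters at positions pos and pos + 1."""
    return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]


word = "welcome"
variants = {
    "insertion": insert(word, 4, "v"),
    "deletion": delete(word, 4),
    "substitution": substitute(word, 4, "p"),
    "transposition": transpose(word, 4),
}
```

A real error generator would additionally choose the operation, the position and the replacement letter at random, weighted by how likely each mistype is on a given keyboard layout.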

Missplel Competence errors – split compounds: • May alter the semantics of a sentence • Kycklinglever – chicken liver • Kyckling lever – chicken is alive • Settings of split compound elements: Minimum length? Allowed PoS tag? Found in dictionary? Word class change? etc.
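A minimal sketch of how split candidates could be found under two of the settings listed above (minimum length and dictionary membership). The function `split_candidates`, its parameter names and the tiny dictionary are assumptions made for illustration; Missplel's actual scoring of candidates is not shown here.

```python
def split_candidates(word, dictionary, min_word_length=6, min_part_length=3):
    """Return all (first, second) splits where both parts are dictionary words."""
    lowered = word.lower()
    if len(lowered) < min_word_length:
        return []  # word is too short to be treated as a compound
    candidates = []
    # Every split point leaves at least min_part_length letters on each side
    for i in range(min_part_length, len(lowered) - min_part_length + 1):
        first, second = lowered[:i], lowered[i:]
        if first in dictionary and second in dictionary:
            candidates.append((first, second))
    return candidates


# Hypothetical miniature dictionary
dictionary = {"kyckling", "lever"}
splits = split_candidates("Kycklinglever", dictionary)
```

Requiring both parts to be dictionary words keeps the introduced error realistic, since a human writer who splits a compound usually produces two existing words.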

Missplel Competence errors – sound errors: • Letter level • e.g. sound-alike errors • Regular expression rules: (.+)ei(.+) → @1ie@2, e.g. receive → recieve
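The rule above can be tried out with Python's `re` module. This sketch assumes that a rule must match the whole word and that `@1`, `@2` refer to the captured groups; the helper name `apply_rule` is hypothetical and the code is not taken from Missplel.

```python
import re


def apply_rule(word, pattern, template):
    """Rewrite word with one rule; return it unchanged if the rule does not match."""
    match = re.fullmatch(pattern, word)
    if match is None:
        return word
    # Replace each @N in the template with the text captured by group N
    return re.sub(r"@(\d+)", lambda ref: match.group(int(ref.group(1))), template)


misspelt = apply_rule("receive", r"(.+)ei(.+)", "@1ie@2")
untouched = apply_rule("believe", r"(.+)ei(.+)", "@1ie@2")
```

Because the first group is greedy, a word containing "ei" more than once would have only its last occurrence rewritten under this reading of the rule.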

Missplel Competence errors – syntax errors: • Word/letter level • Form new words from PoS tags, missing/doubled words etc. • Regular expression rules:
<rule ex="slutat skrika - slutat skrikit">
  <match>vb\.sup\.akt(.*) vb\.inf.*</match>
  <to>vb.sup.akt@1 vb.sup.akt</to>
</rule>
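One way to read the rule above: the tag sequence is rewritten with the regular expression, and each word is then re-inflected from its lemma and new tag. The sketch below follows that reading under several assumptions: tags are joined by single spaces, the full tag `vb.inf.akt` and the small `FORMS` table are invented for illustration, and all function names are hypothetical.

```python
import re

MATCH = r"vb\.sup\.akt(.*) vb\.inf.*"
TO = "vb.sup.akt@1 vb.sup.akt"

# Hypothetical (lemma, tag) -> word form table standing in for the dictionary
FORMS = {
    ("sluta", "vb.sup.akt"): "slutat",
    ("skrika", "vb.inf.akt"): "skrika",
    ("skrika", "vb.sup.akt"): "skrikit",
}


def rewrite_tags(tags, pattern=MATCH, template=TO):
    """Rewrite a tag sequence with one rule; return it unchanged on no match."""
    match = re.fullmatch(pattern, " ".join(tags))
    if match is None:
        return list(tags)
    rewritten = re.sub(r"@(\d+)", lambda ref: match.group(int(ref.group(1))), template)
    return rewritten.split(" ")


def apply_syntax_rule(tokens):
    """tokens is a list of (word, lemma, tag); return the resulting word forms."""
    new_tags = rewrite_tags([tag for _, _, tag in tokens])
    if len(new_tags) != len(tokens):
        return [word for word, _, _ in tokens]  # rule changed the length; give up
    return [
        FORMS.get((lemma, new_tag), word)  # keep the word if no form is known
        for (word, lemma, _), new_tag in zip(tokens, new_tags)
    ]


tokens = [("slutat", "sluta", "vb.sup.akt"), ("skrika", "skrika", "vb.inf.akt")]
result = apply_syntax_rule(tokens)
```

With the sample tokens, the infinitive after a supine is turned into a second supine, which is the error named in the rule's `ex` attribute.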

Missplel
Input:
Letters   NN2
would     VM0
be        VBI
welcome   AJ0-NN1
Output:
Litters   NN2   damerau/wordexist-notagchange
would     VM0   ok
bee       NN1   sound/wordexist-tagchange
welcmoe   ERR   damerau/nowordexist-tagchange
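The error descriptions in the output follow a regular pattern: error type, whether the new word exists, and whether the tag changed. A small sketch of how such a label could be assembled is given below; the function `describe` and the miniature dictionary are hypothetical, and the label strings are copied from the example rather than from Missplel's source.

```python
def describe(original, original_tag, new, new_tag, error_type, dictionary):
    """Build a label such as 'damerau/wordexist-notagchange', or 'ok' if unchanged."""
    if new == original:
        return "ok"
    existence = "wordexist" if new.lower() in dictionary else "nowordexist"
    tag_part = "tagchange" if new_tag != original_tag else "notagchange"
    return f"{error_type}/{existence}-{tag_part}"


# Hypothetical miniature dictionary of known word forms
dictionary = {"letters", "litters", "would", "be", "bee", "welcome"}

labels = [
    describe("Letters", "NN2", "Litters", "NN2", "damerau", dictionary),
    describe("would", "VM0", "would", "VM0", "damerau", dictionary),
    describe("be", "VBI", "bee", "NN1", "sound", dictionary),
    describe("welcome", "AJ0-NN1", "welcmoe", "ERR", "damerau", dictionary),
]
```

Labels of this kind let an evaluation script separate, for example, real-word errors from non-word errors when scoring a spell checker.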

Missplel
<input>
  <filename>TnT.wt</filename>
  <expression>([^\t]+)\t([^\t]+)([^\r\n]*).*</expression>
</input>
<output>
  <filename>output.wte</filename>
  <format>%1% %2% %5%</format>
  <description>
    <noError>ok</noError>
    <existingWord>exist</existingWord>
    <nonExistingWord>noexist</nonExistingWord>
    <wordChange>-wordch</wordChange>
    <noWordChange>-nowordch</noWordChange>
    <tagChange>-tagch</tagChange>
    <noTagChange>-notagch</noTagChange>
  </description>
</output>
...

Missplel ...
<options>
  <unknownTag>unknown</unknownTag>
  <unknownLemma>unknownLemma</unknownLemma>
  <escapeChar>@</escapeChar>
  <spaceChar> </spaceChar>
  <wordChar>'</wordChar>
  <sentenceSeparatorTag>mad</sentenceSeparatorTag>
  <maxErrorsInSentence>30</maxErrorsInSentence>
  <configDir>felstava/conf/</configDir>
</options>
<wordlist>
  <create>
    <filename>Swedish.cwtl</filename>
    <expression>.+\t([^\t]+)\t([^\t]+)\t+([^\t]+)</expression>
  </create>
  <wordfile>outfile.gz</wordfile>
  <tagfile>tagfile</tagfile>
</wordlist>
...

Missplel ...
<damerau>
  <reportName>damerau</reportName>
  <active>yes</active>
  <probability>10.0</probability>
  <confusionMatrix>confusionfile</confusionMatrix>
  <subst>1</subst>
  <ins>1</ins>
  <del>1</del>
  <transp>1</transp>
  <allowExistingWords>no</allowExistingWords>
  <forceAllowWords>no</forceAllowWords>
  <allowTagChange>yes</allowTagChange>
  <forceAllowTag>no</forceAllowTag>
</damerau>
...

Missplel ...
<splitCompound>
  <reportName>split</reportName>
  <active>no</active>
  <probability>99.0</probability>
  <splitUnknownWords>yes</splitUnknownWords>
  <splitThreshold>50</splitThreshold>
  <minWordLength>6</minWordLength>
  <minSplitWordLength>3</minSplitWordLength>
  <factors>
    <wordLength>1</wordLength>
    <inDictionaryFirst>10</inDictionaryFirst>
    <inDictionarySecond>10</inDictionarySecond>
    <tagAllowed>10</tagAllowed>
    <tagMatchFirst>0</tagMatchFirst>
    <tagMatchSecond>15</tagMatchSecond>
  </factors>
</splitCompound>
...

Missplel ...
<soundError>
  <reportName>sound</reportName>
  <active>no</active>
  <filename>sound.test</filename>
  <probability>100.0</probability>
  <expression>(.+)\t(.+)\t(.+)</expression>
  <allowExistingWords>yes</allowExistingWords>
  <forceAllowWords>no</forceAllowWords>
  <allowTagChange>yes</allowTagChange>
  <forceAllowTag>no</forceAllowTag>
</soundError>
...

Missplel ...
<syntaxError>
  <reportName>introduced</reportName>
  <active>no</active>
  <filename>error.rules</filename>
  <probability>100.0</probability>
  <allowExistingWords>yes</allowExistingWords>
  <forceAllowWords>no</forceAllowWords>
  <allowTagChange>yes</allowTagChange>
  <forceAllowTag>no</forceAllowTag>
</syntaxError>

Applications • AutoEval has been used to evaluate: parsers, PoS taggers, PoS majority/ensemble tagging • Missplel has been used to evaluate: spell checkers, grammar checkers, robustness of parsers and taggers

Licence • AutoEval and Missplel are open source under the GNU General Public License • Source code available at www.nada.kth.se/theory/humanlang/tools.html
