AutoEval and Missplel are innovative tools designed to streamline the automatic evaluation of NLP systems. AutoEval simplifies the construction of evaluation frameworks, enabling quick and accurate assessments with minimal manual intervention. Missplel, on the other hand, introduces human-like errors into text, allowing developers to test robustness against common spelling mistakes. Together, these tools reduce the time and effort of manual evaluation, making them invaluable resources in the development and testing of natural language processing applications.
AutoEval and Missplel: Two Generic Tools for Automatic Evaluation Johnny Bigert, Linus Ericson, Anton Solis; Nada, KTH, Stockholm, Sweden Contact: johnny@kth.se www.nada.kth.se/theory/humanlang/tools.html
Manual evaluation • Time-consuming, tedious, error-prone • Computers are good at repetitive tasks, humans are not • Unavoidable in some situations
Automatic evaluation • Cheap, fast, accurate, easily reproducible • Incorporated in the development of most NLP systems
Automatic evaluation • AutoEval: simplifies the construction of (NLP system) evaluation • Missplel: introduces human-like errors into text
AutoEval • "I write evaluation code myself in all our NLP projects" • "Why would I need AutoEval?"
AutoEval • Our point exactly. Repetition of: • Input and output file handling • XML parsing and XML output • Error handling, malformed input • Data storage, management and processing
AutoEval Features — avoids repetition: • Handles input (XML/structured plain-text) and generates output (XML) • Handles data storage and processing ...and also: • Generic and extendible script language • Efficient
AutoEval Script language: • Simple C-like syntax • Powerful • Modules and macros in repository files • Extendible, add your own functions
AutoEval Example of configuration and script language:
<root>
  <files>
    <file format="plain" type="in" name="datafile">TnT.wt</file>
    <file format="xml" type="out" name="outfile">out.xml</file>
  </files>
  <process>
    field(file("datafile"), "\t", "\n", var("word"), var("tag"));
    inc(cnt("tot"));
    inc(cnt(lookup("tag")));
  </process>
  <processonce>
    outputintcon(out("outfile"), cntmap("global"), "global");
  </processonce>
</root>
AutoEval The result:
<evaloutput date="Mon May 26 12:37:39 2003">
  <global>
    <var name="tot">14119</var>
    <var name="ab">714</var>
    <var name="ab.kom">44</var>
    <var name="ab.pos">149</var>
    <var name="ab.suv">24</var>
    ...
    <var name="vb.sup.akt">117</var>
    <var name="vb.sup.sfo">35</var>
  </global>
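For readers who have not seen the AutoEval script language, the counting done by the script above can be sketched in plain Python. This is only an illustration of the logic (one total counter plus one counter per tag), not AutoEval's implementation; the function name count_tags and the three sample lines are made up for the example.

```python
from collections import Counter
from typing import Iterable


def count_tags(lines: Iterable[str]) -> Counter:
    """Count all tokens ("tot") and each PoS tag in word<TAB>tag lines."""
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first tab only; a line without a tab raises ValueError.
        word, tag = line.split("\t", 1)
        counts["tot"] += 1  # corresponds to inc(cnt("tot"))
        counts[tag] += 1    # corresponds to inc(cnt(lookup("tag")))
    return counts


# Hypothetical sample data in the word<TAB>tag layout of TnT.wt.
sample = ["hej\tin\n", "snabbt\tab\n", "ofta\tab\n"]
counts = count_tags(sample)
print(counts["tot"], counts["ab"])
```

The point of the slide stands: file handling, parsing and output are exactly the parts that get rewritten in every project, while the evaluation logic itself is three lines.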
Missplel • Missplel is a highly configurable tool to introduce human-like spelling errors • Language, PoS tag set, character set and keyboard layout independent • All you need is a word/tag/lemma dictionary
Missplel Performance errors – Damerau: • Keyboard mistypes (Damerau, 1964): insertion, deletion, substitution, transposition of letters • wellcvome, wellcme, wellcpme, wellcmoe • Result: • a new existing/non-existing word • word class (PoS tag) change or not
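The four Damerau operations are easy to state as string functions. The sketch below is a minimal illustration, assuming positions and letters are chosen uniformly at random; Missplel itself is configured with a keyboard layout and a confusion matrix, which this example ignores. All function names are hypothetical.

```python
import random
import string


def insert(word: str, i: int, letter: str) -> str:
    """Insert letter before position i."""
    return word[:i] + letter + word[i:]


def delete(word: str, i: int) -> str:
    """Delete the letter at position i."""
    return word[:i] + word[i + 1:]


def substitute(word: str, i: int, letter: str) -> str:
    """Replace the letter at position i."""
    return word[:i] + letter + word[i + 1:]


def transpose(word: str, i: int) -> str:
    """Swap the letters at positions i and i + 1."""
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def random_damerau_error(word: str, rng: random.Random) -> str:
    """Apply one randomly chosen operation (uniform choice is an assumption)."""
    if len(word) < 2:
        return word  # too short to transpose; leave unchanged
    letter = rng.choice(string.ascii_lowercase)
    op = rng.choice(["ins", "del", "subst", "transp"])
    if op == "ins":
        return insert(word, rng.randrange(len(word) + 1), letter)
    if op == "del":
        return delete(word, rng.randrange(len(word)))
    if op == "subst":
        return substitute(word, rng.randrange(len(word)), letter)
    return transpose(word, rng.randrange(len(word) - 1))


print(transpose("welcome", 4))  # welcmoe
print(random_damerau_error("welcome", random.Random(0)))
```

Note that a substitution can pick the original letter and a transposition can swap two identical letters, so a real tool would need to check that the word actually changed, and then look the result up in the dictionary to classify it as an existing or non-existing word.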
Missplel Competence errors – split compounds: • May alter the semantics of a sentence • Kycklinglever – chicken liver • Kyckling lever – chicken is alive • Settings of split compound elements: Minimum length? Allowed PoS tag? Found in dictionary? Word class change? etc.
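A split-compound candidate search can be sketched as follows. This is a simplified illustration that only checks two of the settings listed above (minimum element length and dictionary membership); the function name, the parameter min_part_len and the two-word dictionary are assumptions made for the example, not Missplel's API.

```python
from typing import List, Set, Tuple


def split_candidates(word: str, dictionary: Set[str],
                     min_part_len: int = 3) -> List[Tuple[str, str]]:
    """Return every split of word into two dictionary words.

    Both parts must have at least min_part_len letters.
    """
    lower = word.lower()
    candidates = []
    for i in range(min_part_len, len(lower) - min_part_len + 1):
        first, second = lower[:i], lower[i:]
        if first in dictionary and second in dictionary:
            candidates.append((first, second))
    return candidates


# Hypothetical two-entry dictionary covering the example on the slide.
dictionary = {"kyckling", "lever"}
print(split_candidates("Kycklinglever", dictionary))
```

A fuller version would also compare PoS tags of the parts with the tag of the compound, which is what decides whether the split changes the word class.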
Missplel Competence errors – sound errors: • Letter level • e.g. sound-alike errors • Regular expression rules: (.+)ei(.+) → @1ie@2, e.g. receive → recieve
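The rule format can be illustrated with Python's re module. The sketch assumes that a rule must match the whole word and that "@" followed by a digit refers to a capture group ("@" is the escape character set in the Missplel options); the function apply_rule is hypothetical and only mimics the behaviour shown on the slide.

```python
import re
from typing import Optional


def apply_rule(pattern: str, replacement: str, word: str,
               escape_char: str = "@") -> Optional[str]:
    """Rewrite word with one rule, or return None if the rule does not match.

    escape_char followed by a digit in replacement refers to that
    capture group of pattern.
    """
    match = re.fullmatch(pattern, word)
    if match is None:
        return None
    group_ref = re.compile(re.escape(escape_char) + r"(\d)")
    # Replace each group reference with the text captured by that group.
    return group_ref.sub(lambda ref: match.group(int(ref.group(1))), replacement)


print(apply_rule(r"(.+)ei(.+)", "@1ie@2", "receive"))  # recieve
print(apply_rule(r"(.+)ei(.+)", "@1ie@2", "welcome"))  # None
```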
Missplel Competence errors – syntax errors: • Word/letter level • Form new words from PoS tags, missing/doubled words etc. • Regular expression rules:
<rule ex="slutat skrika - slutat skrikit">
  <match>vb\.sup\.akt(.*) vb\.inf.*</match>
  <to>vb.sup.akt@1 vb.sup.akt</to>
</rule>
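One way to read the rule above: the regular expression is matched against the sequence of PoS tags, the tags are rewritten, and each word whose tag changed is regenerated from its lemma and new tag. The sketch below follows that reading; it is a guess at the mechanism, not Missplel's code. The tag vb.inf.akt for "skrika" and the one-entry form table are assumptions made so that the example reproduces the rule's own example, "slutat skrika - slutat skrikit".

```python
import re
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str, str]  # (word, tag, lemma)


def rewrite_tags(match_pattern: str, to_template: str, tags: List[str],
                 escape_char: str = "@") -> Optional[List[str]]:
    """Rewrite a space-joined tag sequence, or return None if no match."""
    match = re.fullmatch(match_pattern, " ".join(tags))
    if match is None:
        return None
    group_ref = re.compile(re.escape(escape_char) + r"(\d)")
    rewritten = group_ref.sub(
        lambda ref: match.group(int(ref.group(1))), to_template)
    return rewritten.split(" ")


def apply_syntax_rule(tokens: List[Token], match_pattern: str,
                      to_template: str,
                      forms: Dict[Tuple[str, str], str]) -> Optional[List[str]]:
    """Return the words after the rule, or None if it does not apply."""
    new_tags = rewrite_tags(match_pattern, to_template,
                            [tag for _, tag, _ in tokens])
    if new_tags is None or len(new_tags) != len(tokens):
        return None
    words = []
    for (word, tag, lemma), new_tag in zip(tokens, new_tags):
        if new_tag != tag:
            # Look up the inflected form; keep the word if it is unknown.
            word = forms.get((lemma, new_tag), word)
        words.append(word)
    return words


# Hypothetical (lemma, tag) -> word form table and tagged input.
forms = {("skrika", "vb.sup.akt"): "skrikit"}
tokens = [("slutat", "vb.sup.akt", "sluta"), ("skrika", "vb.inf.akt", "skrika")]
result = apply_syntax_rule(tokens, r"vb\.sup\.akt(.*) vb\.inf.*",
                           "vb.sup.akt@1 vb.sup.akt", forms)
print(" ".join(result))  # slutat skrikit
```

This also shows why the tool needs a word/tag/lemma dictionary: without it, a changed tag cannot be turned back into a surface form.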
Missplel
Input (word, tag):
Letters NN2
would VM0
be VBI
welcome AJ0-NN1
Output (word, tag, error description):
Litters NN2 damerau/wordexist-notagchange
would VM0 ok
bee NN1 sound/wordexist-tagchange
welcmoe ERR damerau/nowordexist-tagchange
Missplel
<input>
  <filename>TnT.wt</filename>
  <expression>([^\t]+)\t([^\t]+)([^\r\n]*).*</expression>
</input>
<output>
  <filename>output.wte</filename>
  <!-- %1% Word, %2% Tag, %3% Lemma, %4% Rest of line, %5% Error descr -->
  <format>%1% %2% %5%</format>
  <description>
    <noError>ok</noError>
    <existingWord>exist</existingWord>
    <nonExistingWord>noexist</nonExistingWord>
    <wordChange>-wordch</wordChange>
    <noWordChange>-nowordch</noWordChange>
    <tagChange>-tagch</tagChange>
    <noTagChange>-notagch</noTagChange>
  </description>
</output>
...
Missplel
...
<options>
  <unknownTag>unknown</unknownTag>
  <unknownLemma>unknownLemma</unknownLemma>
  <escapeChar>@</escapeChar>
  <spaceChar> </spaceChar>
  <wordChar>'</wordChar>
  <sentenceSeparatorTag>mad</sentenceSeparatorTag>
  <maxErrorsInSentence>30</maxErrorsInSentence>
  <configDir>felstava/conf/</configDir>
</options>
<wordlist>
  <create>
    <filename>Swedish.cwtl</filename>
    <expression>.+\t([^\t]+)\t([^\t]+)\t+([^\t]+)</expression>
  </create>
  <wordfile>outfile.gz</wordfile>
  <tagfile>tagfile</tagfile>
</wordlist>
...
Missplel
...
<damerau>
  <reportName>damerau</reportName>
  <active>yes</active>
  <probability>10.0</probability>
  <confusionMatrix>confusionfile</confusionMatrix>
  <subst>1</subst>
  <ins>1</ins>
  <del>1</del>
  <transp>1</transp>
  <allowExistingWords>no</allowExistingWords>
  <forceAllowWords>no</forceAllowWords>
  <allowTagChange>yes</allowTagChange>
  <forceAllowTag>no</forceAllowTag>
</damerau>
...
Missplel
...
<splitCompound>
  <reportName>split</reportName>
  <active>no</active>
  <probability>99.0</probability>
  <splitUnknownWords>yes</splitUnknownWords>
  <splitThreshold>50</splitThreshold>
  <minWordLength>6</minWordLength>
  <minSplitWordLength>3</minSplitWordLength>
  <factors>
    <wordLength>1</wordLength>
    <inDictionaryFirst>10</inDictionaryFirst>
    <inDictionarySecond>10</inDictionarySecond>
    <tagAllowed>10</tagAllowed>
    <tagMatchFirst>0</tagMatchFirst>
    <tagMatchSecond>15</tagMatchSecond>
  </factors>
</splitCompound>
...
Missplel
...
<soundError>
  <reportName>sound</reportName>
  <active>no</active>
  <filename>sound.test</filename>
  <probability>100.0</probability>
  <expression>(.+)\t(.+)\t(.+)</expression>
  <allowExistingWords>yes</allowExistingWords>
  <forceAllowWords>no</forceAllowWords>
  <allowTagChange>yes</allowTagChange>
  <forceAllowTag>no</forceAllowTag>
</soundError>
...
Missplel
...
<syntaxError>
  <reportName>introduced</reportName>
  <active>no</active>
  <filename>error.rules</filename>
  <probability>100.0</probability>
  <allowExistingWords>yes</allowExistingWords>
  <forceAllowWords>no</forceAllowWords>
  <allowTagChange>yes</allowTagChange>
  <forceAllowTag>no</forceAllowTag>
</syntaxError>
Applications • AutoEval has been used to evaluate • Parsers • PoS taggers • PoS majority/ensemble tagging • Missplel has been used to evaluate • Spell checkers • Grammar checkers • Robustness of parsers and taggers
Licence • AutoEval and Missplel are open source under the GNU General Public Licence • Source code available at www.nada.kth.se/theory/humanlang/tools.html