Analyzing Learner Corpora: Error Annotation and Integration into CALL Programs

Learner corpus analysis and error annotation Xiaofei Lu CALPER 2010 Summer Workshop July 13, 2010

Overview • Analyzing raw corpora • Error annotation • Issues in corpus annotation • Granger (2003)

Analyzing raw corpora • Concordancing software • GOLD • AntConc • Other software • CLAN
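
To make concrete what a concordancer does with a raw corpus, here is a minimal keyword-in-context (KWIC) sketch. It is not how AntConc or any other tool named above is implemented; the function name `kwic`, the `width` parameter, and the sample learner sentence are all invented for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of `keyword`."""
    # \b keeps "go" from matching inside "going"; matching ignores case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Invented learner sentence, used only to show the output layout.
sample = "He go to school every day. Yesterday he go to the market."
for line in kwic(sample, "go", width=20):
    print(line)
```

Aligning the keyword in a fixed column is what makes recurring patterns to its left and right easy to spot by eye.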

Issues in corpus annotation • Annotation scheme and format • Annotation procedure • Annotation quality

Annotation scheme and format • What are the categories you are using? • Linguistically consensual • Overspecification vs. underspecification • Use short, meaningful codes for your categories • Annotation format considerations • Compatible with annotation scheme • Facilitates corpus query

Annotation procedure and quality • Annotator training • Scheme and format • Problematic cases and disagreements • Computer-assisted manual annotation • Stanford annotation tool • UAM Corpus Tool and NoteTab • Inter-annotator agreement • Cohen’s Kappa • Online Kappa calculator
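
As an illustration of the inter-annotator agreement measure mentioned above, the following sketch computes Cohen's Kappa for two annotators who labelled the same items: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the label lists are made up for the example; an online calculator should give the same figure.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two annotators' labels over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label proportions.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical error-domain codes assigned to six learner errors.
annotator_1 = ["G", "G", "L", "F", "G", "L"]
annotator_2 = ["G", "L", "L", "F", "G", "G"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.455
```

Here the annotators agree on four of six items (p_o = 0.667), but chance alone would yield p_e = 0.389, so kappa is noticeably lower than the raw agreement rate.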

Granger (2003) • Learner corpora • Error annotation • Error statistics and analysis • Integration of results into CALL • Conclusion

Learner corpora • What is a learner corpus? • Difference from traditional data in SLA • Difference from native language data • Frequencies • Errors • From error annotation to error detection

Computer-aided error annotation • Dagneaux, Denness and Granger (1998) • Manual correction of L2 French corpus • Elaboration of an error tagging system • Insertion of error tags and corrections • Retrieval of lists of error types and statistics • Concordance-based error analysis • Tagging system • Informative but manageable • Reusable, flexible, consistent

Error tagging system • Dulay, Burt & Krashen (1982) • System based on linguistic categories (e.g., syntax) • Surface structure alterations (e.g., omission) • Granger’s (2003) three-dimensional taxonomy • Error domain • Error category • Word category

Error tagging system (cont.) • Error domain and category • General level: grammatical, lexical, etc. • Domains subdivided into error categories • Table 1, page 468 • Word category • A POS tagset with 11 major and 54 sub-categories • Makes it possible to sort errors by POS categories

Error tagging system (cont.) • Correct forms inserted next to erroneous forms • Facilitates interpretation of error annotations • Allows for automatic sorting on correct forms • Tag insertion using a menu-driven editor
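
The sketch below shows why placing the correct form next to the erroneous one helps automatic processing. The inline markup used here, `<DOMAIN><CATEGORY><POS> #error$ correction </POS></CATEGORY></DOMAIN>`, and all tag codes are hypothetical stand-ins for a three-tier scheme; they are not taken from Granger (2003), and the sample sentence is invented.

```python
import re
from dataclasses import dataclass

@dataclass
class ErrorAnnotation:
    domain: str      # tier 1, e.g. a grammar code (hypothetical)
    category: str    # tier 2, e.g. a gender code (hypothetical)
    pos: str         # tier 3, word category
    error: str       # form the learner wrote
    correction: str  # form inserted by the annotator

# Assumed markup: three opening tags, "#error$ correction", three closing tags.
TAG_RE = re.compile(
    r"<(?P<domain>\w+)><(?P<category>\w+)><(?P<pos>\w+)>\s*"
    r"#\s*(?P<error>[^$]*?)\s*\$\s*(?P<correction>[^<]*?)\s*"
    r"</(?P=pos)></(?P=category)></(?P=domain)>"
)

def parse_annotations(text):
    """Extract every error annotation found in `text`, in document order."""
    return [ErrorAnnotation(**m.groupdict()) for m in TAG_RE.finditer(text)]

sample = (
    "Elle a une <G><GEN><ADJ> #petit$ petite </ADJ></GEN></G> maison et "
    "ils <G><NBR><VER> #mange$ mangent </VER></NBR></G> beaucoup."
)
annotations = parse_annotations(sample)
# Sorting on the corrected form groups together all errors with the same target.
for ann in sorted(annotations, key=lambda a: a.correction):
    print(ann.correction, "<-", ann.error, (ann.domain, ann.category, ann.pos))
```

Because each record carries all three tiers plus both forms, the same list can be re-sorted by domain, category, word category, or correct form without re-reading the corpus.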

Error statistics and analysis • Error frequency by domain or (word) category • Highest ranked domains: grammar and form • Error trigrams • Concordancers for searching error codes • AntConc • WordSmith Tools
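
A rough sketch of the two kinds of counts listed above. The slide does not define "error trigrams", so this example assumes one possible reading: the error code together with the word before and the word after it. The `DOMAIN-CATEGORY-POS` code format, the function names, and the token data are likewise invented.

```python
from collections import Counter

def frequency_by_domain(tokens):
    """Count errors per domain, the first part of a DOMAIN-CATEGORY-POS code."""
    return Counter(code.split("-")[0] for _, code in tokens if code)

def error_trigrams(tokens):
    """Count (previous word, error code, next word) triples (assumed reading)."""
    trigrams = Counter()
    for i, (_, code) in enumerate(tokens):
        if code is None:
            continue
        prev_word = tokens[i - 1][0] if i > 0 else "<s>"
        next_word = tokens[i + 1][0] if i + 1 < len(tokens) else "</s>"
        trigrams[(prev_word, code, next_word)] += 1
    return trigrams

# Each token is (word, error_code); error_code is None for error-free words.
tokens = [
    ("une", None), ("petit", "G-GEN-ADJ"), ("maison", None),
    ("ils", None), ("mange", "G-NBR-VER"), ("beaucoup", None),
    ("elle", None), ("est", None), ("tres", "F-DIA-ADV"), ("contente", None),
]
print(frequency_by_domain(tokens).most_common())
print(error_trigrams(tokens).most_common(3))
```

Ranking the domain counts with `most_common()` gives the kind of ordering from which the most error-prone areas can be read off.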

Integrating results into CALL • Goal: a hypermedia CALL program • Using NLP and communicative approaches to SLA • Traditional and NLP-enabled exercises • Automatic error diagnosis and feedback generation • Error statistics and analysis used to • Select linguistic areas to focus on • Adapt exercises as a function of attested error types • Adapt NLP tools for error diagnosis

Integrating results into CALL (cont.) • Most error-prone linguistic areas • Tense and mood, agreement • Articles, complementation, prepositions • Adapting exercises • Exercises reflect type of error-prone context • Formal errors through dictation and exercises targeting specific difficulties • Attention to punctuation

Integrating results into CALL (cont.) • Adapting NLP tools for error diagnosis • Spell checker and parser • Handles orthographic, grammatical, syntactic, and lexical errors • Does not handle punctuation, semantic, or tense errors

Granger (2003) summary • Effective 3-tier error annotation system • Limited number of categories per tier • Versatile automated data manipulation • Limitations of error-tagging • Element of subjectivity in annotation • Focuses on misuse • Usefulness of error-tagged learner corpus • Error statistics help in understanding learner interlanguage • Helps adapt pedagogical materials and programs

Activity • Using the Stanford annotation tool • Annotate a short text using your own scheme, or • Annotate a short learner text using Granger’s (2003) scheme • Query the annotated text using AntConc
