
Developing annotation solutions for online data-driven learning

Developing annotation solutions for online data-driven learning. Pascual Pérez-Paredes and Jose María Alcaraz SACODEYL Universidad de Murcia, Spain. System Aided Compilation and Open Distribution of European Youth Language. 225836-CP-1-2005-1-ES-MINERVA-M.


Presentation Transcript


  1. Developing annotation solutions for online data-driven learning Pascual Pérez-Paredes and Jose María Alcaraz SACODEYL Universidad de Murcia, Spain EUROCALL 2007 - University of Ulster, 5 - 8 September

  2. System Aided Compilation and Open Distribution of European Youth Language 225836-CP-1-2005-1-ES-MINERVA-M

  3. Developing annotation solutions for online data-driven learning • Annotation in CL • Annotating corpora for the FL classroom • Challenges of pedagogical annotation • Developing annotation solutions • SACODEYL annotator: domain analysis; requirements and software specification

  4. 1. Annotation in Corpus Linguistics

  5. Annotation in Corpus Linguistics • Add-on • Needs of the research community • Annotation = analysis • Annotation = processing

  6. Why annotate? Annotation gives corpus users both refined information retrieval capabilities and the means for subsequent treatment of the data.

  7. Annotation • Can be automatic, semi-automatic or manual • Can be performed by one or several annotators or software operators • Reflects the differing aims of the meta-information being added to the corpus

  8. • Non-polysemic ambiguity: Poesio and Artstein (2005) • Interest in L2 speakers’ errors: Abe and Tono (2005)

  9. Strong research paradigm rooted in grammatical tagging, including morphological and syntactic information (Garside, Leech and McEnery 1997).

  10. 2 Annotating corpora for the FL classroom 2.1 Corpora in the FL classroom

  11. Interest in corpora and FLT: • Volumes: Sinclair 2004, Braun, Kohn and Mukherjee 2006, Hidalgo, Quereda and Santana 2007 • SIG EUROCALL • 1st International Conference on Corpus-Based Approaches to ELT, November 2007

  12. Normalisation is still an issue: • Mauranen (2004:99) points out that for a teaching method to become an important innovation, it has to “make its way to the normal classroom where teachers and students can use it as part of their everyday routine, with not too much extra hassle”. • Chambers 2007: major obstacles • Braun 2007: secondary education

  13. 2 Annotating corpora for the FL classroom 2.2 Annotating with a view on learning

  14. Braun (2007): pedagogically motivated corpora (a) provide a more systematic range of material than individual texts or scattered collections of activities and, if well-designed, (b) offer a wider range of idiolects than the average material.

  15. Braun (2006) states that thematic annotation, including topic keys and section titles, is particularly useful in the implementation of pedagogically motivated corpora.


  18. • The annotators have a pedagogical use of the text in mind when approaching the annotation stage. • The tags <topic_title>, <topic_key> and <content_key> highlight the relevance of the communicative purpose of texts, that is, the topics and the contents that characterize them.
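
As a rough illustration of how the thematic tags on this slide could support retrieval for the classroom, the Python sketch below filters sections by topic key. The XML layout (`<interview>`, `<section>`, `<text>`), the sample sentences and the function name `sections_by_topic_key` are assumptions made for this example; only the tag names `<topic_title>`, `<topic_key>` and `<content_key>` come from the slide, and the actual SACODEYL schema may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotated sample: only the three tag names
# topic_title, topic_key and content_key come from the slide.
SAMPLE = """
<interview>
  <section>
    <topic_title>Free time</topic_title>
    <topic_key>hobbies</topic_key>
    <content_key>sports</content_key>
    <text>I play football every weekend with my friends.</text>
  </section>
  <section>
    <topic_title>School life</topic_title>
    <topic_key>education</topic_key>
    <content_key>exams</content_key>
    <text>We have exams in June and I am a bit nervous.</text>
  </section>
</interview>
"""


def sections_by_topic_key(xml_text, key):
    """Return (topic_title, text) pairs for sections tagged with the given topic key."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for section in root.iter("section"):
        # A section may carry several topic keys; collect all of them.
        keys = [el.text.strip() for el in section.findall("topic_key") if el.text]
        if key in keys:
            title = section.findtext("topic_title", default="").strip()
            text = section.findtext("text", default="").strip()
            hits.append((title, text))
    return hits


result = sections_by_topic_key(SAMPLE, "hobbies")
for title, text in result:
    print(f"{title}: {text}")
```

A teacher-facing search of this kind selects whole, thematically coherent sections rather than word-level concordance lines, which is the contrast with grammatical tagging that the slide draws.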


  20. 3 Annotation challenges

  21. Remember the “Why annotate?” slide: Annotation gives corpus users both refined information retrieval capabilities and the means for subsequent treatment of the data. PEDAGOGY

  22. Linguistic analysis of interest in FLT • Tsui (2004) • Corpus-based studies focus on 4 areas of description: • Lexical collocation • Syntactic patterning • Genre analysis • Discourse structure and cohesion • Word-based and relying on the co-occurrence of grammatical word-class tags

  23. Linguistic analysis of interest in FLT → Linguistics comes first → DDL materials / Concordances and corpus (Researcher/Linguist; End user)

  24. Pedagogical analysis (and annotation) of language corpora → Pedagogy comes first → Pedagogy-driven DDL (Material developer/Teacher/Learner; End user)

  25. CHALLENGES • Problem-oriented tagging • Corpus applications in FLT still need to gain a status of their own

  26. CHALLENGES: DESIGN, TECHNOLOGY, EPISTEMOLOGY

  27. DESIGN Leech (1993) maxims: • it should be possible to remove the annotation from the text; • if desired, the annotation could be extracted; • it should be based on guidelines accessible to everyone; • it should be made clear how and by whom the annotation was carried out; • it should be based on widely agreed and theory-neutral principles
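
The first two maxims on this slide (annotation can be removed, and annotation can be extracted) can be sketched in a few lines of Python for inline XML markup. The `<u>` and `<ann>` elements, the `label` attribute and both function names are hypothetical choices for this illustration and are not taken from the SACODEYL tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline annotation; element and attribute names are illustrative only.
ANNOTATED = '<u>I play <ann label="sport">football</ann> every weekend.</u>'


def strip_annotation(xml_text):
    """Recover the raw text by dropping all markup."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext())


def extract_annotation(xml_text):
    """List each annotation as (tag, attributes, annotated text), apart from the raw text."""
    root = ET.fromstring(xml_text)
    return [
        (el.tag, dict(el.attrib), "".join(el.itertext()))
        for el in root.iter()
        if el is not root
    ]


raw = strip_annotation(ANNOTATED)
annotations = extract_annotation(ANNOTATED)
print(raw)
print(annotations)
```

Because both operations work from the same annotated source, the raw text and the annotation layer can each be handed on separately, which is the reusability that the maxims aim at.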

  28. EPISTEMOLOGY • Presuppositions and foundations: antecedent implications in the literature • Annotation oriented towards pedagogical uses

  29. EPISTEMOLOGY • Mukherjee (2006): corpora in language pedagogy for (a) dictionaries and material, (b) database and (c) representative samples of learner language.

  30. EPISTEMOLOGY • Meunier (2002): methodological influence: use of classroom concordancing and an inductive approach to learning, leading to the “rehabilitation” of grammar (p. 135)

  31. EPISTEMOLOGY • Bernardini (2000): inductive and deductive learning, probabilistic notion of language and learning pedagogy that resolves the attention to form/meaning dichotomy

  32. EPISTEMOLOGY • Bernardini (2000): learners as either researchers or travellers

  33. EPISTEMOLOGY • Bernardini (2004): potential of corpora as a linguistic aid: favour descriptive insights and discovery learning

  34. EPISTEMOLOGY • Pérez-Paredes (2003, 2004): integrative paradigm of CL in FLT

  35. TECHNOLOGY • User-friendly: non-computational linguists • Multilingual support • Standard-compliant: reusability and valorisation

  36. 4. Developing Annotation Solutions

  37. Developing Annotation Solutions. From a software engineering perspective, development can be considered as the following process: From Challenges → To Requirements; From Requirements → To Solutions

  38. Input Requirements • Input = User Requirements • Changing Approach = Changing Requirements • Identifying New Requirements • Five Perspectives

  39. Actors & Context. Linguistic Engineering vs Pedagogical Engineering. Teaching: • Pedagogic Tool • Learning Oriented • Friendly • General Domain • Practical • Simplicity • Organizational • Optional. Researching: • Powerful Tool • Research Oriented • Extensible & Modular • Specific Domain • Efficient • Complexity • Ad-Hoc Solutions • Mandatory

  40. Data. Grammatical vs Pedagogical. Linguistic Engineering: • Large amount of data (representative corpora) • Grammatical annotation • Oriented to retrieving statistical information. Learning: • Reduced set of data • Pedagogical annotation • Oriented to retrieving learning information (hierarchical structures & selective information)

  41. Epistemological & Empirical • Multi-Disciplinary support • Multi-Lingual support • Multi-Corpus Management • Multi-Purpose Support • Based on Standards

  42. Choosing Software Life Cycle: Spiral Approach. Why?

  43. 5 SACODEYL Annotator

  44. Output. SACODEYL Annotator SACODEYL Annotator characteristics: • Pedagogical Motivation • Teaching Oriented • Friendly Interface • Multi-Language (UTF) • Standardization (TEI) • Multi-Purpose

  45. Developing annotation solutions for online data-driven learning Contact information Pascual Pérez-Paredes pascualf@um.es Jose María Alcaraz jmalcaraz@gmail.com Universidad de Murcia, Spain
