The Dictionary of Italian Collocations: Design and Integration in an Online Learning Environment
Stefania Spina, University for Foreigners Perugia, Italy

The Dictionary of Italian Collocations
• Part of the APRIL project ("Personalised web environment for language learning")
• NLP resources as a support for the lexical competence of students of Italian within a Virtual Learning Environment (VLE)
LREC 2010 - Stefania Spina - The Dictionary of Italian Collocations

Presentation outline
• background and motivation
• reference corpus
• methodology
• dictionary compilation
• integration within the VLE

Background
• Complexity of MWUs (multi-word units):
  • different syntactic and semantic profiles
  • prototypical features:
    • semantic (non-)compositionality
    • (non-)substitutability of components by semantically similar words
    • (non-)insertion of external items
• a continuum rather than clear-cut categories

Motivation: collocations in SLA
• improve learners' fluency
• examples from Italian learner corpora:
  • preoccupata per l'esame vado a prendere una doccia "worried about the exam, I go and take a shower" (learner from Vietnam); target: fare la doccia "take a shower"
  • ho dimenticato la macchina di fotografia "I forgot the camera" (learner from China); target: macchina fotografica "camera"
• non-native speakers acquire L2 vocabulary first as single words, then as more extended chunks
• tendency to overuse the creative combination of isolated words
• Sinclair's open-choice principle

DICI
• collocations require specific pedagogical attention
• Dictionary of Italian Collocations (DICI):
  • it is corpus-based;
  • it is a learner-oriented tool: a list of the most common Italian collocations, classified by frequency;
  • it also relies on statistical measures (dispersion across the different text genres represented in the corpus).

Reference corpus
• Perugia corpus: POS-tagged and lemmatized

Extraction based on POS sequences
• Analysis of existing lists of collocations:
  • 150 different POS sequences
  • the 10 most productive sequences (75% of the total)

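The extraction step can be pictured with a short Python sketch. It is not the IMS Corpus Workbench query used in the study; it simply slides a window over a POS-tagged, lemmatized token list and counts the lemma sequences whose tag sequence matches one of a set of patterns. The two patterns and the tag names (V, ART, N, ADJ) are hypothetical placeholders, since the slides do not list the actual POS sequences or tagset.

```python
from collections import Counter

# Hypothetical POS patterns and tag names, for illustration only;
# the slides do not list the actual sequences or the tagset.
PATTERNS = {
    ("V", "ART", "N"),  # e.g. fare la doccia
    ("N", "ADJ"),       # e.g. macchina fotografica
}

Token = tuple[str, str]  # (lemma, POS tag)


def extract_candidates(tokens: list[Token], patterns=PATTERNS) -> Counter:
    """Count lemma sequences whose POS tag sequence matches one of the patterns."""
    counts: Counter = Counter()
    for n in {len(p) for p in patterns}:
        for i in range(len(tokens) - n + 1):
            window = tokens[i:i + n]
            tags = tuple(pos for _, pos in window)
            if tags in patterns:
                counts[tuple(lemma for lemma, _ in window)] += 1
    return counts
```

Counting lemmas rather than surface forms groups inflected variants (e.g. fa la doccia, facevo la doccia) under a single candidate, which is what a lemmatized corpus makes possible.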
Experimental methodology: 4 steps
• extraction of candidate collocations from the corpus;
• filtering of the candidate collocations: frequency;
• filtering of the candidate collocations: dispersion;
• filtering of the candidate collocations: manual.
Experimental setting:
• 6 POS sequences
• 12-million-word sample
• 4 corpus sections

Collocation extraction + frequency
• IMS Corpus Workbench
• removal of all candidates with frequency = 1
• 41,643 candidate collocations

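A minimal sketch of the frequency filter, assuming candidate counts are kept separately for each corpus section so that they can feed the dispersion step. The function name and the dictionary layout are hypothetical; only the rule itself (discard candidates with frequency 1) comes from the slide.

```python
from collections import Counter


def drop_hapaxes(section_counts: dict[str, Counter]) -> dict[tuple, list[int]]:
    """Keep candidates whose total frequency across all sections is > 1.

    `section_counts` maps a (hypothetical) section name to a Counter of
    candidate -> frequency. The result maps each surviving candidate to its
    per-section frequencies, in alphabetical order of section name.
    """
    sections = sorted(section_counts)
    total: Counter = Counter()
    for counts in section_counts.values():
        total.update(counts)
    return {
        cand: [section_counts[s][cand] for s in sections]  # missing -> 0
        for cand, freq in total.items()
        if freq > 1  # candidates with frequency = 1 are discarded
    }
```

Keeping the per-section vector, rather than only the total, means the output can be passed straight to a dispersion measure.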
Dispersion
• Examples of collocations concentrated in one genre:
  • aggrottare la fronte "to frown" (fiction)
  • vincere le elezioni "to win the elections" (press)
  • dare una definizione "to give a definition" (academic prose)

Dispersion
• Juilland's D value (Juilland & Chang-Rodriguez, 1964)

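For reference, Juilland's D is usually defined as follows for an item with frequencies f_1, ..., f_n in n equally sized corpus parts (plausibly the 4 corpus sections here, although the slides do not say whether the sections are equal in size). D ranges from 0 (all occurrences in a single part) to 1 (a perfectly even distribution).

```latex
\bar{f} = \frac{1}{n}\sum_{i=1}^{n} f_i, \qquad
s = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(f_i - \bar{f}\right)^2}, \qquad
V = \frac{s}{\bar{f}}, \qquad
D = 1 - \frac{V}{\sqrt{n-1}}
```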
Dispersion + frequency
• D value combined with frequency = usage:
  • U = F × D (F = frequency, D = dispersion)
• Usage value ≥ 2: 2,047 candidate collocations
• Manual selection. Final result:
  • a list of 1,553 word combinations = dictionary entries

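The usage filter can be sketched in Python as follows, assuming F is the total frequency across sections and that the sections are of equal size (Juilland's D does not correct for unequal part sizes). The function names are hypothetical; the threshold of 2 is the one reported on the slide.

```python
import math
from statistics import mean, pstdev


def juilland_d(freqs: list[float]) -> float:
    """Juilland's D over n equally sized parts: 1 - V / sqrt(n - 1)."""
    n = len(freqs)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    m = mean(freqs)
    if m == 0:
        return 0.0
    v = pstdev(freqs) / m  # coefficient of variation (population std. dev.)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1 - v / math.sqrt(n - 1))


def usage(freqs: list[float]) -> float:
    """Usage coefficient U = F * D, with F the total frequency."""
    return sum(freqs) * juilland_d(freqs)


def select(candidates: dict[tuple, list[float]], threshold: float = 2.0) -> dict[tuple, float]:
    """Keep candidates whose usage value is >= threshold."""
    scores = {c: usage(f) for c, f in candidates.items()}
    return {c: u for c, u in scores.items() if u >= threshold}
```

Multiplying by D penalizes frequent but genre-bound combinations: a candidate seen 20 times in only one of four sections gets D = 0 and therefore U = 0, while one seen 5 times in each section keeps U = 20.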
Collocations list

Compilation of the Dictionary
• Lexical database enriched with two kinds of data:
  • visible to the learner (client output):
    • definition, examples, part of speech, syntactic context of occurrence of the collocations
  • to be processed by other applications (server):
    • internal syntactic configuration for automatic recognition

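One way to model a database entry that separates learner-facing data from server-side data is sketched below. The field names and the encoding of the internal syntactic configuration as a list of (lemma, POS) slots are assumptions for illustration, not the actual DICI schema.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class CollocationEntry:
    # Learner-visible data (client output); field names are hypothetical.
    lemma: str
    definition: str
    examples: list[str] = field(default_factory=list)
    pos: str = ""
    syntactic_context: str = ""
    # Server-side data for automatic recognition; the (lemma, POS) slot
    # encoding is an assumption, not the dictionary's real format.
    configuration: list[tuple[str, str]] = field(default_factory=list)

    def to_client(self) -> dict:
        """Return only the fields meant to be shown to the learner."""
        data = asdict(self)
        data.pop("configuration")
        return data
```

Keeping both kinds of data in one record while exposing only a filtered view to the browser mirrors the client/server split described on the slide.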
DB integration in the VLE
• Virtual Learning Environment:
  • web application specifically devoted to language learning
• LELE (Linguistically-Enhanced Learning Environment):
  • provides language learners with additional NLP resources in order to improve their linguistic competence
  • receptive and productive learning activities concerning the recognition and the active use of collocations

LELE features
• automatically recognize and highlight multi-word units in written Italian texts;
• show additional linguistic information about the selected collocations;
• generate collocation tests to assess the collocational competence of second-language learners;
• …

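The recognition-and-highlighting feature can be approximated by greedy longest-match lookup of lemma sequences in the tagger output. This is a hypothetical sketch, not LELE's implementation: it assumes the text has already been tagged and lemmatized, stores each dictionary entry as a tuple of lemmas, and marks matches with [[...]].

```python
Token = tuple[str, str, str]  # (surface form, lemma, POS), as a tagger might output


def highlight(tokens: list[Token], entries: set[tuple[str, ...]]) -> str:
    """Wrap the longest contiguous lemma sequence found in `entries` in [[...]]."""
    max_len = max((len(e) for e in entries), default=0)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        match = 0
        # Try the longest possible span first, down to two words.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            lemmas = tuple(t[1] for t in tokens[i:i + n])
            if lemmas in entries:
                match = n
                break
        if match:
            out.append("[[" + " ".join(t[0] for t in tokens[i:i + match]) + "]]")
            i += match
        else:
            out.append(tokens[i][0])
            i += 1
    return " ".join(out)
```

For example, with the entry ("fare", "il", "doccia") (assuming the lemmatizer maps la to il), the tagged sentence "Ho fatto la doccia ." comes out as "Ho [[fatto la doccia]] .". Because it only matches contiguous lemmas, this sketch would miss cases with inserted material such as fare una lunga, calda, riposante doccia; the internal syntactic configuration stored on the server is presumably what lets the real system handle such variation.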
LELE scheme
[Diagram: the VLE, with the DB and the tagger on the server side and the browser on the client side]

Conclusions
• Next steps:
  • apply the same methodology to the whole corpus, for all 10 selected POS sequences
  • test the LELE system with students, starting January 2011
• Further research:
  • refine the statistical measures
  • assign collocations to different levels of competence
  • other tools (productive tasks)

Stefania Spina
E-learning and Language Technologies
University for Foreigners Perugia, Italy
stefania.spina@unistrapg.it
http://april.unistrapg.it

References
• Juilland, A. & Chang-Rodriguez, E. (1964). Frequency Dictionary of Spanish Words. The Hague: Mouton & Co.
• Meunier, F. & Granger, S. (2008). Phraseology in Foreign Language Learning and Teaching. Amsterdam: John Benjamins.
• Nesselhauf, N. (2005). Collocations in a Learner Corpus. Amsterdam: John Benjamins.
• Pazos Bretaña, M. & Pamies Bertrán, A. (2008). Combined statistical and grammatical criteria. In S. Granger & F. Meunier (Eds.), Phraseology: An Interdisciplinary Perspective. Amsterdam: John Benjamins, pp. 391-406.

Background: prototypical features
• semantic (non-)compositionality: tagliare la corda (lit. "cut the rope") "run away" vs. aprire la porta "open the door"
• (non-)substitutability: {fare|porre|rivolgere|formulare} una domanda "ask a question"; camera oscura "dark room", but *stanza oscura
• (non-)insertion of external items: fare una lunga, calda, riposante doccia "take a long, hot, restful shower"; sistema operativo "operating system", but *sistema molto operativo