
Margaret Cargill, Michelle Picard and Cally Guerin 25 November 2011





Presentation Transcript


  1. Corpus-based approaches for research students: Using student-made corpora to promote autonomous learning • Margaret Cargill, Michelle Picard and Cally Guerin • 25 November 2011

  2. Corpus linguistics approaches • Corpus: a body of text selected for analysis using appropriate software • The software: a concordancer • Available as web-based applications • e.g. Springer Exemplar www.springerexemplar.com • or stand-alone programs, e.g. • www.adelaide.edu.au/red/adtat/

  3. Concordancing • A concordancer is software that searches a group of texts (a corpus) for all examples of a particular item. • It displays the results as lines of text across the screen for easy comparison • Results can be sorted according to what is on the left or the right of the search item • This can provide data to ‘drive’ language learning and improve written texts

  4. An example of concordancing output
     to utilise existing available soil water, unlike the perennial gr
     es (4 g oven dry wt basis) of soil were weighed into 40 ml polypr
     required 9 kg P/ha, whereas a soil with a high P sorption capacit
     concentration by 1 mg/kg on a soil with a low P sorption capacity
     00, it was expected that this soil would have consistently been t
      capacity (PBC), which is the soil's capacity to moderate changes
     and buffering capacity of the soil-an attempt to test Schofield's
     nisms that are present in the soil-plant microcosm environment. T
     etermined in a growth-chamber soil-plant microcosm study. Nodding
     84) Lime and phosphate in the soil-plant system. Advances in Agro
     a where crops rely heavily on soil-stored water accrued in summer
     ns Between Herbicides and the Soil. Academic Press, London, pp. 2
     fertility on these particular soils. Although this aberration has
     over in a range of allophanic soils amended with 14Clabelled gluc
     alues for 9 different pasture soils, 6 and 12 months after P fert
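The search-and-display behaviour described on slides 3 and 4 can be sketched in a few lines of Python. This is a minimal illustration only, not how AdTAT or Springer Exemplar is implemented: the function name `kwic`, the 30-character context width and the `ignore_case` switch are all assumptions made for the example.

```python
import re

def kwic(texts, term, width=30, ignore_case=False):
    """Return one keyword-in-context line per match of `term`.

    Hypothetical helper: `width` is the number of characters of context
    shown on each side of the search item.
    """
    flags = re.IGNORECASE if ignore_case else 0
    lines = []
    for text in texts:
        flat = " ".join(text.split())  # collapse line breaks and repeated spaces
        for m in re.finditer(re.escape(term), flat, flags):
            left = flat[max(0, m.start() - width):m.start()]
            right = flat[m.end():m.end() + width]
            # Right-align the left context so the search item lines up in a column.
            lines.append(f"{left:>{width}}{m.group()}{right}")
    return lines

corpus = [
    "Samples of soil were weighed into tubes.",
    "Crops rely heavily on soil-stored water.",
]
for line in kwic(corpus, "soil"):
    print(line)
```

Because the match is on a character string rather than a whole word, a search for soil also returns soils and soil-plant, as in the sample output.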

  5. Data-driven language learning • Very relevant for discipline-specific English and usage conventions • Empowers users to research their own language issues • Appeals particularly to research students • Hypothesised to contribute to autonomous learning approaches

  6. Making your own corpus • To make this tool optimally effective for writing/self-editing, you need to (find or) make a corpus that is specific to the task at hand • Texts need to be in .txt files • ‘Save as’ .txt from .doc, .html or .pdf files • ‘Cleanup’ may be needed after saving

  7. Using a self-made corpus? • A simple concordancer called AdTAT (Adelaide Text Analysis Tool) is available at http://www.adelaide.edu.au/red/adtat/ • Recently developed at the University of Adelaide and made available freely for use • Designed for authors focused on writing texts, not for linguistic researchers

  8. Roundtable structure • Margaret: Self-made and online corpora in an EFL ESP context • Michelle: Use with Turn-it-in to investigate intertextuality • Cally: • Discussion and questions • Uptake options within ALL

  9. Self-made and online corpora in an EFL ESP context • China Academy of Engineering Physics, Mianyang, Sichuan • Consecutive 5-day workshops, 40+ participants (5 to date) • Mixed disciplines within engineering physics • Working researchers, some without higher degrees, English level variable • Strong and increasing pressure to publish in English

  10. Resulting workshop design features • Emphasis required on listening, speaking, reading and writing, but all integrated with publication focus • Day 1, afternoon: ‘Developing discipline-specific English writing skills’ • Re-using language vs plagiarism (sentence templates) • Noun phrases and articles • Identifying vocabulary for learning → concordancing

  11. “Selecting noun phrases to learn • Extending vocabulary is an ongoing need for EAL scientist authors • One effective way to select vocabulary to learn is to use a word frequency list from your own discipline. • Such a word list can be created by using concordancing software to search a collection of discipline-specific texts such as research articles. • These text collections are called corpora (sing. = corpus)”

  12. To make a frequency list • Open AdTAT • Load a corpus • From top menu bar select Corpus, Word frequency • Resulting screen lists all words in the corpus in order of frequency of use
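For readers who want to see what a word frequency list involves, the following is a rough Python sketch. It does not reproduce AdTAT's own counting rules: the function name `word_frequencies`, the decision to lower-case every word, and the rule that hyphenated forms count as one word are assumptions made for illustration.

```python
import re
from collections import Counter

def word_frequencies(texts):
    """Return (word, count) pairs for a corpus, most frequent first."""
    counts = Counter()
    for text in texts:
        # Assumption: a word is a run of letters, optionally joined by
        # hyphens or apostrophes (e.g. "soil-plant", "soil's").
        words = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", text.lower())
        counts.update(words)
    return counts.most_common()

sample = ["The soil and the soil-plant system", "The plant"]
for word, count in word_frequencies(sample)[:3]:
    print(f"{count:5d}  {word}")
```

In a real discipline-specific corpus the top of the list is dominated by function words such as the and of, so the vocabulary worth learning usually sits further down.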

  13. Needed: a discipline-specific corpus • To demonstrate effectiveness, I organise preparation of a CAEP corpus • Participants are asked to each prepare a single file for homework

  14. “Homework: Building a CAEP corpus • Each participant will prepare one research article for the corpus tonight • Prepared files should be sent as an email attachment to 刘希 <evey324@qq.com> • Include full bibliographic details of the article (the full reference) in the body of the email”

  15. “Preparing text for a corpus • Select articles written by native speakers of English wherever possible. • Texts in a corpus for concordancing must be stored as plain text files (.txt). • Remove unneeded parts before saving: biodata, keywords, tables, figures, reference list, acknowledgements. • An easy way to do this is to download your selected articles as .html files, remove unwanted parts and ‘save as’ .txt • If you can only get .pdf versions, use the ‘Save as text’ option – OR copy the desired text one page/column at a time into a Word document, correct the spacing, and save as .txt • Label the file with the name of the journal, first author and year of the paper”
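Two of the preparation steps above, removing the reference list and labelling the file consistently, can be partly automated. The Python sketch below is one possible approach under stated assumptions: the helper names are hypothetical, the back matter is assumed to start at a line consisting only of "References" or "Acknowledgements", and the file name pattern journal_author_year.txt is an illustrative choice. Tables, figures and headers would still need manual cleanup.

```python
import re
from pathlib import Path

def corpus_filename(journal, first_author, year):
    """Build a consistent label such as New-Phytologist_Smith_2010.txt."""
    parts = [journal, first_author, str(year)]
    safe = [re.sub(r"[^A-Za-z0-9]+", "-", p).strip("-") for p in parts]
    return "_".join(safe) + ".txt"

def strip_back_matter(text):
    """Cut the text at the first 'References' or 'Acknowledgements' heading."""
    heading = re.compile(
        r"^[ \t]*(References|Acknowledge?ments)[ \t]*$",
        flags=re.MULTILINE | re.IGNORECASE,
    )
    m = heading.search(text)
    return text[:m.start()].rstrip() + "\n" if m else text

def save_for_corpus(text, folder, journal, first_author, year):
    """Write the cleaned article as a plain-text (.txt) file and return its path."""
    path = Path(folder) / corpus_filename(journal, first_author, year)
    path.write_text(strip_back_matter(text), encoding="utf-8")
    return path
```

For example, `save_for_corpus(article_text, "corpus", "New Phytologist", "Smith", 2010)` would write `corpus/New-Phytologist_Smith_2010.txt`, assuming the `corpus` folder already exists; the journal, author and year here are made up.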

  16. Issues re student preparation of texts for corpora • Following directions! • Selection (especially author language status) • Cleaning of text (headers, footers, page numbers/labels, figures/tables, author affiliations, etc.) • Provision of reference data for record keeping purposes • Consistent file labelling

  17. My response • Use files as received • Conduct a search every time an issue arises in student drafts that can be addressed this way • When anomalies occur in search outcomes, point out the link to inappropriate corpus preparation • Use Exemplar as a comparison

  18. Demonstration/discussion • I will load into AdTAT the 3 corpora made by CAEP participants in 2010 and 2011 • Together we will run some searches to see • What questions we can answer • What anomalies we may find in the corpus

  19. Types of searches: Collocations • Sort left to find verbs or adjectives that go with a search noun (e.g. reason) or adverbs that go with a search verb (vary) • Sort right to find prepositions that go with a search noun (role) or verb (compare)
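Sorting left or right in a concordancer is a way of grouping the words that sit next to the search item. The same idea can be sketched in Python by counting the immediate neighbours directly. The function name `collocates` and the simple letters-only tokenisation are assumptions for this example, and the count ignores sentence boundaries, so it is only an approximation of what a reader would see in sorted concordance lines.

```python
import re
from collections import Counter

def collocates(texts, term, side="left"):
    """Count the word immediately to the left or right of `term` (case-sensitive)."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    offset = -1 if side == "left" else 1
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[A-Za-z]+", text)  # assumption: words are runs of letters
        for i, token in enumerate(tokens):
            j = i + offset
            if token == term and 0 <= j < len(tokens):
                counts[tokens[j]] += 1
    return counts.most_common()

texts = [
    "The role of phosphorus is clear.",
    "A key role in growth and a minor role of lime.",
]
print(collocates(texts, "role", side="right"))  # prepositions that follow "role"
print(collocates(texts, "role", side="left"))   # adjectives and determiners before it
```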

  20. Types of searches: Usage conventions • Search for we to see if it is used in the genre of interest • Search for also to see if it appears at the start of a sentence (i.e. with a capital letter – AdTAT is case-sensitive) • bioinformatical or bioinformatic? • No examples of -al in self-made corpus – but strong supervisor preference • Check Exemplar site
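The usage checks on this slide come down to counting exact, case-sensitive forms and comparing the totals. A minimal Python sketch follows; the function names `count_word` and `compare_forms` are hypothetical, and matching on whole words only is an assumption that may differ from how a given concordancer treats a search string.

```python
import re

def count_word(texts, word):
    """Count case-sensitive, whole-word occurrences of `word` across a corpus."""
    pattern = re.compile(rf"\b{re.escape(word)}\b")
    return sum(len(pattern.findall(text)) for text in texts)

def compare_forms(texts, *forms):
    """Return a {form: count} mapping so competing forms can be compared."""
    return {form: count_word(texts, form) for form in forms}

texts = ["Also, we tested this. We also used a bioinformatic pipeline."]
print(compare_forms(texts, "Also", "also"))
print(compare_forms(texts, "bioinformatic", "bioinformatical"))
```

A count of zero in a small self-made corpus is weak evidence on its own, which is why the slide suggests checking a larger source as well.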

  21. Can give writers agency to counter supervisor idiosyncrasy

  22. “Springer Exemplar • Go to www.springerexemplar.com • Choose to search a field or a journal • It is web-based, so no software download • BUT, you cannot • sort output to answer your own questions • see more context than a few words • know if the English is native-speaker or not • These problems are solved with AdTAT”

  23. Uses of the Exemplar site • When your discipline-specific corpus has no or too few examples of a term • When you do not have an appropriate corpus to search • When you want to compare usage more widely than your own small corpus allows but still specific to the discipline • e.g. evolvement (geology); bioinformatical (plant science)

  24. Other web concordancers: Uses • See Virtual Language Centre Web Concordancer at http://www.edict.biz/concordance/WWWConcappE.htm • Allows choice of corpora to search, including the Brown Corpus of US general English • To demonstrate differences between discipline-specific and general English usage • e.g. And at start of sentence, or use of contractions ending in n’t

  25. US general English • But a similar AdTAT search of a corpus from the New Phytologist journal (plant science, impact factor over 5) finds 0 examples …

  26. To summarise… • High potential usefulness of both self-made and online corpora, especially where ‘native-speaker’ models are lacking • Labour of constructing a corpus can be seen as a disincentive to use of self-made corpora • Hands-on demonstration to address students’ actual errors can • help overcome this perception, and • provide needed training in search construction

  27. Corpora and Concordancing

  28. You can use the same text as long as you cite • We all use the same words • The research reversal?

  29. ‘Obligatory intertextuality’ • Document structure intertextuality • Engaging with the literature • Co-authored texts • Discipline-specific language (Eira, 2007)

  30. Unacceptable intertextuality

  31. Researcher Education & Development Adelaide Graduate Centre Explicit instruction • Take focussed notes • Separate the “English” from the “science/ content” • Assemble notes related to topics • “Story” each paragraph in dot-points • Identifying acceptably recyclable text

  32. Intelligently reading reports

  33. Categorising matches

  34. Too original: Hamda • For students who are producing unidiomatic (awkward, non-standard usage) and/or non-academic English • Step 1 Concordancer • (focussing on collocations, standard phrases) • Step 2 Text-matching to check for originality • Unrelated matches • Standard strings • Sentence templates • Some discipline-specific language

  35. Not original: Weimin • For students who are patchwriting with little understanding of referencing and citation conventions • Step 1 Training in the use of bibliographic software • Step 2 Instruction in note-taking & organising writing • Step 3 Text-matching to check for originality • Step 4 Concordancer (focussing on general usage and discipline-specific language) • Phrases commonly used in general academic English • Discipline-specific language • Unacceptable recycling

  36. Refining authorial voice: Liang • For students who have a good grasp of both citation conventions and reasonably high-level English expression. • Step 1 Text-matching to check for originality • Step 2 Concordancer (focussing on general usage and discipline-specific language) • Step 3 General Google Scholar search - Matches with unrelated student writing • Discipline-specific language • Springer Exemplar

  37. Examples of intertextuality

  38. Corpus output

  39. Concrete Outcomes • Increased participants’ understanding of acceptable and unacceptable intertextuality • Enhanced participants’ knowledge of disciplinary language • Developed participants’ autonomy through stages of a reflective process, enabling them to “critically review [their] suppositions of subject discipline and existing knowledge” (Chan et al. 2002:515)

  40. Questions?

  41. Your thoughts • How much instruction is required from ALL staff? Will this process promote autonomy? • Difficulties in implementation in your own situation? • Other uses for the software?
