An introduction to Corpus tools 2 Princess Nourah University Dr B. K. Mustafa


Presentation Transcript


  1. An introduction to Corpus tools 2 Princess Nourah University Dr B. K. Mustafa

  2. Session outline: Practical introduction to AntConc • Clusters • N-grams • Word list • Keyword list

  3. An introduction to AntConc: Clusters 1) The Clusters function lets the user identify groups of words that commonly occur together with a search term. This is useful for identifying phrases. 2) Enter the search term here as usual 3) The size of the cluster can be set here

  4. An introduction to AntConc: Clusters 1) Increasing the cluster size to 6 reveals a few phrases
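
The idea behind the Clusters function can be sketched in a few lines of Python. This is only an illustration, not AntConc's own code: the `tokenize` and `clusters` helpers, the sample sentence, and the choice to count only clusters that begin with the search term are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Simplified tokenizer (an assumption): lowercase, then keep runs of word characters.
    return re.findall(r"\w+", text.lower())

def clusters(tokens, term, size):
    """Count every run of `size` consecutive tokens that starts with `term`."""
    counts = Counter()
    for i in range(len(tokens) - size + 1):
        if tokens[i] == term:
            counts[" ".join(tokens[i:i + size])] += 1
    return counts

# Made-up sample text, used only to show the call.
text = "The treatment of the disease and the treatment of the symptoms."
tokens = tokenize(text)
result = clusters(tokens, "treatment", 3)
print(result.most_common(5))
```

Raising `size` narrows the results to longer, rarer phrases, which is the effect described on the slide when the cluster size is set to 6.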

  5. An introduction to AntConc: N-grams 1) The N-gram function identifies clusters of words across the whole corpus, without requiring a search term 2) The size of the cluster can be set here
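
As a rough sketch of what an N-gram count involves, the Python below counts every sequence of N consecutive words for a range of sizes. The function name `ngrams`, its parameters, and the sample text are hypothetical and chosen only for illustration; AntConc may tokenize and count differently.

```python
import re
from collections import Counter

def ngrams(tokens, n_min, n_max):
    """Count all runs of n consecutive tokens for every n from n_min to n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Made-up sample text; a real corpus would be read from .txt files.
tokens = re.findall(r"\w+", "To be or not to be".lower())
counts = ngrams(tokens, 2, 3)
for phrase, freq in counts.most_common(3):
    print(freq, phrase)
```

The `n_min` and `n_max` arguments play the same role as the minimum and maximum size settings in the tool.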

  6. An introduction to AntConc: Collocates 1) The Collocates function lets the user identify words that occur near a specific search word 2) Enter the query here 3) Set the span (the number of words to the left and right of the query) here
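
A collocate search can be pictured as counting the words that fall inside a window around each hit. The sketch below assumes a simple word-based window and reports raw counts only; the `collocates` helper, its `left` and `right` parameters, and the sample sentence are illustrative assumptions, and no association statistic is computed.

```python
import re
from collections import Counter

def collocates(tokens, node, left=5, right=5):
    """Count words found within `left`/`right` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        # Words before the node plus words after it; the node itself is excluded.
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        counts.update(window)
    return counts

# Made-up sample text, used only to show the call.
text = "Multiple sclerosis treatment options include drug treatment and physical therapy"
tokens = re.findall(r"\w+", text.lower())
near = collocates(tokens, "treatment", left=1, right=1)
print(near.most_common())
```

Widening the span picks up looser associations at the cost of more noise, so it is worth trying a few values.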

  7. An introduction to AntConc: Word lists 1) The Word List function lets the user identify the most frequent words in their corpus 2) To generate a word list, just click on Start (the search box can be left empty)

  8. An introduction to AntConc: Word lists 1) To generate a more corpus-specific word list, click on the Tool Preferences tab 2) Select Word List, and a window will appear giving the user multiple options 3) The stop list option can be used to remove general words that may interfere with findings 4) The user can either add terms to the box below or add a .txt file containing a stop list of terms

  9. An introduction to AntConc: Word lists 1) To add a stop list, click the tab here 2) Then click on Open; a window will appear where the user can search for a file

  10. An introduction to AntConc: Word lists When a stop list is used, irrelevant common terms are left out of the word list.
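
The effect of a stop list on a word list can be sketched as follows. The `word_list` helper and the one-term-per-line stop list format are assumptions for the example, and the sample sentence is made up.

```python
import re
from collections import Counter

def word_list(tokens, stop_words=()):
    """Return word frequencies, skipping any word found in `stop_words`."""
    stop = {w.lower() for w in stop_words}
    return Counter(t for t in tokens if t not in stop)

# Stop list given as text with one term per line (assumed format).
stop_text = "the\nand\n"
stop_words = [line.strip() for line in stop_text.splitlines() if line.strip()]

text = "The patient and the doctor discussed the treatment"
tokens = re.findall(r"\w+", text.lower())

full = word_list(tokens)                  # no stop list: "the" tops the list
filtered = word_list(tokens, stop_words)  # stop list applied
print(full.most_common(3))
print(filtered.most_common(3))
```

Comparing the two printed lists shows how function words such as "the" crowd out content words until they are filtered.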

  11. An introduction to AntConc: Keyword list 1) A keyword list is similar to a word list, but it compares the user's corpus against a reference corpus of general language to highlight the terms that are unusually frequent in the user's corpus

  12. An introduction to AntConc: Keyword list 1) To activate the keyword list, click on the Tool Preferences tab 2) Click on Keyword List to reveal a number of options 3) Select ‘Use raw file(s)’ 4) Click on Choose Files

  13. An introduction to AntConc: Keyword list A window will appear where the user can select a relevant .txt file. Here I am using the Brown corpus as a reference corpus to compare against my specific corpus. Once the desired .txt file has been found, click on Open

  14. An introduction to AntConc: Keyword list The keyword results will now be different and should mainly list terms that are distinctive of the specific corpus
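
One common way to score keywords against a reference corpus is the log-likelihood statistic. The sketch below uses that statistic as an assumption; the slides do not say which measure or settings were used, and the helper names and tiny sample corpora are invented for illustration.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the target corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens):
    """Rank words that are relatively more frequent in the target corpus."""
    t, r = Counter(target_tokens), Counter(reference_tokens)
    c, d = len(target_tokens), len(reference_tokens)
    scores = {}
    for word, a in t.items():
        b = r[word]
        if a / c > b / d:  # keep only words over-represented in the target
            scores[word] = log_likelihood(a, b, c, d)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Tiny made-up corpora; a real reference corpus would be far larger.
target = "sclerosis treatment sclerosis therapy the sclerosis".split()
reference = "the cat sat on the mat the end".split()
ranked = keywords(target, reference)
print(ranked[:3])
```

Words that are common in both corpora, such as "the", score low or drop out, which is why the keyword list looks different from the plain word list.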

  15. An introduction to AntConc: Going multilingual 1) With AntConc you can build a corpus in many languages, including Arabic, Chinese and Japanese 2) First click on the Global Settings tab 3) Then select ‘Language Encodings’ 4) Then choose the correct encoding for your language Note: you may have to experiment with the various encodings, but UTF-8 seems to work with most languages.
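
Why the encoding setting matters can be shown with a short Python sketch: the same file reads correctly as UTF-8 and turns into garbage when decoded with the wrong encoding. The file name, the Arabic sample phrase, and the use of Latin-1 as the "wrong" encoding are assumptions for the example.

```python
import tempfile
from pathlib import Path

sample = "مرحبا بالعالم"  # Arabic sample phrase ("hello world"), for illustration only

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.txt"  # hypothetical corpus file
    path.write_text(sample, encoding="utf-8")

    # Decoding with the encoding the file was saved in recovers the text.
    good = path.read_text(encoding="utf-8")
    # Decoding with a mismatched encoding yields unreadable characters.
    wrong = path.read_text(encoding="latin-1")

print(good == sample)
print(wrong == sample)
```

Saving corpus files as UTF-8 before loading them avoids most of the trial and error with encodings.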

  16. In-class exercise • 1) Form a corpus of around 40,000 tokens relating to the treatment of multiple sclerosis (MS). • 2) Using the N-grams function, identify clusters formed of 2, 3, 4, and 5 words

  17. Assignment 2 Overview: Add .txt files to AntConc and then undertake a cluster search using the N-grams function. Deadline: 7th of October. Submission method: email to instructor. Instructions: Add four different .txt files with content relating to the same subject (e.g. religion, medicine, speeches by the same author) and then conduct a cluster search using the N-grams function. Set the minimum cluster size to 3 and the maximum to 5. Once complete, reduce the size of the AntConc window and take a screenshot of your results, with your name and student ID/email visible in the background (use Word or WordPad to add your name). See the next slide for an example of the screenshot that should be emailed to your instructor. The email subject line should be: CAT AS2

  18. Assignment 2 • Name and email are visible in the background • 4 .txt files added • N-grams function activated • Min set to 3 and Max to 5
