Part Two: Using Xaira to explore corpora


Presentation Transcript


  1. Part Two: Using Xaira to explore corpora • Richard Xiao • z.xiao@lancaster.ac.uk

  2. Outline of the talk • Concordance • Wordlist • Keywords (No) • Output formats • Manipulating results • Collocation/colligation • Distribution analysis • Live demonstration • Tips for keeping away from bugs • Multilingual dimension • Xaira FAQs

  3. Concordance • Word query: search for a word • Phrase/Quick query: search for a word or phrase • Addkey query: POS or lemma search • Pattern query: regular expression search • XML query: search for XML markup • CQL/XQL query: search using the XML-based Corpus Query Language • Query builder: a powerful combination of all query types
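
To make the idea of a pattern query with KWIC (key word in context) output concrete, here is a minimal Python sketch. It is not Xaira's implementation and does not use any Xaira API; the function name `kwic`, the `width` parameter, and the sample sentence are all invented for illustration.

```python
import re

def kwic(text, pattern, width=30):
    """Return (left, node, right) triples for every regex match in text.

    Hypothetical helper: `width` is the number of context characters
    kept on each side of the matched node.
    """
    hits = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits

# Invented sample text, standing in for a corpus file.
sample = "The cat sat on the mat. A cat is not a dog, but cats chase dogs."

# A regular expression search for "cat" or "cats" as whole words.
for left, node, right in kwic(sample, r"\bcats?\b", width=15):
    # Right-align the left context so the node words line up in a column.
    print(f"{left:>15} [{node}] {right}")
```

The sketch works on raw characters rather than on indexed tokens, so it only approximates what an indexed concordancer does on a large corpus.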

  4. Wordlist • In the Client: Word query (up to 100,000 lexicon entries), sorted alphabetically, by frequency, or by number of forms • In Xaira Indexer Tools: Tools >> Indexer >> Options >> Create frequency table
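
As a rough illustration of what a wordlist is, the following Python sketch builds a frequency list from plain text and sorts it alphabetically and by frequency. It assumes a naive tokeniser (runs of word characters, lower-cased) and is not how Xaira builds its lexicon; the function name `wordlist` and the sample sentence are hypothetical.

```python
import re
from collections import Counter

def wordlist(text):
    """Count word forms using a naive tokeniser (an assumption of this sketch)."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tokens)

# Invented sample text.
freq = wordlist("the cat sat on the mat the end")

# Sort by descending frequency, breaking ties alphabetically.
by_freq = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))

# Sort alphabetically by word form.
alpha = sorted(freq.items())

for word, count in by_freq:
    print(f"{word}\t{count}")
```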

  5. Keywords? • Sadly, no: Xaira has no keyword function • Use WordSmith instead • WordSmith version 4.0 fully supports Unicode

  6. Output formats • Page mode vs. Line mode (KWIC) • Plain text vs. XML text • Scope of context • Alignment (left, right, top, bottom) • Reference (on the status bar)

  7. Manipulating results • Edit query (to save time for related queries) • Bibliographical data • Sort KWIC concordances • Select/block select/copy concordances • Right click on a concordance • Thin/edit concordances • Random sampling • Save queries and export them in XML • Print results
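
Random sampling (thinning) of a concordance can be sketched in a few lines of Python. This is only an illustration of the general technique, not Xaira's own thinning routine; the function name `thin`, the `seed` parameter, and the list of dummy hits are assumptions made for the example.

```python
import random

def thin(concordances, n, seed=0):
    """Keep a random sample of n concordance lines, preserving their order.

    Hypothetical helper: a fixed seed makes the sample reproducible.
    """
    if n >= len(concordances):
        return list(concordances)
    rng = random.Random(seed)
    # Sample positions, then sort them so the original order is kept.
    keep = sorted(rng.sample(range(len(concordances)), n))
    return [concordances[i] for i in keep]

# Dummy data standing in for 100 downloaded hits.
hits = [f"hit {i}" for i in range(100)]
sampled = thin(hits, 10)
print(sampled)
```

Sampling positions rather than lines keeps the thinned set in corpus order, which makes it easier to compare with the full concordance.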

  8. Collocation/colligation • Statistical measure (MI or z-score) • Window span • Minimum frequency • Minimum MI/z-score • Top N collocates • Computing collocation statistics for individual words • Applying selected lemmata • Colligation (Addkey tags)
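
The two measures named on this slide can be illustrated with a common textbook formulation. The Python sketch below treats each position in the window as a trial with probability p = f(collocate) / N, so the expected co-occurrence count is E = p × f(node) × span. Whether Xaira computes MI and z in exactly this way is an assumption; the function names, the meaning of `span` (total number of window positions), and the input figures are all invented for illustration.

```python
import math

def expected(f_node, f_coll, n_tokens, span):
    """Expected co-occurrences if the collocate were spread evenly in the corpus."""
    p = f_coll / n_tokens
    return p * f_node * span

def mutual_information(observed, f_node, f_coll, n_tokens, span):
    """MI = log2(observed / expected)."""
    return math.log2(observed / expected(f_node, f_coll, n_tokens, span))

def z_score(observed, f_node, f_coll, n_tokens, span):
    """z = (observed - expected) / sqrt(expected * (1 - p)), a binomial approximation."""
    p = f_coll / n_tokens
    e = p * f_node * span
    return (observed - e) / math.sqrt(e * (1 - p))

# Invented figures: a node occurring 100 times, a collocate occurring 1,000
# times, a corpus of 1,000,000 tokens, a window of 8 positions, and 8
# observed co-occurrences.
mi = mutual_information(8, 100, 1000, 1_000_000, 8)
z = z_score(8, 100, 1000, 1_000_000, 8)
print(f"MI = {mi:.2f}, z = {z:.2f}")
```

Options such as minimum frequency and minimum score then act as filters on a table of such values, one row per candidate collocate.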

  9. Distribution analysis • Defining partitions (subcorpora) • (Texts >> Column control to select XML tags) • Texts >> Define partition (3 ways): based on selected class, values in a column, or solutions to a query • Texts >> Open partition • Tabulation (text class, words, hits, %, etc.) • Normalised frequencies for subcorpora • Sorting tabulated data • Graphic presentation (pie/bar chart) • Save distribution data in various forms • Copy pie/bar chart
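
Normalised frequencies make hit counts comparable across subcorpora of different sizes. The Python sketch below shows the arithmetic with a per-million-words base; the base, the function name `normalised_frequency`, and the partition figures are assumptions invented for the example, not output from Xaira.

```python
def normalised_frequency(hits, words, base=1_000_000):
    """Hits per `base` words (per million by default)."""
    return hits * base / words

# Invented partitions: name -> (hits, words in the subcorpus).
partitions = {
    "spoken": (120, 400_000),
    "written": (300, 2_000_000),
}

table = {
    name: normalised_frequency(hits, words)
    for name, (hits, words) in partitions.items()
}

# Sort the tabulated data by normalised frequency, highest first.
for name, per_million in sorted(table.items(), key=lambda kv: -kv[1]):
    print(f"{name}\t{per_million:.1f} per million words")
```

In this invented example the written partition has more raw hits, but the spoken partition has the higher normalised frequency, which is the kind of contrast the tabulation is meant to expose.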

  10. Additional features of Xaira • Annotating concordances (making notes) • Copying query text or notes • User-defined stylesheet • Colour book (e.g. different colours for different POS categories) • Remote access over a network • Platform-independent

  11. Xaira live demonstration • Here we go… • …slides to follow

  12. Tips for keeping away from bugs • In Line mode, a maximum of 1,524 concordances are displayed: see the rest in Page mode • In Query builder, joining query nodes in the horizontal direction (‘OR’) and then in the vertical direction (‘AND’) may produce unreliable counts when the Link type is specified as ‘One-way’ or ‘Two-way’: only define the Link type as ‘Next’ or ‘Not next’ • If thousands of hits are downloaded and dozens of them are deleted by reverse selection in thinning, the system may crash • If concordances have been sorted or edited, it may not be possible to reopen the saved query: save the edited concordances as an XML list using ‘Query – Listing’ in the menu or the corresponding toolbar button

  13. Truly multilingual - Chinese

  14. Truly multilingual - Bengali

  15. Truly multilingual - Hindi

  16. Truly multilingual - Punjabi

  17. Truly multilingual - Urdu

  18. Xaira FAQs • Is Xaira free and where can I get it? Yes, it is absolutely free. You can get a copy (binary for Windows, and source code for compilation on Unix/Linux/Mac systems) at the SourceForge website. The latest release is 116. http://sourceforge.net/project/showfiles.php?group_id=130289 • Where can I get more documentation? In addition to the built-in help file, more documentation is available at the Xaira site: http://www.oucs.ox.ac.uk/rts/xaira/ • Where can I get technical help? You can sign up for the Xaira Preview List to get help: http://www.tei-c.org.uk/tei-bin/betatest • For a critical review, see http://www.lancs.ac.uk/postgrad/xiaoz/papers/xaira_review.pdf
