Introduction to Xaira Part One: All about Xaira

Presentation Transcript


  1. Introduction to Xaira Part One: All about Xaira • Andrew Hardie

  2. What is Xaira? • XML Aware Indexing and Retrieval Architecture • The XML-aware version of SARA for the BNC corpus • Several programs, including the Index Toolkit and the Client

  3. How do you pronounce “Xaira”? • Its designers pronounce it like “Sarah” • We pronounce it like “Zirah” • Other pronunciations may vary

  4. Why are we talking about it? • Andrew and Richard have been beta-testers for Xaira for several years • Andrew wrote the help file

  5. What sort of program is Xaira? • Xaira is an analysis program for indexed corpora • Searching indexed vs. non-indexed corpora • Indexing – retrieval • Xaira does both
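
The contrast between searching indexed and non-indexed corpora can be sketched in a few lines of Python. This is a toy illustration only, not how Xaira's indexer actually works: the function names, the whitespace tokenisation, and the two-sentence corpus are all assumptions made for the example. Indexing builds a lookup table once; retrieval then consults that table instead of re-reading every text.

```python
from collections import defaultdict

def build_index(texts):
    """Indexing step (hypothetical): map each lower-cased token to the
    (text number, token position) pairs where it occurs."""
    index = defaultdict(list)
    for text_id, text in enumerate(texts):
        for pos, token in enumerate(text.lower().split()):
            index[token].append((text_id, pos))
    return index

def search_indexed(index, word):
    """Retrieval step: a single dictionary lookup, no re-reading of the texts."""
    return index.get(word.lower(), [])

def search_unindexed(texts, word):
    """Non-indexed search: every token of every text is scanned on each query."""
    target = word.lower()
    hits = []
    for text_id, text in enumerate(texts):
        for pos, token in enumerate(text.lower().split()):
            if token == target:
                hits.append((text_id, pos))
    return hits

# Invented two-text corpus, purely for illustration.
corpus = ["the cat sat on the mat", "the dog chased the cat"]
index = build_index(corpus)
print(search_indexed(index, "cat"))    # [(0, 1), (1, 4)]
print(search_unindexed(corpus, "cat")) # same hits, found by a full scan
```

Both searches return the same hits; the difference is that the indexed search pays the scanning cost once, at indexing time, which is what makes repeated queries on a large corpus fast.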

  6. Indexing

  7. Retrieval

  8. Xaira contains • The Indexer itself • Xaira-tools • “Easy” user interface for corpus set-up and using the indexer • The Xaira “client” • Sophisticated corpus analysis system • Wordlist, concordance, collocation • Structured searching

  9. Client, server? • Why does Xaira describe itself as a client? • Xaira splits the work between… • one program that you use to build the search (the client), and • one program that actually looks in the index and finds the solutions (the server) • But you can just use the client like any concordancer software • the user never deals directly with the server

  10. What is special about Xaira? • Xaira is based on XML • XML is based on Unicode • Thus Xaira can be used with any language in any alphabet • But Xaira has been specially designed to aid multilingual analysis • e.g. allows Unicode keyboard setup for any language

  11. Do I need a Unicode corpus? • Yes! • (… but ASCII counts as valid UTF-8) • Both UTF-8 and UTF-16 are OK • (If in doubt, ask Andrew about variant text encodings)
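
The point that ASCII counts as valid UTF-8 can be checked directly. The sketch below is a rough, hypothetical helper (the name `detect_encoding` is an assumption, not part of Xaira): it recognises UTF-16 only by its byte-order mark and otherwise just tries to decode the bytes as UTF-8, so it will not identify UTF-16 files that lack a byte-order mark or distinguish other encodings.

```python
def detect_encoding(data: bytes) -> str:
    """Rough guess at whether raw bytes are UTF-16 (with a BOM) or UTF-8.

    Hypothetical helper for illustration; this is not Xaira's own check.
    """
    # UTF-16 files normally begin with a byte-order mark (BOM).
    if data.startswith(b"\xff\xfe") or data.startswith(b"\xfe\xff"):
        return "utf-16"
    try:
        # Pure ASCII decodes without error, because ASCII is a subset of UTF-8.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "unknown"

print(detect_encoding(b"plain ASCII text"))        # utf-8
print(detect_encoding("café".encode("utf-16")))    # utf-16
print(detect_encoding("café".encode("latin-1")))   # unknown
```

The last case shows why variant encodings matter: a Latin-1 file containing accented characters is neither ASCII nor valid UTF-8, so it would need converting before it could be treated as a Unicode corpus.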

  12. Does my corpus need to be XML? • No! • Xaira can add basic XML to a corpus of plain-text files • Xaira can also upgrade SGML to XML • TEI XML is perfect for Xaira… • … warning: Xaira will reject ill-formed XML or SGML files.
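
Because ill-formed files are rejected, it can be worth checking XML files before indexing. A minimal sketch using Python's standard `xml.etree.ElementTree` parser follows; the helper name `is_well_formed` is an assumption for this example, the check is independent of Xaira's own, and it covers XML only (not SGML).

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text: str) -> bool:
    """Return True if the string parses as well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # Raised for problems such as unclosed or mismatched tags.
        return False

print(is_well_formed("<text><p>Hello</p></text>"))  # True
print(is_well_formed("<text><p>Hello</text>"))      # False: <p> is never closed
```

Note that this tests well-formedness only; it does not validate a file against TEI or any other schema.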

  13. First, index your corpus • Access the commands you need to set up and run the indexer from the Tools menu • Messages from the different tools appear here (you don’t need to worry about them)

  14. The Tools Menu • Tools for preparing your corpus and its header • Tools for telling Xaira how to handle the XML markup in your corpus • The indexer itself

  15. Scared? • Using Xaira-tools to prepare a corpus manually can be a bit complex • Instructions: http://www.oucs.ox.ac.uk/rts/xaira/Doc/ • But don’t despair – there is a wizard! • File >> Index Wizard

  16. The index wizard

  17. The index wizard

  18. The index wizard

  19. Live Indexing!
