
Preparing a Corpus for the Sketch Engine: Methodologies and Configurations

This document outlines how to prepare a corpus for the Sketch Engine (SkE). It covers the vertical format, from plain text files with one word per line to files that add lemma and POS (part-of-speech) tag columns. It also shows how XML structure markup such as <doc>, <s>, and <g/> is added to the vertical file. Finally, it explains the corpus configuration file, which tells the system where the data is, which attributes (word, lemma, tag) and structures the corpus contains, and how to display it.

myles-scott

Presentation Transcript


  1. Kilgarriff: Preparing a corpus for SkE Preparing a corpus for the Sketch Engine

  2. Vertical format
     • One word per line in a plain text file

     Suddenly
     ,
     their
     luck
     changed
     .

  3. With lemmas and POS-tags

     Suddenly    suddenly    RR
     ,           -           PUN
     their       their       PRP
     luck        luck        NN1
     changed     change      VVD
     .           -           PUN

  4. With XML structure markup

     <doc id="ABC" region="UK" genre="fiction">
     <s>
     Suddenly    suddenly    RR
     <g/>
     ,           -           PUN
     their       their       PRP
     luck        luck        NN1
     changed     change      VVD
     <g/>
     .           -           PUN
     </s>
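The vertical layout with structure markup shown on slides 2 to 4 can be produced by a short script. The following is a minimal Python sketch, not part of the original slides: the function name `to_vertical`, the `NO_SPACE_BEFORE` punctuation set, and the input shape (lists of word, lemma, tag tuples) are all assumptions made for illustration. It writes one token per line with tab-separated columns, wraps each sentence in `<s>` tags, and puts a `<g/>` glue tag before punctuation that attaches to the preceding word.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical rule: tokens in this set attach to the previous token,
# so a <g/> ("glue") line is written before them.
NO_SPACE_BEFORE = {",", ".", "!", "?", ";", ":"}


def to_vertical(doc_attrs, sentences):
    """Render tagged sentences as vertical text.

    doc_attrs: dict of <doc> attribute names to values
    sentences: list of sentences, each a list of (word, lemma, tag) tuples
    """
    # quoteattr adds the surrounding quotes and escapes special characters.
    attrs = " ".join(f"{name}={quoteattr(value)}" for name, value in doc_attrs.items())
    lines = [f"<doc {attrs}>"]
    for sentence in sentences:
        lines.append("<s>")
        for position, (word, lemma, tag) in enumerate(sentence):
            # Glue only makes sense when there is a preceding token.
            if position > 0 and word in NO_SPACE_BEFORE:
                lines.append("<g/>")
            # One token per line; columns are separated by tabs.
            lines.append(f"{word}\t{lemma}\t{tag}")
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sentence = [
        ("Suddenly", "suddenly", "RR"),
        (",", "-", "PUN"),
        ("their", "their", "PRP"),
        ("luck", "luck", "NN1"),
        ("changed", "change", "VVD"),
        (".", "-", "PUN"),
    ]
    print(to_vertical({"id": "ABC", "region": "UK", "genre": "fiction"}, [sentence]), end="")
```

The sketch closes the `<doc>` element itself because it renders a complete document, whereas the slide shows only an excerpt. A real pipeline would take the word, lemma, and tag columns from a tokenizer and tagger rather than from a hand-written list.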

  5. Corpus configuration file
     • Tells the system
       • Where data and other files are
       • What attributes (word, tag, lemma) and structures (<doc> <p> <s> <g/>) it contains
       • How to display

  6. Simple example

     PATH /corpora/test2
     ATTRIBUTE word
     ATTRIBUTE lemma
     ATTRIBUTE tag
     STRUCTURE doc {
         ATTRIBUTE region
         ATTRIBUTE genre
     }
     STRUCTURE s
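A configuration file like the one on slide 6 can be generated from a description of the corpus, which helps keep it in step with the columns and structure tags actually written to the vertical file. The sketch below is an illustration only: `render_config` and its parameters are hypothetical names, and it emits just the three directives that appear on the slide (PATH, ATTRIBUTE, STRUCTURE). It makes no claim about other directives a real configuration may need.

```python
def render_config(path, attributes, structures):
    """Build configuration text from a corpus description.

    path: location of the compiled corpus data
    attributes: positional attribute names, in vertical-file column order
    structures: dict mapping each structure name to a list of its attributes
    """
    lines = [f"PATH {path}"]
    lines += [f"ATTRIBUTE {name}" for name in attributes]
    for name, struct_attrs in structures.items():
        if struct_attrs:
            # Structures with attributes get a braced block.
            lines.append(f"STRUCTURE {name} {{")
            lines += [f"    ATTRIBUTE {attr}" for attr in struct_attrs]
            lines.append("}")
        else:
            lines.append(f"STRUCTURE {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    config = render_config(
        "/corpora/test2",
        ["word", "lemma", "tag"],
        {"doc": ["region", "genre"], "s": []},
    )
    print(config, end="")
```

Called with the values above, the function reproduces the slide's example line for line. The order of the `attributes` list matters, since it is meant to mirror the column order of the vertical file.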
