
Digital Italian




Presentation Transcript

  1. Digital Italian: An overview of Italian corpora

  2. A linguistic corpus is:
     • a body of texts / transcripts collected for linguistic purposes,
     • computerized,
     • representative of the variety studied,
     • balanced,
     • annotated.

  3. Annotation: Linguistic annotation can be useful or restrictive. Extra-linguistic annotation is useful for sociolinguistic research.

  4. Italian corpora: general vs. specialized, written vs. spoken, synchronic vs. diachronic.

  5. General corpora: Written Italian
     • Corpus e lessico di frequenza dell’italiano scritto (COLFIS)
     • Corpus di riferimento dell’italiano scritto / Corpus dinamico dell’italiano scritto (CORIS/CODIS)

  6. COLFIS (over three and a half million words): structure
     • Newspapers: Il Corriere della Sera, La Repubblica, La Stampa. Sections: economy, news of local interest, society, crime news, internal / external affairs, science, show business and sports.
     • Periodicals: other, arts, science and technology, cars and boats, children and youngsters, home and hobby, women’s magazines, photo love stories, general information, society, radio and television, sport, travel and ecology.
     • Books: other, arts, children, SF, detective and spy stories, hobby and travel, classics, modern narrative, romance, essays, natural and exact sciences, human and social sciences, theatre and poetry.

  7. CORIS/CODIS (one hundred million words): structure
     • Press: newspapers, periodicals, supplements (national, local / specialist, non-specialist / connotated, non-connotated).
     • Fiction: novels, short stories (Italian, foreign, for adults, for children, crime, adventure, SF, women’s literature).
     • Academic prose: human sciences, natural sciences, physics, experimental sciences (books, reviews, scientific, popular history, philosophy, arts, literary criticism, law, economy, biology, etc.).
     • Legal and administrative prose: legal, bureaucratic, administrative (books, reviews).
     • Miscellanea: books on religion, travel, cookery, hobbies, etc. (books, reviews).
     • Ephemera: letters, leaflets, instructions (private, public / printed form, electronic form).

  8. General corpora: Spoken Italian
     • Lessico di frequenza dell’italiano parlato (LIP) -> Banca dati dell’italiano parlato (BADIP)
     • Archivio delle varietà dell’italiano parlato (AVIP)
     • LABLITA

  9. Spoken and written Italian: Corpora e lessici dell’italiano parlato e scritto (CLIPS). The spoken corpus comprises:
     • Radio and television speech: entertainment, informative, cultural and educational programmes, commercials.
     • Field recordings: map task dialogues and the spot-the-difference game.
     • Readings: by the speakers themselves or by professional dubbing actors.
     • Telephone speech: conversations between a fake tour operator and three hundred people.

  10. Specialized corpora
     • Corpus di italiano televisivo (CIT)
     • La Repubblica

  11. CIT: structure
     • Categories: current affairs; entertainment (games, talk shows, variety shows); commercials; sports news; newscasts.
     • Sub-types shown on the slide: commentaries, play-by-play, studio broadcast, on-field broadcast, text, slogans, headlines.

  12. Corpus di italiano televisivo

  13. La Repubblica: structure
     • Years: 1985-2000
     • Genre: news, comment
     • Topic: religion, culture, economics, education, news, politics, science, society, sport, weather, unclassified

  14. La Repubblica

  15. Thank you! Anne-Marie OBRETIN, MRes in European Languages and Cultures, University of Exeter, ao231@exeter.ac.uk
