Introduction to Humanities Computing

Presentation Transcript


  1. Introduction to Humanities Computing. Spring 1999. Lecture Six.

  2. Passport to Tour • What is the important concept from Computer Confluence, Chapter 6 (devoted to spreadsheets), that is transferable to the world of text analysis? • The malleable matrix

  3. A Tour • Dartmouth Dante Database Project (DDP) • The DDP is still best accessed via Telnet. The address remains library.Dartmouth.EDU; at the prompt, type: connect dante

  4. What is an electronic text? Can you provide examples? What type of electronic text will survive?

  5. What is an electronic text? • Any string of characters • Any file or document that can be read • A word processing file • A text file

  6. Types of Electronic Texts • Literary text • Linguistic corpus • Hypermedia work

  7. A variety of forms • WWW site (Hypermedia) • Myst, Macbeth (Software, Text, and Media) • MS Word formatted file (Wordprocessing) • ASCII Text file (aka “Flat File”)

  8. ASCII • American Standard Code for Information Interchange • 0 NUL (null) • 3 ETX (end of text) • 13 CR (carriage return) • 32 SP (space) • 48 "0" • 49 "1" • 65 "A" • 97 "a", 98 "b", 99 "c" ...
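The code table on this slide can be checked directly. The following is a minimal sketch in Python (a language the lecture itself does not use) that relies only on the built-ins `ord` and `chr`; the helper name `ascii_codes` is made up for illustration.

```python
# Printable code points listed on the slide, checked with Python's built-ins.
slide_codes = {48: "0", 49: "1", 65: "A", 97: "a", 98: "b", 99: "c"}

for code, char in slide_codes.items():
    assert chr(code) == char   # code -> character
    assert ord(char) == code   # character -> code

# Control characters and the space have no visible glyph, so show them via repr().
for code, name in [(0, "NUL"), (13, "CR"), (32, "SP")]:
    print(code, name, repr(chr(code)))


def ascii_codes(text):
    """Return the numeric code of each character in text (hypothetical helper)."""
    return [ord(ch) for ch in text]


print(ascii_codes("Dante"))
```

Any plain-text file can be inspected this way, which is one reason the "flat file" is such a durable form.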

  9. Coding Standards • ASCII = 7 bits per character, 128 characters • Codes 0-31 reserved for control information • Most of the rest for printing characters • Extended ASCII = 8 bits (1 byte) per character, 256 characters • Upper ASCII characters used for special characters, characters with diacritical marks, and ligatures • Unicode = originally a 16-bit character set • About 65,000 characters, covering most known languages
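To make the bit widths concrete, here is a small sketch that tests whether every character of a string fits in 7, 8, or 16 bits. It assumes Python's Unicode code points as the numbering (the 8-bit range then corresponds to Latin-1, only one of several "extended ASCII" sets), and the function name `fits_in_bits` is invented for this example.

```python
def fits_in_bits(text, bits):
    """True if every character's code point is below 2**bits (hypothetical helper)."""
    return all(ord(ch) < 2 ** bits for ch in text)


# Escapes are used so the samples do not depend on the file's own encoding.
samples = [
    "Dante",                                      # plain ASCII
    "Fran\u00e7ais",                              # c with cedilla: needs 8 bits
    "\u0395\u03bb\u03bb\u03b7\u03bd\u03b9\u03ba\u03ac",  # Greek: needs 16 bits
]

for s in samples:
    print(s, fits_in_bits(s, 7), fits_in_bits(s, 8), fits_in_bits(s, 16))
```

A text that passes the 7-bit test can be moved between platforms without any agreement about which extended character set is in use.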

  10. Why? • Cross-Platform • Long-term survival of data • You can use it to encode more complex documents using markup (SGML) ASCII Text + Markup = Electronic Representation of Literary Text

  11. Encoding <html> <Head><Title>Welcome</Title></Head> <Body><H1>Welcome to 3F03</h1> This is the home page for 3F03<P> <B>Quantitative Methods in the Humanities </B> Fran&ccedil;ais </Body></html> • In HTML, all formatting is provided by codes written in ASCII characters.
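As an illustration of how markup and text can be pulled apart again, the sketch below feeds the slide's HTML to `html.parser.HTMLParser` from the Python standard library. The class name `TagAndTextCollector` is an assumption of this example; the parser lowercases tag names and, with `convert_charrefs=True`, turns `&ccedil;` back into the character it stands for.

```python
from html.parser import HTMLParser

# The HTML fragment from the slide, unchanged.
SLIDE_HTML = (
    "<html> <Head><Title>Welcome</Title></Head> "
    "<Body><H1>Welcome to 3F03</h1> This is the home page for 3F03<P> "
    "<B>Quantitative Methods in the Humanities </B> Fran&ccedil;ais </Body></html>"
)


class TagAndTextCollector(HTMLParser):
    """Collect start-tag names and non-empty text runs separately."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


collector = TagAndTextCollector()
collector.feed(SLIDE_HTML)
collector.close()

print(collector.tags)
print(collector.text)
```

Everything in the input is 7-bit ASCII, yet the decoded text contains an accented character: the markup carries what the character set cannot.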

  12. Content Model • Text • Head: Title • Body: Heading, Paragraph

  13. Limits of HTML • No codes for many of the features: Character, Author, Text type, Sonnet, Lines • Text analysis software can’t handle it • Languages other than English

  14. COCOA Markup • Continuous tags • Do not require a closing </tag>; a new tag changes the value • Format: <variable value> • (angle brackets < > are delimiters) • Examples: • <speaker Romeo> • <scene 1> • <L 1> • <text-type frontmatter> • <<Comments not meant to be indexed>>

  15. COCOA example <Title Misunderstanding> <<Example for Demonstration, 1997>> <t titlepage>THE MISUNDERSTANDING A PLAY IN THREE ACTS <t dedication>To my friends of the THEATRE DE L'EQUIPE <t characters>CHARACTERS IN THE PLAY: THE OLD MANSERVANT ... MARIA <t information>LE MALENTENDU (THE MISUNDERSTANDING) was presented for the first time at the Theatre des Mathurins, Paris, in 1944

  16. Example continued <act 1> <t stagedir>Noon. The clean, brightly lit public room of an inn. Everything is very spick and span. <t play> <p mother>He'll come back. <p martha>Did he tell you so?
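The "continuous tag" idea can be sketched as a small state machine: each `<variable value>` tag updates a table of current values, and every stretch of text is labelled with whatever values are in force when it occurs. This is a rough illustration based only on the format shown on these slides, not the behaviour of OCP, TACT, or any other real COCOA-aware tool; the function name `parse_cocoa` and the regular expression are assumptions.

```python
import re

# <<...>> is a comment; <name value> sets a variable until it is changed again.
TOKEN = re.compile(r"<<.*?>>|<(\w[\w-]*)\s+([^<>]*)>")


def parse_cocoa(source):
    """Return (state, text) pairs; state maps each variable to its current value."""
    state = {}
    segments = []
    pos = 0
    for match in TOKEN.finditer(source):
        text = source[pos:match.start()].strip()
        if text:
            segments.append((dict(state), text))
        if match.group(1) is not None:  # a tag, not a comment
            state[match.group(1)] = match.group(2).strip()
        pos = match.end()
    tail = source[pos:].strip()
    if tail:
        segments.append((dict(state), tail))
    return segments


# Lines taken from the example slides, plus the comment form from the markup slide.
sample = (
    "<act 1> <t stagedir>Noon. The clean, brightly lit public room of an inn. "
    "<t play> <p mother>He'll come back. <p martha>Did he tell you so? "
    "<<Comments not meant to be indexed>>"
)

for state, text in parse_cocoa(sample):
    print(state.get("p", "-"), "|", text)
```

Because values persist until changed, a speech tagged `<p martha>` is still known to belong to act 1 without repeating the `<act 1>` tag.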

  17. Brief History: Text-Analysis Tools • Text-analysis tools grew out of concordances: • 1247, Concordance to the Vulgate Bible, Paris • 1949, Father Busa, Index Thomisticus • 1970s, batch concordancers like OCP • 1989, TACT, interactive concordancers • 1990s, textual visualization

  18. What can be done... • Text-analysis tools provide • Speed • Complex searches • Reconfigured views • Statistics • Researchers can generate custom concordances interactively

  19. Concordances and Interpretation • Concordances provide an alternative arrangement of the text that brings passages together into a concordantia. • This is an interpretative strategy in which answers are drawn from the text by assembling passages on the subject in question and reading this rearranged text as a meaningful whole. • A concordance facilitates this rearrangement by providing alternative views.
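The rearrangement a concordance performs can be shown with a keyword-in-context (KWIC) listing. The sketch below is a deliberately simple version, assuming words are runs of letters and apostrophes; the function name `kwic` and the `width` parameter are invented here, and the sample text is adapted from the slide's own wording.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit, measured in words."""
    words = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, word in enumerate(words):
        if word.lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            hits.append((left, word, right))
    return hits


sample = (
    "Concordances provide an alternative arrangement of the text that brings "
    "passages together into a concordantia. Answers are drawn from the text by "
    "assembling passages on the subject in question and reading this rearranged "
    "text as a meaningful whole."
)

# Print the hits aligned on the keyword, as a printed concordance would.
for left, word, right in kwic(sample, "text"):
    print(f"{left:>25} [{word}] {right}")
```

Changing `width` changes how much of each passage is brought into the new arrangement, which is exactly the choice an interpreter has to defend.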

  20. Types of Text-Analysis • Stylistic • Describing author’s style and comparing it • Authorship studies • Linguistic • Create representative corpus • Describe linguistic use (diachronic or synchronic) • Thematic • Finding patterns (words & phrases) in a text • Following themes through a work • Comparing themes • Demands a reiterative reading

  21. Problematic equations • That a theme is the passages where a set of words appears • Can themes be identified by key words? • What about ambiguous words? • That concording passages into a new text is an acceptable interpretative strategy • Where does the passage start and end around a word? • Is reading a rearranged text appropriate? • That the distribution of words indicates the progress of a theme • Does the number of hits indicate the intensity of a theme?
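The third equation, that word distribution tracks the progress of a theme, is the kind of claim a program can compute but not justify. As a hedged sketch, the function below divides a text into equal parts and counts hits for a set of theme words in each part; the name `theme_distribution`, the four-part split, and the toy sentence are all assumptions made for illustration.

```python
import re


def theme_distribution(text, theme_words, segments=4):
    """Count occurrences of any theme word in each of `segments` equal parts of the text."""
    words = [w.lower() for w in re.findall(r"[A-Za-z']+", text)]
    theme = {w.lower() for w in theme_words}
    counts = [0] * segments
    for i, word in enumerate(words):
        if word in theme:
            # Map the word's position to the index of the part it falls in.
            counts[i * segments // len(words)] += 1
    return counts


# A made-up toy text: the theme words cluster at the start and the end.
toy = "home home road road road road road road road road home home"
print(theme_distribution(toy, ["home"]))
```

The counts say where the chosen words occur; whether that pattern is the theme's "intensity" is the interpretative question the slide raises.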

  22. What’s the connection? • Surface • Measurement (Quantification) • Interpretation (Understanding)

  23. Two Views • View one: Text-analysis is about proving things about texts • Stylistic analysis provides reproducible descriptions of an author's style • Measurement of surface features allows us to prove more interesting points • A reaction to impressionistic, reader-oriented literary theory • View two: Text-analysis is the rereading of a text in ways that help one better understand it • Text-analysis is only one of many strategies • Text-analysis reveals anomalies to be researched • Text-analysis is useful precisely because the computer can’t do well what human readers do well, and can do other things well

  24. E-Text Research Project • Planning Phase: Planning, Traditional Research • Implementation Phase: Prototyping, Scanning or Buying, Markup, Publication, Proofing • Research Phase: Interactive Study

  25. Obtaining an E-text • Acquire one from someone else. • Oxford Text Archive • Search the Internet using WWW • Commercial Vendors • Create it yourself • Scan it using OCR software • OCR = Optical Character Recognition • Type it in or hire services for input • Markup • Validate
