There are 10 types of people. Those who understand binary and those who don’t.
We got the Vocab. • Hardware • Software • Server • Storage types • Download & Upload (FTP, SSH) • The Internet (Browsers, ISP, LAN, POP, NAP, routers, backbones) • Languages/Code (binary, unicode, hex, ASCII, SGML, HTML, XHTML, DHTML, CSS, PHP, Java, Flash, XML, XSL) • File formats (rtf, GIF, TIFF, JPEG, MP3, MP4, WAV, dtd) • Random TLAs (NEH, ODH, ALA, ISO, W3C, TEI, GIS) • “Deep Freeze” (Security Software; Terminals on campus erase all files/programs on reboot that are not part of the original image)
Hardware--external physical equipment • The physical computer (tower, monitor, laptop) • Printer • External storage device
Early Computers • The first computers filled rooms, such as ENIAC pictured here. ENIAC was developed for the U.S. Army during World War II and completed in 1945.
Software--inside the computer • Operating System • The infrastructure that manages and coordinates files/programs • MS-DOS, Windows, OS X • Programs/applications • MS Office (including Word), Adobe Suite (Photoshop, Dreamweaver) • Open Source v. Proprietary • “Off the Shelf” v. “In-House”
The mythic “Server” • Often refers to hardware in a specified location (frequently part of a server farm or cluster) that is used for web site delivery, data storage, and/or delivery of multimedia files (streaming video server). • Has its own operating system (such as Unix) and software (such as Apache) to “serve” customers. • Files must be uploaded from a PC to the server using FTP, SSH or “Fetch” to appear on the Internet. • Generally, UND’s servers are maintained by ITSS.
Early Storage Media • Punch card • Magnetic tape
Floppy discs • 8 inch • 5.25 inch • 3.5 inch
How data is stored. • Punch cards & magnetic tape store information sequentially (linear access). • Floppies (and later CDs, jump drives) allow random access (the drive can jump to the file you want in any order). • A bit (binary digit) is a single zero or one in base 2; 8 bits = 1 byte. • RAM (random access memory) is the computer's working memory: each memory cell has a specific address, and its contents are lost when the power is turned off. Files are kept on the hard drive, not in RAM. • ASCII (American Standard Code for Information Interchange) and Unicode are character encodings that assign each character a number so that text can be stored in binary; hexadecimal (hex, base 16) is a compact way of writing those binary values. • Databases (groups of related data organized into fields, managed with software such as MS Access or MySQL). NB: Excel is NOT a database app; it is a spreadsheet.
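To make the bit, byte, and encoding vocabulary concrete, here is a minimal Python sketch. The function name `describe_character` is made up for this illustration. It shows the number assigned to a character, that number in hex, and the bytes actually stored when the text is saved as UTF-8 (one common Unicode encoding, chosen here as an assumption).

```python
def describe_character(ch: str) -> dict:
    """Show one character as a code point, as hex, and as stored bytes."""
    code_point = ord(ch)             # the number Unicode assigns (same as ASCII for basic Latin letters)
    utf8_bytes = ch.encode("utf-8")  # the bytes written to disk under UTF-8
    return {
        "code_point": code_point,
        "hex": format(code_point, "X"),
        # each byte is 8 bits, written out here as zeros and ones
        "binary_bytes": [format(b, "08b") for b in utf8_bytes],
    }


print(describe_character("A"))
# {'code_point': 65, 'hex': '41', 'binary_bytes': ['01000001']}
print(describe_character("é"))
# {'code_point': 233, 'hex': 'E9', 'binary_bytes': ['11000011', '10101001']}
```

The plain letter fits in one byte, while the accented letter needs two bytes in UTF-8. This is the kind of character representation issue that ASCII alone could not handle.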
Where data is stored. • External storage (jump drive, CD, DVD, external hard drive)--something that is connected to the PC and can be removed from it. Now usually connected through a USB port. • On your local hard drive (inside your personal computer and only available on that computer) • “On the server”: files either saved or uploaded to a designated server space.
How the Internet works…basically. • Enter a URI (Uniform Resource Identifier), most commonly a URL (Uniform Resource Locator), into your browser. • The browser (IE, Netscape, Mozilla, Firefox, Safari, SeaMonkey) on a local computer requests and displays internet files. • The browser/local computer is connected via modem, wireless card (such as Apple's AirPort), Ethernet card, or cable to a local area network (LAN) or Internet service provider (ISP). You can also connect through a VPN (Virtual Private Network, which is set up as a security measure). • The LAN or ISP connects to larger networks at a Network Access Point (NAP), an exchange point where networks meet. • Through routers and “backbones” (high-capacity fiber optic lines) information is routed to its destination.
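The first step, entering a URL, can be unpacked a little. The sketch below uses Python's standard `urllib.parse` module to split a URL into the parts a browser works with. The helper name `url_parts` is an assumption for illustration, and nothing is fetched over the network.

```python
from urllib.parse import urlsplit


def url_parts(url: str) -> dict:
    """Split a URL into the pieces a browser uses to find a file."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,  # the protocol to use, e.g. http
        "host": parts.hostname,  # the server to contact
        "path": parts.path,      # the file or folder requested on that server
    }


print(url_parts("http://www.digitalhumanities.org/companion/"))
# {'scheme': 'http', 'host': 'www.digitalhumanities.org', 'path': '/companion/'}
```

The host name tells the routers where to deliver the request. The path tells the server which file to send back.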
Languages & Codes • Standard Generalized Markup Language (SGML) (1986), published as a standard by the International Organization for Standardization (ISO) (est. 1947) • Hypertext Markup Language (HTML) (1991); basic structure language (static) • Cascading Style Sheets (CSS); presentation definition language (how a page should look) • JavaScript (not the same language as Java, despite the name); client-side scripting language often used to run small applications (rollovers, pop-ups, site counters, dates, etc.) • Flash--Adobe software that allows for multimedia/interactive displays • DHTML (dynamic HTML); animated or interactive pages made by combining HTML, CSS, and JavaScript • PHP (PHP: Hypertext Preprocessor); server-side scripting language used to create dynamic web pages (the script runs on the server and its output is sent to the browser as HTML)
Languages & Codes, part Deux • Extensible HTML (XHTML); HTML rewritten to follow XML rules (lower-case tags, every tag opened and closed) so that it is well formed and can be validated; current standard through W3C (World Wide Web Consortium) • Extensible Markup Language (XML); derived from SGML • Extensible Stylesheet Language (XSL; its transformation part is XSLT), used to transform XML files into something else, often to export XML to XHTML • Document Type Definition (.dtd file); file that contains the rules and syntax that a particular kind of XML document must follow.
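"Well formed" has a precise meaning that a parser can check. This is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function name `is_well_formed` and the sample sentences are invented for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if every tag in the markup is opened and closed correctly."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


good = "<p>Opening statement of the <name>prosecution</name>.</p>"
bad = "<p>Opening statement of the <name>prosecution.</p>"  # <name> is never closed

print(is_well_formed(good))  # True
print(is_well_formed(bad))   # False
```

Note that this only tests well-formedness. Checking that a file is also valid against a .dtd file requires a validating parser, which `ElementTree` is not.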
File Formats • .rtf (Rich Text Format) cross-platform text file format that lets formatted documents (such as Word files saved as .rtf) be read by programs not created by Microsoft • .pdf (Adobe) Portable Document Format for document exchange (usually not interactive) • GIF (Graphics Interchange Format) bitmap image file format • JPEG (Joint Photographic Experts Group) compressed image file format for Web delivery • TIFF (Tagged Image File Format) high resolution image file (archival standard) • dpi (dots per inch) is not a file format but a measure of resolution/digital image quality (600 dpi is currently the archival standard) • MP3 digital audio file format (compressed) • MP4 digital audio/visual file format (used for streaming video web delivery) • WAV (WAVE) uncompressed waveform audio format (generally thought of as archival standard for audio files)
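The arithmetic behind dpi is easy to work through. The sketch below assumes a US Letter page (8.5 x 11 inches) and 24-bit color (3 bytes per pixel). Both are assumptions for illustration, and the function names are hypothetical.

```python
def scan_dimensions(width_in: float, height_in: float, dpi: int) -> tuple:
    """Pixels across and down for a page scanned at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bytes_per_pixel: int = 3) -> int:
    """Size of the raw image data before any compression."""
    return width_px * height_px * bytes_per_pixel


width_px, height_px = scan_dimensions(8.5, 11, 600)
print(width_px, height_px)                      # 5100 6600
print(uncompressed_bytes(width_px, height_px))  # 100980000
```

At 600 dpi, a single uncompressed color page comes to roughly 100 million bytes. That is why archival TIFF files are large and why compressed JPEG copies are used for Web delivery.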
TLAs (Three letter acronyms) • NEH (National Endowment for the Humanities) • ODH (Office of Digital Humanities, sub-division of NEH) • ALA (American Library Association) • ISO (International Organization for Standardization) • W3C (World Wide Web Consortium) • TEI (Text Encoding Initiative) • GIS (Geographic/Geospatial Information System)
TAKE FIVE Dr. William Caraher to speak next.
First Digital Literature Project Unlike many other interdisciplinary experiments, humanities computing has a very well-known beginning. In 1949, an Italian Jesuit priest, Father Roberto Busa, began what even to this day is a monumental task: to make an index verborum of all the words in the works of St Thomas Aquinas and related authors, totaling some 11 million words of medieval Latin. Father Busa imagined that a machine might be able to help him, and, having heard of computers, went to visit Thomas J. Watson at IBM in the United States in search of support (Busa 1980). Some assistance was forthcoming and Busa began his work. The entire texts were gradually transferred to punched cards and a concordance program written for the project. The intention was to produce printed volumes, of which the first was published in 1974 (Busa 1974). A purely mechanical concordance program, where words are alphabetized according to their graphic forms (sequences of letters), could have produced a result in much less time, but Busa would not be satisfied with this. He wanted to produce a "lemmatized" concordance where words are listed under their dictionary headings, not under their simple forms. His team attempted to write some computer software to deal with this and, eventually, the lemmatization of all 11 million words was completed in a semiautomatic way with human beings dealing with word forms that the program could not handle. Busa set very high standards for his work. His volumes are elegantly typeset and he would not compromise on any levels of scholarship in order to get the work done faster. http://www.digitalhumanities.org/companion/
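The difference between a purely mechanical concordance and a lemmatized one can be sketched in a few lines of Python. This is only an illustration and not Busa's method. The four-word sample and the tiny lemma table are hand-made assumptions, and words missing from the table simply stay under their own form, which is the case Busa's team had to resolve by hand.

```python
from collections import defaultdict

# Hypothetical lemma table: word form -> dictionary heading
LEMMAS = {"est": "sum", "sunt": "sum", "dei": "deus", "deo": "deus"}


def concordance(words, lemmas=None):
    """Index word positions, alphabetized by graphic form or by lemma."""
    index = defaultdict(list)
    for position, word in enumerate(words):
        form = word.lower()
        # with a lemma table, file the word under its dictionary heading
        heading = lemmas.get(form, form) if lemmas else form
        index[heading].append(position)
    return dict(sorted(index.items()))


words = ["deus", "est", "dei", "sunt"]
print(concordance(words))
# {'dei': [2], 'deus': [0], 'est': [1], 'sunt': [3]}
print(concordance(words, LEMMAS))
# {'deus': [0, 2], 'sum': [1, 3]}
```

The mechanical version scatters forms of the same word across the alphabet. The lemmatized version gathers them under one heading, at the cost of building the lemma table first.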
Historical conference on Dig Lit. In 1964, IBM organized a conference at Yorktown Heights. The subsequent publication, Literary Data Processing Conference Proceedings, edited by Jess Bessinger and Stephen Parrish (1965), almost reads like something from twenty or so years later, except for the reliance on punched cards for input. Papers discuss complex questions in encoding manuscript material and also in automated sorting for concordances where both variant spellings and the lack of lemmatization are noted as serious impediments. It became the first of a regular series of conferences on literary and linguistic computing and the precursor of what became the Association for Literary and Linguistic Computing/Association for Computers and the Humanities (ALLC/ACH) conferences. http://www.digitalhumanities.org/companion/
First Projects--literature & linguistics • Frequently focused on concordances • Quantitative approaches to style and authorship studies • Computers and the Humanities began publication in 1966 under the editorship of Joseph Raben. • Centre for Literary and Linguistic Computing in Cambridge established in 1963 • Oxford Text Archive (OTA) est. 1976 • Medieval and Classical texts were among the first collaborative digital collections. • Unicode work began in the late 1980s to contend with character representation issues
Exemplary Digital History and Literature Collections • Journals of Lewis and Clark (U of Nebraska, Lincoln) • Eyes on the Prize (WashU) • The Revised Dred Scott Collection (WashU) • Walt Whitman Archive (U of Nebraska, Lincoln) • University of Virginia E-text Center
Nuremberg… • Harvard Collection (images, full text, searchable, uses in-house database, IMT cases Medical, Milch and Pohl) • Yale Collection (Avalon Project) only the text (HTML) from the Blue Series (IMT) and Red Series (vols. 1-4 Nazi Aggression). No images, not searchable. • U of MO, KC excerpts from various trials, in HTML without page images and not searchable.
How our project is different. • A case/theme that hasn’t been done anywhere else. • Full text in P5 XML (which will be fully searchable and built according to international standards) • Page images • Critical apparatus/maps to enhance the information, particularly to be used by instructors.
Stage 1--Transcription • Transcribe the text of the transcripts • NOT a facsimile, only need to maintain paragraph breaks/page breaks • Maintain spelling (if it is “wrong” add [sic] after the word) • DO NOT maintain hyphenated word breaks • Save file as .rtf (to avoid MS Word proprietary codes) • Images of the transcripts to be transcribed are available online OR you can go to Special Collections to view the original/request photocopies
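Two of the rules above (keep paragraph breaks, drop hyphenated word breaks) can be checked mechanically on a typed draft. This is a minimal sketch and not part of the project's workflow. The function name `join_line_breaks` is hypothetical, and it assumes paragraphs are separated by a blank line.

```python
import re


def join_line_breaks(raw: str) -> str:
    """Keep paragraph breaks but rejoin words hyphenated at line ends."""
    paragraphs = re.split(r"\n\s*\n", raw.strip())  # a blank line marks a new paragraph
    cleaned = []
    for para in paragraphs:
        para = re.sub(r"(\w)-\n\s*(\w)", r"\1\2", para)  # "tri-" + "bunal" -> "tribunal"
        para = re.sub(r"\s*\n\s*", " ", para)            # remaining line breaks become spaces
        cleaned.append(para)
    return "\n\n".join(cleaned)


draft = "The tri-\nbunal heard the\nwitness.\n\nThe court ad-\njourned."
print(join_line_breaks(draft))
```

Spelling is left exactly as typed. A compound that is genuinely hyphenated and happens to break at a line end would also be joined, so the result still needs a human check against the page image.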