
Hypertext (1)

whitley

Presentation Transcript


  1. Hypertext (1) • Historically, text is sequential: read from beginning to end • Hypertext is non-sequential, with internal links from one part to another • The word “hypertext” was coined by Ted Nelson, who introduced it in print in 1965. • The first hypertext project, Nelson’s Xanadu, was named for the magical place in Coleridge’s poem “Kubla Khan”.

  2. Hypertext (2) Links in hypertext give access to: • topics or information directly related to the current idea • notes, such as footnotes or endnotes • explanations of special words or phrases • biographical information about people behind the current idea

  3. Claims about Hypertext • Represents a large body of information organized into numerous fragments • The fragments relate to one another • The user needs only a small fraction of the fragments at any time • Exists only in cooperation with the reader • Is a legitimate literary concept

  4. Claims about Hypertext (2) • Integrates three technologies • Publishing (as a book publisher would) • Computing (as the infrastructure) • Broadcasting (over a computer network) • Depends on a computer environment for high-speed transitions between nodes • Modelled by the network ADT

  5. Using Hypertext • Browser, or hypertext engine: a computer-based system that allows links to be followed easily • Navigation aids: parts of the user interface that provide a sense of location and direction • Notation: a convenient way of specifying links as a hypertext author

  6. WWW as a Hypertext System • Browser: Netscape, for example • Navigational aids: • Forward, back, home • History list • Colored anchors • Consistent titles • Notation: HTML

  7. Network ADT • Model of hypertext • Similar to the tree ADT, but allows cycles • Links have an explicit direction, capturing the idea of going forward and going back

  8. Network ADT (2) • Definition: A network is a collection of nodes and links between pairs of nodes such that • Each link has a direction. • Each node is reachable from any other node. However, the path is not necessarily unique. • No node is linked to itself. • There are no duplicate links in the same direction.
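The four conditions in this definition can be checked mechanically. The sketch below is a minimal illustration, assuming a network is given as a set of node names plus a list of (from, to) pairs, so that each link's direction is the order of its pair; the function names `is_network` and `reachable_from` are hypothetical.

```python
from collections import deque


def reachable_from(start, links):
    """Return the set of nodes reachable from start by following links forward."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
    seen = {start}
    queue = deque([start])
    while queue:  # breadth-first search
        node = queue.popleft()
        for nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_network(nodes, links):
    """Check the four conditions of the network definition.

    nodes: set of node names; links: list of (from_node, to_node) pairs.
    """
    nodes = set(nodes)
    if any(a not in nodes or b not in nodes for a, b in links):
        return False  # every link must join two nodes of the network
    if any(a == b for a, b in links):
        return False  # no node is linked to itself
    if len(set(links)) != len(links):
        return False  # no duplicate links in the same direction
    # Each node is reachable from every other node.
    return all(reachable_from(n, links) == nodes for n in nodes)


pages = {"Home", "About", "Contact"}
print(is_network(pages, [("Home", "About"), ("About", "Contact"), ("Contact", "Home")]))  # True
print(is_network(pages, [("Home", "About"), ("Home", "Contact")]))  # False
```

The cycle Home, About, Contact, Home passes. The second link set is a tree rooted at Home and fails, because Home cannot be reached from the other two pages.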

  9. Network ADT (3) • Observations: • There is no hierarchy; all nodes are considered the same. (In a tree, the root is special.) • Links have direction, but reverse travel is possible. (One can go backwards on a link, or forwards on a link that goes in the opposite direction.) • Cycles are allowed.

  10. Directed Graphs • Both networks and rooted trees are examples of a connected directed graph, sometimes called a digraph. • Formally, a digraph is a set of nodes and a set of links joining ordered pairs of nodes. The link (A,B) that joins A to B is different from the link (B,A) that joins B to A.

  11. Navigation in Sequential Text • Low level: • Punctuation • Fonts • Separation into sentences and paragraphs • High level: • Chapters, sections, subsections • Table of contents • Index

  12. Navigation in Sequential Text (2) • Page layout • Page numbers • Running heads • Displayed text

  13. Navigating in Hypertext • Issues: • Where am I? Have I been here before? When? • How did I get here? • Where can I go? • Anchors (or links) • Implicit anchors (or links): clipboard, glossary, calculator • Computed links: next train • Back • Forward • Home
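Back, Forward, and Home are commonly implemented with two stacks. The class below is a hypothetical sketch of that idea, not any particular browser's implementation; it also keeps a set of visited nodes to answer “Have I been here before?”.

```python
class History:
    """Hypothetical navigation history with Back, Forward, and Home."""

    def __init__(self, home):
        self.home = home
        self.current = home
        self.back_stack = []     # nodes we can go back to, most recent last
        self.forward_stack = []  # nodes we backed out of, most recent last
        self.visited = {home}    # answers "Have I been here before?"

    def follow(self, node):
        """Follow a link; this discards any forward history."""
        self.back_stack.append(self.current)
        self.current = node
        self.forward_stack.clear()
        self.visited.add(node)

    def back(self):
        if self.back_stack:
            self.forward_stack.append(self.current)
            self.current = self.back_stack.pop()
        return self.current

    def forward(self):
        if self.forward_stack:
            self.back_stack.append(self.current)
            self.current = self.forward_stack.pop()
        return self.current

    def go_home(self):
        """Treat Home as following a link to the home node."""
        self.follow(self.home)
        return self.current


h = History("index")
h.follow("chapter1")
h.follow("glossary")
print(h.back())     # chapter1
print(h.forward())  # glossary
```

Following a new link after going back clears the forward stack, which is why Forward can only retrace a path the user has just backed out of.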

  14. Navigating in Hypertext (2) • Within a node: • Save to disk • Print • Annotate • Scroll • Zoom

  15. Navigating in Hypertext (3) • User interface support • Give power to the users through • short response time • low cognitive load • path clues, perhaps decaying over time • Follow a path forward or backward • Return to a node

  16. Text Markup • Unified view of text and hypertext presentation • Foundation of all word processors • Describes all electronic manuscripts by • separating logical elements • specifying processing functions for these elements

  17. Text Markup (2) • Originated by William Tunnicliffe (Sept. 1967) in a talk advocating the separation of a document’s information content from its format • Control formatting with embedded codes

  18. Generalized Markup • Goal: allow editing, formatting, and retrieval systems to share documents • Devised by Goldfarb, Mosher, Lorie at IBM, 1969 • Formally defined • document types • explicit nested element structure • generic identifier associated with each element

  19. SGML • Standard Generalized Markup Language • First draft standard, 1980 • ISO 8879, 1986 • Based on the tree ADT • Allows the description of a document, considered as a tree, to be embedded in the file containing the document

  20. Functions of SGML • Tags documents in a formal language • Describes internal logical structures • Links files with an addressing scheme • Acts as a database language for text • Accommodates multimedia and hypertext • Provides a grammar for style sheets • Allows coded text reuse in surprising ways

  21. Functions of SGML (2) • Represents documents independent of computing platform • Provides a standard for transferring documents among platforms and applications • Acts as a metalanguage for document types • Represents hierarchies • Extends to accommodate new document types

  22. Generic Identifiers • Tagging vs. formatting • Tagging shows document structure • Formatting describes document display • Example: A paragraph is a sequence of closely connected sentences and can be delimited by a tag. A paragraph can be displayed with either • initial indenting or not • extra separation or not

  23. Generic Identifiers (2) • Syntax • Beginning: <identifier> • End: </identifier> • An attribute list, with assigned values, may follow the identifier in the beginning tag
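Because elements nest, beginning and end tags can be matched with a stack, which recovers the document tree that SGML is based on. This is a simplified sketch: it assumes every element has an explicit end tag, attribute names are letters only, and attribute values are double-quoted (full SGML allows omitted tags and much more). The name `parse` and the (identifier, attributes, children) tuple layout are hypothetical choices.

```python
import re

# <identifier attr="value" ...> or </identifier>; values assumed double-quoted.
TAG = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)((?:\s+[A-Za-z]+="[^"]*")*)\s*>')
ATTR = re.compile(r'([A-Za-z]+)="([^"]*)"')


def parse(text):
    """Build a tree of (identifier, attributes, children) tuples from tagged text.

    Text between tags is kept as plain strings in the children lists.
    Raises ValueError if beginning and end tags are not properly nested.
    """
    root = ("document", {}, [])
    stack = [root]  # the innermost open element is on top
    pos = 0
    for m in TAG.finditer(text):
        if m.start() > pos:
            stack[-1][2].append(text[pos:m.start()])
        closing, name, attrs = m.group(1), m.group(2), m.group(3)
        if closing:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected </{name}>")
            stack.pop()
        else:
            element = (name, dict(ATTR.findall(attrs)), [])
            stack[-1][2].append(element)
            stack.append(element)
        pos = m.end()
    if pos < len(text):
        stack[-1][2].append(text[pos:])
    if len(stack) != 1:
        raise ValueError(f"unclosed <{stack[-1][0]}>")
    return root


tree = parse('<ol type="1"><li>one</li><li>two</li></ol>')
```

Improperly nested input such as `<b><i>x</b></i>` is rejected, because the end tag for b arrives while i is still open.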

  24. Generic Identifiers (3) • Typical identifiers: • p paragraph • q quotation • ol numbered (ordered) list • ul unordered (bulleted) list • li list item • b boldface • i italics

  25. Display of Text • ASCII codes for printing characters carry no information about display • Printed or displayed characters are described by their font.

  26. Fonts • Fonts come in families: groups of fonts with similar design characteristics. • A font is a set of displayed characters in a particular design. To describe a font, we specify: • The font face, or typeface, which is the design of the font. • The size, measured in points, which is the height of representative characters. • The appearance: bold, italic, underline, outline, shadow, small caps, redline, strikeout, etc.

  27. Fonts (2) • Font families include standard modifications of a base font, such as italics and bold, to change the appearance. (This family is Times New Roman.) • Some families are sans serif, without the cross strokes accentuating the ends of the main strokes.

  28. Fonts (3) • Typical examples of fonts are • Times New Roman • Arial • Century Schoolbook • Lucida Calligraphy • Verdana

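  29. Fonts (4) • The size of this font is 32 points • This is 54 points • This is 24 points • A traditional printer’s point is 1/72.27 inch (TeX defines exactly 72.27 points per inch); the desktop-publishing (PostScript) point is exactly 1/72 inch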
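Since two definitions of the point are in use, a conversion helper should take the points-per-inch value as a parameter rather than hard-code one. A minimal sketch; the constant and function names are illustrative, and the desktop point is assumed as the default.

```python
TRADITIONAL_POINTS_PER_INCH = 72.27  # traditional printer's point, as used by TeX
DTP_POINTS_PER_INCH = 72.0           # desktop-publishing (PostScript) point
MM_PER_INCH = 25.4                   # exact by definition


def points_to_mm(points, points_per_inch=DTP_POINTS_PER_INCH):
    """Convert a font size in points to millimetres."""
    return points / points_per_inch * MM_PER_INCH


print(round(points_to_mm(72), 2))  # 25.4
print(round(points_to_mm(32), 2))  # 11.29
```

Under the desktop point, 72 pt is exactly one inch; the same nominal size measured in traditional points is slightly smaller.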

  30. Fonts (5) To render a character in a font, one must know • the computer code (e.g., ASCII) of the character • the font name and properties. The computer then creates the glyph that represents the character in the specified font.

  31. Fonts (6) To form and locate the glyph, the computer uses the • Baseline: the invisible line on which characters are aligned • x-height: the height of the lowercase letter x • Kerning: adjustment of the spacing between particular pairs of letters. For example, in printing “wo” the “o” slides under the “w”.
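Kerning can be pictured as a per-pair adjustment added to the advance widths of the characters. The metric values below are invented for illustration only (real fonts store advance widths and kerning pairs in tables inside the font file), and `text_width` is a hypothetical helper that only knows the few characters listed.

```python
# Hypothetical metrics in font units, assuming 1000 units per em.
ADVANCE = {"w": 722, "o": 500, "A": 722, "V": 722}
KERN = {("w", "o"): -15, ("A", "V"): -135}  # negative values pull a pair together


def text_width(text, size_pt, units_per_em=1000):
    """Width of text in points: advance widths plus kerning adjustments."""
    units = 0
    for i, ch in enumerate(text):
        units += ADVANCE[ch]  # raises KeyError for characters not in the table
        if i + 1 < len(text):
            units += KERN.get((ch, text[i + 1]), 0)
    return units * size_pt / units_per_em
```

With these made-up numbers, “wo” at 10 pt is 12.07 pt wide, against 12.22 pt if the pair were not kerned.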

  32. Input devices for text • Keyboard • Scanning with optical character recognition • Hand-printed • Handwritten (cursive) • Machine-printed • Voice recognition • Pen-based

  33. Input errors • Human-based, e.g. • Typographic • Poor writing • Machine dependent • Small typeface differences: O vs. D • Limits of technology • Pre-existing errors

  34. Automatic error correction • Keyboard input has roughly the error rate of 98%-accurate OCR followed by automatic correction • Automatic correction is also helpful in: • Computer-aided authoring • Communication enhancement for the disabled • Natural language responses • Database interaction • Example: MS Word AutoCorrect

  35. Automatic spelling correction • Three increasingly difficult tasks: • Non-word detection: string in text not in dictionary • Isolated word correction: thier automatically becomes their • Context-dependent correction: here automatically becomes hear
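The first two tasks can be sketched with an edit distance. This version assumes the optimal string alignment variant of Damerau-Levenshtein distance, so that a transposition such as thier for their costs one edit; the lexicon and the names `osa_distance` and `correct` are hypothetical. It does not attempt the third, context-dependent task: here is a valid word, so it is left alone.

```python
def osa_distance(a, b):
    """Edit distance counting insertion, deletion, substitution, and adjacent
    transposition as one edit each (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution (or match)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def correct(word, lexicon):
    """Non-word detection, then isolated-word correction to the nearest entry."""
    if word in lexicon:
        return word  # not a non-word, so it is left unchanged
    # Sorting first makes ties resolve alphabetically.
    return min(sorted(lexicon), key=lambda entry: osa_distance(word, entry))


LEXICON = {"the", "their", "there", "this", "here", "hear"}
print(correct("thier", LEXICON))  # their
print(correct("here", LEXICON))   # here (a context-dependent error is not caught)
```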

  36. MS Word AutoCorrect

  37. General spelling correction • Can allow human intervention, e.g. choosing the correct spelling from a list of candidates • No context-dependent, general-purpose correction tool exists yet.

  38. Issues for spelling correction • Type of input device • Focus on adjacent keys: b vs. n • Focus on similar shapes: O vs. D • Interactive vs. automatic correction • How many choices are reasonable? (One for automatic correction.) • How accurate should guesses be? • Proper choice of dictionary

  39. Proper Dictionary

  40. Word list choice • Use a lexicon: a word list appropriate to a particular topic • As opposed to a dictionary: a comprehensive list of words • Include provision for adding new words

  41. Word list choice: Example 1 • Compare NY Times news wire text with Webster’s 7th Collegiate Dictionary • 8 million words in news wire text: • only 36% in dictionary • only 39% of dictionary words used in text

  42. Example 1 (continued) • Of text words not in dictionary • 1/4 inflected forms (change in case, gender, tense) • 1/4 proper names • 1/6 hyphenated forms • 1/12 misspellings • 1/4 unresolved by investigators (new words, etc.) • How to handle proper names?

  43. Example 2 • Corpus of 22 million words from a variety of genres • Effect of enlarging the lexicon from 50,000 to 60,000 words? • Eliminated 1348 false rejections (correct words that are now included in the lexicon) • Created 23 false acceptances (misspellings that now occur in the lexicon and are therefore treated as correctly spelled)

  44. Unintentionally correct spellings • Misuse of word: there for their, to for too • Typo: from for form • Quote from Mozart: I’ll see you in five minuets

  45. Issues in detection • Given a document as a sequence of words and a lexicon as an ordered list of words, report all document words not in the lexicon. But: • How to handle upper-case letters? • How to handle suffixes and prefixes? • What definition of word to use?
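A direct reading of this slide: walk the document and binary-search each word in the ordered lexicon with Python's standard `bisect` module. The sketch settles two of the open questions by assumption (a word is a run of letters and apostrophes, and matching is case-sensitive), and `unknown_words` is a hypothetical name.

```python
import bisect
import re


def unknown_words(document, lexicon):
    """Report document words not found in a sorted lexicon, using binary search.

    Assumptions: a word is a run of letters and apostrophes; lookup is case-sensitive.
    """
    report = []
    for word in re.findall(r"[A-Za-z']+", document):
        i = bisect.bisect_left(lexicon, word)  # where word would sit in sorted order
        if i == len(lexicon) or lexicon[i] != word:
            report.append(word)
    return report


lexicon = sorted(["a", "cat", "mat", "on", "sat", "the"])
print(unknown_words("The cat sat on teh mat", lexicon))  # ['The', 'teh']
```

The capitalized “The” is reported alongside the real misspelling, which is exactly the upper-case problem raised on this slide.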

  46. Issues in detection (2) • Upper case: change all to lower case • Handles the first word of a sentence and proper names that are also words: Bob Brown • Confuses: DEC (ok), Dec (abbreviation), dec (misspelling) • Must restore capitalization afterward
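Lower-casing before lookup and then restoring the original capitalization on a suggestion can be sketched as follows. The three-way pattern (all capitals, initial capital, lower case) is a simplifying assumption, and both helper names are hypothetical.

```python
def lookup_ignoring_case(word, lexicon):
    """Check a word against a lexicon stored entirely in lower case."""
    return word.lower() in lexicon


def restore_case(original, suggestion):
    """Copy the original word's capitalization pattern onto a suggestion."""
    if len(original) > 1 and original.isupper():
        return suggestion.upper()                   # THIER -> THEIR
    if original[:1].isupper():
        return suggestion[:1].upper() + suggestion[1:]  # Thier -> Their
    return suggestion                               # thier -> their


print(restore_case("Thier", "their"))  # Their
```

Because the lookup ignores case, a lexicon that contains dec (from DEC) also accepts the misspelling dec, which is the confusion listed on this slide.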

  47. Types of errors • In keyboard input, about 80% of misspellings are a single: • Insertion • Deletion • Substitution, especially of nearby keys • Transposition • Few errors occur in the first letter • Mostly, the length is the same or changes by 1
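Since most keyboard misspellings are one insertion, deletion, substitution, or transposition, one approach (popularized by Peter Norvig's spelling corrector) is to generate every string one such edit away and keep those found in the lexicon. A sketch assuming lower-case ASCII words; `single_edits` and `candidates` are hypothetical names.

```python
import string


def single_edits(word, alphabet=string.ascii_lowercase):
    """All strings one insertion, deletion, substitution, or transposition from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    transposes = {left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1}
    substitutes = {left + c + right[1:] for left, right in splits if right for c in alphabet}
    inserts = {left + c + right for left, right in splits for c in alphabet}
    # A substitution of a letter by itself reproduces word, so remove it.
    return (deletes | transposes | substitutes | inserts) - {word}


def candidates(word, lexicon):
    """Lexicon words exactly one edit away from word, in alphabetical order."""
    return sorted(single_edits(word) & lexicon)


print(candidates("thier", {"their", "there", "tier", "the"}))  # ['their', 'tier']
```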

  48. Suggestion Strategies • Words with same first letter first • Order rest by change in length
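The two ordering rules map naturally onto a sort key: a flag for a different first letter, then the change in length. In this hypothetical `rank_suggestions`, alphabetical order breaks any remaining ties, which is an added assumption.

```python
def rank_suggestions(misspelling, candidates):
    """Same first letter first, then smallest change in length, then alphabetical."""
    def key(candidate):
        different_first = candidate[:1] != misspelling[:1]  # False sorts before True
        length_change = abs(len(candidate) - len(misspelling))
        return (different_first, length_change, candidate)
    return sorted(candidates, key=key)


print(rank_suggestions("recieve", ["deceive", "receive", "relieve", "received"]))
# ['receive', 'relieve', 'received', 'deceive']
```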

  49. Types of errors (2) • Improper spacing: run-ons or splits • Significant unsolved problem • Cognitive • recieve for receive; procede for proceed • conspiricy for conspiracy; mispell for misspell • Phonetic • abiss for abyss; nacherly for naturally

  50. Spelling Rules • I before E except after C • Only exceed, succeed, and proceed end in -ceed; all others end in -cede, except supersede
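The -ceed/-cede/-sede rule is a closed list, so it can be written as a lookup. A toy sketch, assuming the caller supplies the prefix that comes before the ending; the function name is hypothetical.

```python
def seed_ending(prefix):
    """Spell a word ending in the '-seed' sound, following the slide's rule."""
    if prefix in ("ex", "suc", "pro"):
        return prefix + "ceed"  # exceed, succeed, proceed
    if prefix == "super":
        return prefix + "sede"  # supersede
    return prefix + "cede"      # everything else: precede, recede, concede, ...


print([seed_ending(p) for p in ("pro", "pre", "super")])  # ['proceed', 'precede', 'supersede']
```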
