
Text Annotation Techniques

What is an "annotated" text? Ordinary text (e.g.): This is an ordinary text document. Annotated text (e.g.): <html><title>Sample Document </title><body>This is an annotated text document.</body></html>. Key methods: DTD, SGML, HTML, XML, WML, TEI. It is a specification that accompanies an annotate

sherise

Presentation Transcript


    1. Text Annotation Techniques. Bill Bruno, Rob LaPlaca

    2. What is an “annotated” text?

    3. Key Methods: DTD, SGML, HTML, XML, WML, TEI

    4. Document Type Definition

    5. Standard Generalized Markup Language

    6. HTML: Hyper Text Markup Language. Symbols used to mark up web pages. Markup tells the web browser how to display pictures and text. The units of markup are called elements. Some elements use a pair of tags (a start tag and an end tag).

    7. Basic Annotations in HTML. Document tags: HTML, HEAD, BODY. Basic text structures: headings, paragraphs, etc. Anchors: HREF and NAME. Images: IMG, ALIGN, ALT.

    8. Sample HTML Code <html> <title> Sample Document </title> <body> <p> This is a sample HTML document.</p> <p>It illustrates the usage of tags with the actual text.</p> </body> </html>
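As a rough illustration of how the tags in the sample slide wrap the actual text, the sketch below uses Python's standard `html.parser` module to separate markup from content. The class name `TagCollector` and the variable names are made up for this example and are not part of the presentation.

```python
from html.parser import HTMLParser

# The sample document from the slide, as one string.
SAMPLE = (
    "<html> <title> Sample Document </title> <body> "
    "<p> This is a sample HTML document.</p> "
    "<p>It illustrates the usage of tags with the actual text.</p> "
    "</body> </html>"
)


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags, end tags, and text."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.end_tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_endtag(self, tag):
        self.end_tags.append(tag)

    def handle_data(self, data):
        # Keep only non-blank text runs found between the tags.
        if data.strip():
            self.text.append(data.strip())


collector = TagCollector()
collector.feed(SAMPLE)
collector.close()

print(collector.start_tags)  # the opening half of each pair
print(collector.end_tags)    # the closing half of each pair
print(collector.text)        # the ordinary text that the tags annotate
```

Every start tag in this sample has a matching end tag, which is the "elements come in pairs" point from the earlier slide.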

    9. HTML Specifics. Dedicated HTML programs exist, but Word can also be used to view it. Tag names are not case sensitive. The code is standardized, so the same page can be viewed with different browsers.

    10. XML: Extensible Markup Language. It is a flexible way to create common information formats and share both the format and the data on the World Wide Web, intranets, and elsewhere.

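    11. XML. An element of XML is a start tag, an end tag and the data between them: <director>Ed Wood</director>. Attributes may also be assigned to an element as name="value" pairs inside its start tag: <director city="Hollywood">Ed Wood</director> (the attribute name city is illustrative; an attribute must have a name). XML tags are case sensitive.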
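A minimal sketch of the three points on this slide (element, attribute, case sensitivity), using Python's standard `xml.etree.ElementTree`. The attribute name `city` is an assumption made for the example, since an XML attribute needs a name as well as a value.

```python
import xml.etree.ElementTree as ET

# An element: a start tag, the data, and an end tag.
element = ET.fromstring("<director>Ed Wood</director>")
print(element.tag, element.text)

# An attribute is a name="value" pair inside the start tag.
# The name "city" is hypothetical, chosen only for illustration.
with_attr = ET.fromstring('<director city="Hollywood">Ed Wood</director>')
print(with_attr.attrib)

# Tags are case sensitive: <Director> is not closed by </director>.
try:
    ET.fromstring("<Director>Ed Wood</director>")
    case_mismatch_rejected = False
except ET.ParseError:
    case_mismatch_rejected = True
print(case_mismatch_rejected)
```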

    12. Sample XML Code: <?xml version="1.0"?> <doc> <burns>Say <quote>goodnight</quote>, Gracie.</burns> <allen><quote>Goodnight, Gracie.</quote></allen> <applause/> </doc>

    13. Sample XML Code: <?xml version="1.0"?> <!DOCTYPE PARENT [ <!ELEMENT PARENT (CHILD*)> <!ELEMENT CHILD (MARK?,NAME+)> <!ELEMENT MARK EMPTY> <!ELEMENT NAME (LASTNAME+,FIRSTNAME+)*> <!ELEMENT LASTNAME (#PCDATA)> <!ELEMENT FIRSTNAME (#PCDATA)> <!ATTLIST MARK NUMBER ID #REQUIRED LISTED CDATA #FIXED "yes" TYPE (natural|adopted) "natural"> <!ENTITY STATEMENT "This is well-formed XML"> ]>

    14. Sample XML Code <PARENT> &STATEMENT; <CHILD> <MARK NUMBER="1" LISTED="yes" TYPE="natural"/> <NAME> <LASTNAME>child</LASTNAME> <FIRSTNAME>second</FIRSTNAME> </NAME> </CHILD> </PARENT>
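The DTD on slide 13 and the body on slide 14 form one document. The sketch below joins them and reads the result with Python's standard `xml.etree.ElementTree`, assuming that parser is an acceptable stand-in for whatever tool the presenters used. Its parser is non-validating: it checks that the document is well-formed and expands the internal `STATEMENT` entity, but it does not enforce the element and attribute rules declared in the DTD.

```python
import xml.etree.ElementTree as ET

# Slides 13 and 14 joined into a single document (line numbers removed).
DOCUMENT = (
    '<?xml version="1.0"?>\n'
    "<!DOCTYPE PARENT [\n"
    "<!ELEMENT PARENT (CHILD*)>\n"
    "<!ELEMENT CHILD (MARK?,NAME+)>\n"
    "<!ELEMENT MARK EMPTY>\n"
    "<!ELEMENT NAME (LASTNAME+,FIRSTNAME+)*>\n"
    "<!ELEMENT LASTNAME (#PCDATA)>\n"
    "<!ELEMENT FIRSTNAME (#PCDATA)>\n"
    "<!ATTLIST MARK NUMBER ID #REQUIRED\n"
    '               LISTED CDATA #FIXED "yes"\n'
    '               TYPE (natural|adopted) "natural">\n'
    '<!ENTITY STATEMENT "This is well-formed XML">\n'
    "]>\n"
    "<PARENT> &STATEMENT; <CHILD> "
    '<MARK NUMBER="1" LISTED="yes" TYPE="natural"/> '
    "<NAME> <LASTNAME>child</LASTNAME> <FIRSTNAME>second</FIRSTNAME> </NAME> "
    "</CHILD> </PARENT>"
)

root = ET.fromstring(DOCUMENT)

# The entity reference &STATEMENT; is replaced by its declared text.
statement = root.text.strip()
print(statement)

# Attributes of the empty MARK element.
mark = root.find("CHILD/MARK")
print(mark.attrib)

# Character data of the leaf elements.
last_name = root.findtext("CHILD/NAME/LASTNAME")
first_name = root.findtext("CHILD/NAME/FIRSTNAME")
print(last_name, first_name)
```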

    15. Differences Between HTML and XML. XML tags describe the data they contain; for example, <phoneno> may mark a telephone number. XML supports links to multiple documents. A forgotten tag makes an XML file unusable, unlike HTML, where the omission may be bypassed.
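The forgotten-tag difference can be sketched with two standard-library parsers. Here Python's `html.parser` stands in for a browser's lenient handling, which is an assumption of this example, and the phone number and helper names (`xml_is_usable`, `TextCollector`) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The closing </phoneno> tag has been forgotten.
BROKEN_XML = "<doc><phoneno>555-0100</doc>"

# The closing </p> tags have been forgotten.
SLOPPY_HTML = "<body><p>First paragraph<p>Second paragraph</body>"


def xml_is_usable(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers the text runs it encounters."""

    def __init__(self):
        super().__init__()
        self.text = []

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


xml_ok = xml_is_usable(BROKEN_XML)

html_collector = TextCollector()
html_collector.feed(SLOPPY_HTML)
html_collector.close()

print(xml_ok)               # the XML parser rejects the whole document
print(html_collector.text)  # the HTML parser still yields both paragraphs
```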

    16. Benefits of XML. Meaningful markup. A single approach accommodates both document and data structures and integrates both within documents. Enables transfer of data between applications. Structural similarity to HTML simplifies implementation using traditional web servers and browser applications, CGI and Java.

    17. Benefits of XML. Files can be processed purely as data, so they can be either stored or displayed. Files are plain text and verbose, which allows easy debugging. It is license-free, platform independent and well supported.
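A small sketch of treating XML "purely as data": one side serializes a record to text, and the other side parses it back. The record contents and the element name `record` are assumptions made for the example; `director` and `phoneno` echo tag names used on earlier slides.

```python
import xml.etree.ElementTree as ET

# Illustrative data only; the values are not from the presentation.
record = {"director": "Ed Wood", "phoneno": "555-0100"}

# Sending side: build one child element per field and serialize to text.
root = ET.Element("record")
for name, value in record.items():
    ET.SubElement(root, name).text = value
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)  # plain, human-readable text that is easy to inspect

# Receiving side: parse the text and rebuild the same fields.
restored = {child.tag: child.text for child in ET.fromstring(xml_text)}
print(restored == record)
```

Because the exchanged form is ordinary text, it can be stored, displayed, or inspected in any editor when something goes wrong.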

    19. WML: Wireless Markup Language. Allows text portions of web pages to be viewed on cellphones and PDAs. Part of the Wireless Application Protocol (WAP). Derived from the earlier HDML (Handheld Device Markup Language).

    20. WML. Read in browsers, similar to HTML and XML. WAP devices use a micro browser, which is like a regular web browser but with limited features. HTML could be used, but WML is better suited to low bandwidth. WML takes less power to process than HTML.

    21. Text Encoding Initiative

    22. Need for a common encoding scheme. Until the TEI project was undertaken, there had been no common encoding format for scholarly machine-readable texts. None of the existing encoding schemes had been able to gain acceptance as a standard.

    23. Origin of TEI & factors contributing to it. TEI arose out of a planning conference convened by the ACH (Association for Computers and the Humanities) at Vassar College, Poughkeepsie, New York, in November 1987. Factor I: More was known about the problems of text encoding than at the time of previous attempts. Factor II: The then recently developed Standard Generalized Markup Language (SGML) seemed to be the ideal text-encoding scheme.

    24. Objectives of TEI

    25. Why TEI chose SGML?

    26. Critique: Straightforward presentation. More examples would be helpful. Research required to fully understand some points.
