
File Formats in the Context of Archiving

This presentation from the EMANI project meeting surveys the file formats used for archiving different kinds of data: raw binary data, text, images, and multimedia. It focuses on the challenges of storing mathematical content, which consists mostly of text, formulas, diagrams, and some images, with an emphasis on preserving both the text and its structure. It concludes that mark-up formats such as XML or TeX are better suited for archiving than structured formats such as MS Word or PDF.


Presentation Transcript


  1. File Formats in the Context of Archiving
     Dr. Thomas Fischer
     EMANI – Project Meeting, February 14th - 16th, 2002
     Springer-Verlag Heidelberg
     Göttingen State and University Library (SUB)
     emani@mail.sub.uni-goettingen.de

  2. Archives Store Different Kinds of Data ...
     • archives have to deal with different kinds of data
       • raw binary data
       • texts
       • images
       • multimedia
       • ...
     EMANI Project Meeting SUB Göttingen

  3. ... in Different File Formats
     • binary data: stream of bytes
     • text: ASCII, other encodings of simple text, formatted text
     • images: vector or pixel oriented graphics
     • multimedia: a plethora of different file types for different purposes
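As a small illustration of slide 3's point that simple text exists in several encodings, the sketch below encodes one string in two ways and decodes each with the wrong encoding. The sample string and the choice of UTF-8 and Latin-1 are assumptions made for this example, not taken from the presentation.

```python
# Illustrative only: the sample string and the encodings are assumptions.
sample = "Göttingen"

utf8_bytes = sample.encode("utf-8")      # 'ö' is stored as two bytes
latin1_bytes = sample.encode("latin-1")  # 'ö' is stored as one byte


def decode_with(data: bytes, encoding: str) -> str:
    """Decode bytes, replacing undecodable sequences instead of failing."""
    return data.decode(encoding, errors="replace")


# Plain ASCII cannot represent the umlaut at all.
try:
    sample.encode("ascii")
except UnicodeEncodeError:
    print("not representable in ASCII")

print(len(utf8_bytes), len(latin1_bytes))  # 10 9
print(decode_with(utf8_bytes, "latin-1"))  # wrong encoding: garbled umlaut
print(decode_with(latin1_bytes, "utf-8"))  # wrong encoding: replacement character
```

Only the non-ASCII character is affected in either case. An archive therefore probably needs to record which encoding a text file uses, since the bytes alone do not say.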

  4. Focus on ...
     • mathematics consists mostly of text, formulas, diagrams, and some images
     • further content might be (compiled) programs, interactive simulations etc.
     • for learned journals the content is overwhelmingly text with few images

  5. Text!
     Text files usually contain two kinds of information:
     • textual data providing the content (words) of the file
     • structural data containing the information for the presentation of the text

  6. Two Kinds of Problems
     • loss of structure leads to loss of formatting
     • loss of text leads to loss of meaning
     If problems occur with the media or the program that reads the file, some information may be lost. Loss of text is usually considered the more serious of the two.

  7. Two Types of Text File Formats
     • structured format (e.g. Microsoft Word, PDF): the file consists of text (more or less uninterrupted) and tables (usually at the beginning or the end of the file) that provide additional information, formatting etc.
     • mark-up format (e.g. HTML, XML, RTF, TeX): the file consists of a stream of text with formatting information interspersed
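The mark-up case on slide 7 can be sketched in a few lines: because the formatting information sits inside the text stream, a simple pattern is enough to separate the two. The fragment and its element names are invented for illustration, and the regular expression is a rough approximation rather than a real XML or HTML parser.

```python
import re

# Hypothetical mark-up fragment; the element names are invented for illustration.
fragment = "<p>Let <i>x</i> be a <b>prime</b> number.</p>"

# Split the stream into mark-up tokens (tags) and text tokens, keeping both.
tokens = [t for t in re.split(r"(<[^>]+>)", fragment) if t]

tags = [t for t in tokens if t.startswith("<")]
text = "".join(t for t in tokens if not t.startswith("<"))

print(tags)  # the formatting information
print(text)  # the words: Let x be a prime number.
```

A structured format offers no comparable shortcut, since its tables have to be interpreted before the text can be located.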

  8. For Archiving Purposes
     • the file format chosen should be readable without the use of specialized programs
     • the file format should be robust against damage of media and loss of data

  9. Types of Text Format
     • mark-up languages like XML or TeX store text and formatting together. The text can be reconstructed using any text editor, and the formatting can probably be regained.
     • structured formats like MS Word or PDF need their dedicated program for proper rendering and may or may not allow extraction of the contained text, depending on circumstances that are usually not visible to the user.
     Consequence: mark-up formats are better suited for archiving
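The robustness argument of slides 8 and 9 can be tried out with a rough experiment: damage a few bytes in a plain mark-up file and in an opaque container, then see what can still be read. Everything here is an assumption made for illustration. The document is invented, and zlib compression merely stands in for a format that needs a dedicated program; it is not a description of how MS Word or PDF actually store text.

```python
import re
import zlib
from typing import Optional

# Hypothetical document; the element names and text are invented for illustration.
markup = (b"<article><title>On Primes</title>"
          b"<p>Every integer greater than one has a prime factor.</p></article>")


def damage(data: bytes, start: int, length: int) -> bytes:
    """Overwrite a byte range with zero bytes to simulate a media defect."""
    return data[:start] + b"\x00" * length + data[start + length:]


def recover_text(data: bytes) -> str:
    """Keep whatever readable text survives: drop tags and zero bytes."""
    decoded = data.decode("ascii", errors="replace").replace("\x00", "")
    without_tags = re.sub(r"<[^>]*>", " ", decoded)
    return " ".join(without_tags.split())


def try_decompress(data: bytes) -> Optional[bytes]:
    """Return the decompressed bytes, or None if the stream is unreadable."""
    try:
        return zlib.decompress(data)
    except zlib.error:
        return None


# Damage the word "integer" in the plain mark-up file.
damaged_markup = damage(markup, markup.index(b"integer"), len(b"integer"))
print(recover_text(damaged_markup))

# Damage the same number of bytes in the compressed stand-in (offset chosen arbitrarily).
container = zlib.compress(markup)
damaged_container = damage(container, 10, len(b"integer"))
print(try_decompress(container) == markup)  # the intact container is readable
print(try_decompress(damaged_container))    # almost certainly None: nothing recovered
```

In the mark-up file only the damaged word is lost, while the damaged container yields no text at all. Real structured formats may degrade less abruptly than this stand-in, so the comparison shows the tendency rather than measuring it.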
