UNIVERSITY OF MACEDONIA ECONOMIC AND SOCIAL SCIENCES

Support and Inclusion of students with disabilities at higher education institutions in Montenegro. Konstantinos Charitakis, PhD, kcharitakis@uom.gr, 28/8/2012. Overview: Accessible Digital Books.


Presentation Transcript


  1. UNIVERSITY OF MACEDONIA ECONOMIC AND SOCIAL SCIENCES. Support and Inclusion of students with disabilities at higher education institutions in Montenegro. Konstantinos Charitakis, PhD, kcharitakis@uom.gr, 28/8/2012

  2. Overview: Accessible Digital Books
     • Scanning of the book
     • Optical Character Recognition (OCR)
     • Text editing and formatting

  3. Step 1 – Book Scanning
     Let us assume that we have a printed book or a hard-copy handout.
     • Scanning equipment - use a scanner to produce images of the book's pages.
       • Fast scanner
       • Document feeder
     • Scanning resolution - set the scanning resolution to 300 dpi.
       • Important notice: a scanning resolution higher than 300 dpi will result in a lot of "garbage" during the OCR process.
     • Output format - save the scanned images in PDF format.
     • PDF editing - combine the pages into a single PDF file using PDF editing software.

  4. Step 2 – Optical Character Recognition (OCR)
     At this phase we import the scanned pages (images of pages) into OCR software and convert the printed text to machine-encoded text.
     • Optical Character Recognition (OCR) is the mechanical or electronic conversion of scanned images of handwritten, typewritten or printed text into machine-encoded text.
     • OCR software - FineReader 11. It is among the best on the market in terms of performance with the Greek language and the functionality it provides.
     • Software training - when needed
       • When the book has unusual fonts, e.g. handwriting or slim and compressed font styles
       • Most OCR software packages have training functionality
       • Training rules can be saved as templates and reused
       • This saves time

  5. Step 2 – OCR Output
     At this stage it is very important to consider for WHOM we want to make the digital book accessible.
     • Two choices - there are two choices at this stage, leading to two different outputs (Matrices).
     • 1st Matrix (plain text ONLY) - the OCR software strips all formatting and layout (bold, italics, underlining, font size, headers, images, tables, lists etc.) and keeps only PLAIN TEXT.
       • This is very useful when we want to create a digital audio book only for blind people.
     • 2nd Matrix (keeps the content's structure and formatting) - the OCR software keeps headings, headers, footers, references, page numbering, images, columns, shapes, captions etc. and creates a WORD document with the same text structure as the PDF.
       • This is very useful for large print, e.g. A3-size pages, suitable for individuals with other disabilities.
     • OCR output format - save the OCR output in a text file format, e.g. .txt or .doc

  6. Step 3 – Text Editing Process: 1st Matrix – Plain Text Editing
     This is the text after we have stripped off all formatting.
     • Text editor software - e.g. Microsoft Office Word.
     • Text restructuring - to make the book accessible for blind people we need to recover some of its structure.
     • OCR software can keep some structure, but we cannot control exactly how it does so, even for the smallest detail, and in the end we have to clean up the result anyway.
     • Experience has shown that recreating the structure is much faster than cleaning the errors/garbage out of the structure kept by the OCR software.
     • Either we clean everything or we keep everything.

  7. Step 3 – Text Editing Process: 1st Matrix – Plain Text Editing (2)
     • Text formatting - start from the table of contents.
     • We look at the table of contents of the original book and apply the appropriate styles (Heading 1, Heading 2 etc.) to the corresponding headings in the text.
     • WORD can create the table of contents automatically. We use this functionality and then delete the old table of contents.
     • If the document has no table of contents, we simply apply heading styles to its original headings in the resulting text.

  8. Step 3 – Text Editing Process: 1st Matrix – Plain Text Editing (3)
     • Page numbering - enter page numbering only in special cases (e.g. in academic documents or handouts).
     • Page numbering is done strictly with WORD's Insert / New page functionality where needed,
     • not by pressing ENTER repeatedly or by inserting sections etc.

  9. Step 3 – Text Editing Process: OCR – Common Errors
     Then we start the actual text editing by checking and correcting possible errors from the OCR process.
     Common OCR errors, ordered by their frequency of occurrence:
     • Syllabification (words hyphenated at line ends) - annoying for blind users.
       • Solution
         • Use the OCR software's functionality to handle this during the OCR process, before the conversion to PLAIN TEXT.
         • FIND a hyphen followed by a line change, OR a hyphen followed by a space, and DELETE.
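The FIND and DELETE step for syllabification can also be applied to an exported plain-text file. The following is only a minimal sketch, assuming the OCR output has been saved as .txt; the function name `join_hyphenated` is hypothetical. Note that it also merges genuine hyphens that happen to sit at a line end or before a space (for example "pre- and post-"), so the result still needs a manual check.

```python
import re

# A hyphen between two word characters, followed by a line change
# (optional spaces or tabs around the line change are swallowed too).
LINE_END_HYPHEN = re.compile(r"(?<=\w)-[ \t]*\r?\n[ \t]*(?=\w)")

# A hyphen followed by a single space between two word characters,
# as left behind when the lines were already joined.
HYPHEN_SPACE = re.compile(r"(?<=\w)- (?=\w)")


def join_hyphenated(text: str) -> str:
    """Delete syllabification hyphens so split words are read as one word."""
    text = LINE_END_HYPHEN.sub("", text)
    return HYPHEN_SPACE.sub("", text)


if __name__ == "__main__":
    print(join_hyphenated("acces-\nsible digi- tal books"))
```

Because `\w` matches Unicode letters in Python 3, the same patterns apply to Greek text.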

  10. Step 3 – Text Editing Process: OCR – Common Errors (2)
     • Wrong letter recognition - the OCR software recognises one letter as another.
       • Solution
         • FIND the letter and REPLACE occurrences one by one, so that correct ones are not replaced.
         • Check for errors in relation to the neighbouring letters.
     • Line changing - the OCR software inserts line breaks where it shouldn't.
       • Solution
         • FIND a paragraph character without a full stop before it and DELETE.
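The line-changing fix can be sketched on plain text as follows. This is an illustration under assumptions, not the procedure used in WORD: the function name `join_broken_lines` and the `enders` parameter are hypothetical, a line break is kept only after a full stop, and a blank line is assumed to close a paragraph. Headings that do not end with a full stop would be merged with the following text unless they are separated by a blank line, so they need a manual check.

```python
def join_broken_lines(text: str, enders: str = ".") -> str:
    """Join lines that the OCR software broke in the middle of a sentence.

    A line break is kept only if the line ends with one of the characters
    in `enders` (by default the full stop) or is followed by a blank line.
    """
    paragraphs = []
    current = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            # A blank line always closes the paragraph being built.
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(stripped)
        if stripped.endswith(tuple(enders)):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    print(join_broken_lines("The book is\nscanned first.\nThen OCR\nis run."))
```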

  11. Step 3 – Text Editing Process: OCR – Common Errors (3)
     • Continuous ENTER
       • Solution
         • FIND two contiguous ENTER characters and DELETE the extra one.
     • Continuous spaces - are forbidden.
       • Solution
         • FIND two contiguous spaces and DELETE the extra one.
     • Tabs - usually entered where lists or tables were included.
       • Solution
         • FIND Tab and DELETE.
         • Wherever we want indentation, we add it with WORD's indentation functionality (Format / Paragraph / Indentation).
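The three whitespace fixes above can be combined in one pass over an exported plain-text file. This is a minimal sketch; the function name `normalise_whitespace` is hypothetical. It departs from the slide in one respect, as an assumption: a tab is replaced by a space rather than deleted outright, so that the words on either side do not fuse.

```python
import re


def normalise_whitespace(text: str) -> str:
    """Remove tabs, repeated spaces and repeated line changes."""
    text = text.replace("\r\n", "\n")      # use one kind of line change
    text = text.replace("\t", " ")         # tab -> space (assumption, see above)
    text = re.sub(r" {2,}", " ", text)     # contiguous spaces -> one space
    text = re.sub(r" *\n *", "\n", text)   # no spaces around a line change
    text = re.sub(r"\n{2,}", "\n", text)   # contiguous ENTER -> one ENTER
    return text.strip()


if __name__ == "__main__":
    print(normalise_whitespace("One\ttwo   three\n\n\nfour  \n five"))
```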

  12. Step 3 – Text Editing Process: OCR – Common Errors (4)
     • Spell checking - the most time-consuming process.
       • Solution
         • Perform spell checking with WORD's functionality for the whole document.
         • However, on special occasions we need to include the mistakes as they are.

  13. Step 3 – Text Editing Process: Special Characters
     • Three continuous full stops: …
     • Three continuous full stops with spaces in between: . . .
     • Two continuous full stops: ..
     • Two continuous full stops with a space in between: . .
     • Opening bracket: (
     • Closing bracket: )
     • Left apostrophe: ‘
     • Right apostrophe: ’
     • Double intonation (diaeresis): ¨
     • Opening quotation marks: «
     • Closing quotation marks: »
     • Asterisk: *
     • Paragraph: ^p OR ^13
     • Tab: ^t OR ^9

  14. Step 3 – Text Editing Process: Special Characters (2)
     • Long dash ( — ): ^+
     • Dash ( – ): ^= (dash with space)
     • Caret / exponent sign (^): ^^
     • Greater than: >
     • Smaller than: <
     • Semicolon: ·
     • Line change: ^l OR ^11
     • Column change: ^n OR ^14
     • Page change: ^m
     • Non-breaking space (°): ^s
     • Non-breaking hyphen: ^~
     • Optional hyphen (¬): ^-
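When the same searches are run on an exported text file instead of inside WORD, the Find codes have to be translated into literal characters. The table below is a hedged sketch: the names `WORD_FIND_CODES` and `word_code_to_text` are hypothetical, the numeric codes (^9, ^11, ^13, ^14) come from the lists above, and the form feed for ^m and the Unicode code points for the dashes and the non-breaking space are assumptions about what ends up in the exported file. In a saved .txt file the paragraph mark usually appears as "\n" or "\r\n" rather than "\r".

```python
# WORD Find code -> literal character (see the assumptions in the text above).
WORD_FIND_CODES = {
    "^p": "\r",      # paragraph mark, same as ^13
    "^t": "\t",      # tab, same as ^9
    "^l": "\x0b",    # line change, same as ^11
    "^n": "\x0e",    # column change, same as ^14
    "^m": "\x0c",    # page change (assumed to be a form feed)
    "^s": "\u00a0",  # non-breaking space
    "^+": "\u2014",  # long dash
    "^=": "\u2013",  # dash
    "^^": "^",       # caret
}


def word_code_to_text(pattern: str) -> str:
    """Translate a WORD Find string into the literal text it stands for."""
    out = []
    i = 0
    while i < len(pattern):
        pair = pattern[i:i + 2]
        if pair in WORD_FIND_CODES:
            out.append(WORD_FIND_CODES[pair])
            i += 2
        else:
            # Not a known code: keep the character unchanged.
            out.append(pattern[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # "hyphen followed by a paragraph mark", as used for syllabification
    print(repr(word_code_to_text("-^p")))
```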

  15. Other Issues
     Other issues that we need to consider, depending on the use of the book.
     • Footnotes - are placed at the end of each page, at the end of each chapter or at the end of the book.
       • Solution
         • Follow the convention of putting them all at the end of the text.
         • WORD's Insert / Reference / Footnote functionality adds a number in the text and in the footnote, with continuous numbering.
         • When screen reader software comes to a footnote, it follows the hyperlink to the footnote, reads it and then returns.
         • Place footnotes at the end of the book where needed, so that it is the user's choice whether to read them or not.
     • Page numbering - delete it OR include it only in special cases, as mentioned earlier, e.g. academic documents, law books etc.

  16. Other Issues (2)
     • Bibliography - in academic writing we have to keep it.
       • We have to follow a convention when we include it.
       • Using a link might be problematic, because the screen reader will go to the end of the book to read the entry but will not be able to go back and continue reading.
       • Citations and references are included in the same way as in the original book.
       • If there are no footnotes in the original book, WORD's footnote functionality may be used to add references.

  17. Other Issues (3)
     • Index
       • There is no need to keep the index, since the reader can use WORD's FIND functionality.
       • However, in some cases it is necessary to include the index, for example in law books.
     • Images
       • There are no images in the Plain Text output, and images have no meaning for blind people.
       • Instead, use a description of each image:
         • Beginning of Image 1 – insert caption – End of Image 1, followed by a text description of the image:
         • Beginning of description of Image 1 – insert description – End of description of Image 1.
       • Describing an image/picture for blind people is a big topic that we are not going to analyse at this stage.
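The image convention above is easy to apply consistently if the marker lines are generated rather than typed by hand. A minimal sketch, assuming each marker goes on its own line; the function name `image_placeholder` and the sample caption and description are hypothetical.

```python
def image_placeholder(number: int, caption: str, description: str) -> str:
    """Build the text block that replaces image `number` in the plain text."""
    return "\n".join([
        f"Beginning of Image {number}",
        caption,
        f"End of Image {number}",
        f"Beginning of description of Image {number}",
        description,
        f"End of description of Image {number}",
    ])


if __name__ == "__main__":
    print(image_placeholder(
        1,
        "Figure 1: A flatbed scanner",
        "A book lies open, face down, on the glass of a flatbed scanner.",
    ))
```

Keeping the opening and closing markers identical for every image lets a reader jump between images with the FIND functionality.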

  18. Other Issues (4)
     • Conventions - it is good practice to include a list of the conventions used in the text at the beginning of the book (e.g. on the back of the cover page), e.g. for footnotes, table of contents, images etc.
     • Versioning - it is good practice to keep a versioning scheme.
       • For the Plain Text matrix we create many versions of the same text.
       • In the end we have one WORD file with an automated table of contents, page numbering (where applicable), footnotes, bibliography and image descriptions, which is already accessible.

  19. File Formats
     • Text format
       • Save the file in different text file formats, e.g. .doc, .txt, .html, .xml
     • Audio file format
       • With Text to Speech software the text can be recorded to an audio file format, e.g. .wav, .mp3 etc.
     • Braille-ready file format
       • A Braille Ready file (.brl file type) can be produced from WORD.
       • Each Braille printer has its own software that converts the text and saves it as a Braille Ready file.

  20. Step 3 – Text Editing Process: 2nd Matrix – Structured Text Editing
     This is the case where we edit text that has preserved its formatting and structure after the OCR process.
     • Target group - the result of this procedure can be used by individuals with visual impairment or other disabilities.
     • It gives users additional options and functionality, e.g. search, navigation and enlarged text, and the opportunity to use a screen reader to listen to the text while reading it.
     • Contrast and color management - it might be necessary to adjust the contrast or the color of the fonts and background (e.g. white letters on a black background), depending on each user's needs.
     • Error handling - we need to check for errors and correct them in the same way as described earlier for the 1st Matrix.

  21. Thank you!
