Digitisation of Newspapers

Digitisation of Newspapers: The South African Experience. Patricia Liebetrau, IFLA Newspaper Conference, New Delhi, 26-28 February 2010.


Presentation Transcript


  1. Digitisation of Newspapers: The South African Experience. Patricia Liebetrau, IFLA Newspaper Conference, New Delhi, 26-28 February 2010

  2. Introduction

  3. Durban … a multicultural city

  4. Digital Innovation South Africa (DISA)
     • National collaborative initiative
     • Create online resources for education, research and training
     • Make South African material of high socio-political value accessible online
     • Collate serial literature scattered across collections
     • Develop local expertise in the use of advanced digital technologies
     • Set standards for digitisation initiatives in South Africa

  5. DISA
     • Identify appropriate collections
     • Distributed digital production
     • Gateway to federated digital collections
     • Develop policies, strategies and guidelines in support of South African initiatives
     • Comply with international standards
     • Bridge the digital gap between the northern and southern hemispheres

  6. http://www.disa.ukzn.ac.za

  7. Campbell Collections @UKZN
     • Digital microfilm scanner
     • Obsolete technology
     • Preservation of microfilms
     • Newspapers and manuscripts (MSS) on microfilm
     • Data transfer
     • Application to DISA

  8. Digitising microfilm
     Samples were tested using the following settings:
     • 1-bit at 300 dpi
     • 1-bit at 400 dpi
     • 1-bit at 600 dpi
     • 8-bit greyscale at 300 dpi, thresholded at 128
     • 8-bit greyscale at 400 dpi, thresholded at 128
     • 8-bit greyscale at 600 dpi, thresholded at 128
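The greyscale settings above are reduced to 1-bit images by thresholding at 128. The Python sketch below shows that step together with the arithmetic for uncompressed image size at each resolution. The pixel layout (rows of integers 0 to 255, 0 = black), the rule that a value of exactly 128 becomes white, and the function names are assumptions for illustration, not a description of the scanner's own processing.

```python
def threshold(grey_rows, level=128):
    """Convert 8-bit greyscale rows (0..255) to 1-bit rows.

    Assumption: pixels at or above `level` become white (1), the rest black (0).
    """
    return [[1 if px >= level else 0 for px in row] for row in grey_rows]


def raw_size_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size, in bytes, of a width_in x height_in inch scan."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return (pixels * bits_per_pixel + 7) // 8  # round up to whole bytes


if __name__ == "__main__":
    print(threshold([[0, 127, 128, 255]]))  # [[0, 0, 1, 1]]
    for dpi in (300, 400, 600):
        # bytes per square inch at 1-bit and at 8-bit
        print(dpi, raw_size_bytes(1, 1, dpi, 1), raw_size_bytes(1, 1, dpi, 8))
```

Per square inch, going from 300 to 600 dpi quadruples the pixel count (90,000 to 360,000), and 8-bit storage is eight times 1-bit storage before any compression.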

  9. Comparisons
     • Sample 1: scanned on a flatbed scanner at 300 dpi, 8-bit greyscale, from the unbound original
     • Sample 2: scanned using a Minolta MS7000 microfilm scanner at 300 dpi, 8-bit greyscale; the microfilm copy looks as though the original was bound when it was filmed
     One would have to conclude from this example that the microfilm was perhaps not captured correctly.

  10. OCR recognition
      The rate of word return from OCR would clearly be far greater for the first image than for the second.
      Conclusion: some microfilms are better than others, and a scan is only as good as the original microfilm.
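The slides do not define "rate of word return". One plausible reading, assumed here, is the share of words in a correct reference transcription that the OCR output reproduces. The Python sketch below counts words with multiplicity; the tokenisation rule, the function names and the example strings are illustrative assumptions.

```python
import re
from collections import Counter


def words(text):
    """Lower-case word tokens; punctuation and hyphens split tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_return_rate(reference, ocr_output):
    """Fraction of reference words (counted with multiplicity) found in the OCR text."""
    ref = Counter(words(reference))
    got = Counter(words(ocr_output))
    total = sum(ref.values())
    if total == 0:
        return 0.0
    matched = sum(min(n, got[w]) for w, n in ref.items())
    return matched / total


if __name__ == "__main__":
    reference = "Thousands of people have rejected the constitution"
    ocr = "HOUSANDS of peo-ple have rejected the constitution"
    print(round(word_return_rate(reference, ocr), 3))  # 0.714
```

In the example, the dropped initial letter ("HOUSANDS") and the line-end hyphenation ("peo-ple") each cost one word, the same kinds of error that appear in the OCR sample on slide 11.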

  11. OCR’ed text
      Big no to constitution as elections draw near 11,HOUSANDS of peo-ple have rejected the Government's new constitution under which elections for In-dian and coloured chambers of Parlia-nent are to take place n August. Reports from around the country talk of feverish activity as the biggest issue facing the country nears its climax." The elections, to be held on the 22nd and - 28th of August, is seen as an issue which con-cerns all South Africans. The African com-munity in particular is " leading the call for a boycott of the elec` tions.Mr. Popo Molefe, the national secretary of the United Democratic Front (UDF), said the central issue was the 'denationalisation of the African people'. 'We call on our peo-ole in Eldorado Park, Reiger Park, Acton- ville and Lenasia, to boycott the August elections. 'We call on our peo- ple to refuse to be partners in the crime of Apartheid against the majority of South Africans.'

  12. Indexing
      • Manual indexing!
      • Encoded using the international Text Encoding Initiative (TEI) standard, later mapped to the Dublin Core (DC) metadata element set
      • Metadata capture: publisher, place and date of publication at journal/newspaper level
      • Indexing of title, author and keywords at article level
      • XML-based
      • Articles over several pages
      • English language
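Article-level indexing (title, author, keywords) mapped to the Dublin Core element set could look roughly like the Python sketch below. The `ArticleIndex` record, the `record` wrapper element and the choice of dc:title, dc:creator, dc:subject and dc:language are illustrative assumptions; the slides do not give DISA's actual mapping. The page span is kept on the record but not emitted, since the fifteen simple Dublin Core elements have no dedicated page element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


@dataclass
class ArticleIndex:
    """Hypothetical article-level index record."""
    title: str
    authors: list = field(default_factory=list)
    keywords: list = field(default_factory=list)
    pages: list = field(default_factory=list)  # articles may run over several pages
    language: str = "en"


def to_dublin_core(article):
    """Serialise an ArticleIndex as a flat list of Dublin Core elements."""
    root = ET.Element("record")

    def add(name, value):
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value

    add("title", article.title)
    for author in article.authors:
        add("creator", author)
    for keyword in article.keywords:
        add("subject", keyword)
    add("language", article.language)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    rec = ArticleIndex(
        title="Big no to constitution as elections draw near",
        keywords=["Elections", "Boycotts"],  # illustrative keywords only
        pages=[1, 2],
    )
    print(to_dublin_core(rec))
```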

  13. Capturing journal metadata
      <teiheader type="journal" status="new" teiform="teiHeader">
        <filedesc teiform="fileDesc">
          <titlestmt teiform="titleStmt">
            <title teiform="title">Speak: the voice of the community</title>
            <title teiform="title">Volume 2 No 3</title>
          </titlestmt>
          <publicationstmt teiform="publicationStmt">
            <publisher teiform="publisher">DISA Digital Innovation of South Africa</publisher>
            <pubplace teiform="pubPlace">Durban, South Africa</pubplace>
            <date teiform="date">2002</date>
            <idno teiform="idno">1684.5188.002.003.Jul1984</idno>
          </publicationstmt>
          <sourcedesc default="no" teiform="sourceDesc">
            <biblfull default="no" teiform="biblFull">
              <titlestmt teiform="titleStmt">
                <title teiform="title">Speak: the voice of the community</title>
                <title teiform="title">Volume 2 No 3<date teiform="date">July 1984</date></title>
                <editor role="editor" teiform="editor"></editor>
              </titlestmt>
              <extent teiform="extent">16 pages</extent>
              <publicationstmt teiform="publicationStmt">
                <publisher teiform="publisher">Speak Community Newspaper Project </publisher>
                <pubplace teiform="pubPlace">Johannesburg</pubplace>
                <date teiform="date">July 1984</date>
              </publicationstmt>
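A header like the one above could be read back programmatically to recover the journal-level fields (title, publisher, place and date) listed on slide 12. The Python sketch below uses the standard library's ElementTree on a shortened, well-formed copy of the slide's fragment: the `teiform` attributes are dropped and the missing closing tags are added. It assumes, as an interpretation, that the `sourcedesc` block describes the original printed newspaper while the outer `publicationstmt` describes the DISA digital edition; `journal_metadata` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the slide's TEI header (teiform attributes removed).
SAMPLE = """<teiheader type="journal" status="new">
  <filedesc>
    <titlestmt>
      <title>Speak: the voice of the community</title>
      <title>Volume 2 No 3</title>
    </titlestmt>
    <publicationstmt>
      <publisher>DISA Digital Innovation of South Africa</publisher>
      <pubplace>Durban, South Africa</pubplace>
      <date>2002</date>
      <idno>1684.5188.002.003.Jul1984</idno>
    </publicationstmt>
    <sourcedesc>
      <biblfull>
        <publicationstmt>
          <publisher>Speak Community Newspaper Project </publisher>
          <pubplace>Johannesburg</pubplace>
          <date>July 1984</date>
        </publicationstmt>
      </biblfull>
    </sourcedesc>
  </filedesc>
</teiheader>"""


def journal_metadata(tei_xml):
    """Extract journal-level fields from a TEI header (element names as on the slide)."""
    root = ET.fromstring(tei_xml)  # the <teiheader> element
    digital = root.find("filedesc/publicationstmt")  # assumed: DISA digital edition
    source = root.find("filedesc/sourcedesc/biblfull/publicationstmt")  # assumed: print original
    titles = [t.text.strip() for t in root.findall("filedesc/titlestmt/title")]
    return {
        "title": ", ".join(titles),
        "publisher": source.findtext("publisher", "").strip(),
        "place": source.findtext("pubplace", "").strip(),
        "date": source.findtext("date", "").strip(),
        "identifier": digital.findtext("idno", "").strip(),
    }


if __name__ == "__main__":
    for key, value in journal_metadata(SAMPLE).items():
        print(f"{key}: {value}")
```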

  14. Search and browse
      • Browsing facilities: browse the text images
      • Searching facilities: full-text searching; article title, author and keyword searching; thesaurus; acronyms
      • Readability and advanced searchability
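Full-text searching with acronym support can be sketched as an inverted index in which acronyms are expanded on both the indexing side and the query side, so that "UDF" and "United Democratic Front" retrieve the same articles. The acronym table, article IDs and AND-only query semantics below are assumptions for illustration; the slides do not describe how DISA's search engine is built.

```python
import re
from collections import defaultdict

# Hypothetical acronym table; a real one would be maintained alongside the thesaurus.
ACRONYMS = {"udf": "united democratic front"}


def normalise(text):
    """Lower-case tokens with known acronyms replaced by their expansions."""
    out = []
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        out.extend(ACRONYMS.get(token, token).split())
    return out


def build_index(articles):
    """Map each token to the set of article IDs whose text contains it."""
    index = defaultdict(set)
    for article_id, text in articles.items():
        for token in normalise(text):
            index[token].add(article_id)
    return index


def search(index, query):
    """Return IDs of articles containing every query token (AND semantics)."""
    tokens = normalise(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits


if __name__ == "__main__":
    articles = {
        "a1": "Mr Popo Molefe, national secretary of the UDF",
        "a2": "The United Democratic Front called for a boycott",
        "a3": "Election results",
    }
    idx = build_index(articles)
    print(sorted(search(idx, "UDF")))  # ['a1', 'a2']
```

Expanding at both ends keeps the index small and means the acronym table can be corrected without changing how queries are written, although the index must then be rebuilt.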

  15. Indexing results
      • Advanced searchability on all the encoded elements
      • Using terms from a thesaurus standardises language usage
      • Higher relevance of returned hits
      • Added intellectual input
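One common way to standardise language through a thesaurus is to map non-preferred ("use for") terms to a single preferred term at indexing time. The Python sketch below does that and returns any keyword the thesaurus does not recognise, so that it can be passed to quality control. The term lists and the `standardise` helper are hypothetical examples, not DISA's thesaurus.

```python
# Hypothetical thesaurus fragment: non-preferred term -> preferred term.
USE_FOR = {
    "polls": "elections",
    "voting": "elections",
    "election": "elections",
}
PREFERRED = {"elections", "boycotts", "apartheid"}


def standardise(keywords):
    """Return (preferred terms, unrecognised keywords), keeping first-seen order."""
    terms, unknown = [], []
    for keyword in keywords:
        key = " ".join(keyword.lower().split())  # normalise case and spacing
        term = USE_FOR.get(key, key)
        if term in PREFERRED:
            if term not in terms:
                terms.append(term)
        else:
            unknown.append(keyword)
    return terms, unknown


if __name__ == "__main__":
    print(standardise(["Polls", "voting", "Boycotts", "Trade unions"]))
    # (['elections', 'boycotts'], ['Trade unions'])
```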

  16. However …
      • Human indexing is time- and labour-intensive
      • Training is required
      • Quality control is needed
      • Thesaurus management software is essential

  17. Languages and translations
      • African vernacular languages
      • Translation challenges for a global context
      • OCR challenges
      • OCR training for African languages not yet developed
      • Automated translation not yet possible
      • Extraction of metadata is useful

  18. Language examples: Zulu and Hindi

  19. South African newspaper digitisation
      • Rich collections in the vernacular
      • Poor-quality microfilms
      • Low OCR success rate on microfilm scans
      • Level of metadata complexity
      • Minimal manual indexing
      • Cost of staff time
      • Service on demand
      • Lack of national guidelines
      • Lack of national funding

  20. Conclusions
      • Volume of newspapers and information
      • Value of digitisation
      • Rich source of South African social history
      • Vernacular
      • Teaching, learning and research value
      • Dedicated newspaper digitisation project
      • Overcome challenges!

  21. Recommendations
      • National consultation
      • National support
      • Prioritisation
      • Role of publishers
      • DISA consultancy

  22. Contact details
      • Patricia Liebetrau, Director, DISA
      • Email: liebetraup@ukzn.ac.za
      • URL: http://www.disa.ukzn.ac.za
      • This presentation is made available under a Creative Commons Attribution 2.0 South Africa license.

More Related