Biodiversity literature mark-up: Compelling use cases for Natural History Collections
Dr Dimitris Koureas, Natural History Museum London (@DimitrisKoureas)
Workshop on mark-up of biodiversity literature, Berlin, 10-11 February 2014
Introduction
• Significant research effort has been invested in literature markup.
• Literature markup could have industry-wide applications with significant impact, but:
  • Who are the current stakeholders?
  • Is there support from societal actors?
  • What are the direct societal benefits?
• So we need to demonstrate compelling use cases that will engage stakeholders.
• Natural History Museums (> 260 million specimens) can be key players:
  • Use case 1: Assisted label transcription
  • Use case 2: Measuring the impact of collections
Use case 1: Assisted label transcription
Legacy literature markup of specimen records can facilitate the label transcription process.
• Digital NH museums
• Going digital is a strategic decision for NH museums
• Challenge 1 in the NHM Science Strategy
• Collection digitisation is prioritised in all major museums
• NHM allocated c. £750k for the next three years (not including capital expenditure)
• Label transcription is important but challenging
Use case 1: Assisted label transcription
Different approaches for label transcription
• Manual transcription of label elements, by curators or through crowdsourcing
• Manual transcription of semantic units in the label
• (Semi-)automatic OCR/markup
• Hybrid models are currently in use
Use case 1: Assisted label transcription
Current approaches for label transcription
• Labels suitable for OCR and markup: high resolution; typewritten; well-defined structure and semantic units
vs.
• Labels not suitable for OCR: low resolution; handwritten; no proper structure
Use case 1: Assisted label transcription
Current approaches for label transcription
• In-house (manual or semi-automated): slow and cost-ineffective; not suitable for large collections
• Crowdsourcing: unpredictable outcome; data cleaning needed
We can enhance current approaches by introducing literature-assisted transcription: use literature markup to identify specimen records and match them against the physical objects.
Use case 1: Assisted label transcription
Label transcription: Don’t do the job twice! Most labels have already been transcribed in taxonomic literature.
[Figure: specimen label with catalogue number ATHU 3638, already published in the literature in 2012, shown next to its basic OCR output, which is illegible apart from "HERB. ORPHANIDEUM."]
Create a link between specimen and literature.
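A minimal sketch of literature-assisted transcription, assuming literature markup has already turned each published material citation into a dict of fields. The field names (`catalogue_number`, `locality`, `source`), the catalogue-number pattern and the helper names are all hypothetical, not any markup schema's actual vocabulary. The sketch normalises cited catalogue numbers, matches them against specimens whose labels are not yet transcribed, and keeps every matching citation so a curator can review the published transcriptions.

```python
import re
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical convention: an institution code (2-6 letters) followed by
# digits, e.g. "ATHU 3638". Real catalogue-number formats vary widely.
CATALOGUE_RE = re.compile(r"\s*([A-Za-z]{2,6})\s*[-:]?\s*(\d+)\s*")


def normalise_catalogue_number(raw: str) -> Optional[str]:
    """Turn variants such as 'ATHU3638' or 'athu-3638' into 'ATHU 3638'.

    Returns None when the string does not look like a catalogue number.
    """
    match = CATALOGUE_RE.fullmatch(raw)
    if match is None:
        return None
    code, number = match.groups()
    return f"{code.upper()} {number}"


def draft_transcriptions(
    citations: Iterable[Dict[str, str]], untranscribed: Set[str]
) -> Dict[str, List[Dict[str, str]]]:
    """Match marked-up material citations against untranscribed specimens.

    Returns normalised catalogue number -> every citation of that specimen,
    so published transcriptions can be compared before one is accepted.
    """
    drafts: Dict[str, List[Dict[str, str]]] = {}
    for citation in citations:
        key = normalise_catalogue_number(citation.get("catalogue_number", ""))
        if key is not None and key in untranscribed:
            drafts.setdefault(key, []).append(citation)
    return drafts


# Usage with placeholder values (not real published data):
citations = [
    {"catalogue_number": "ATHU3638", "locality": "example locality", "source": "paper A"},
    {"catalogue_number": "athu-3638", "locality": "example locality", "source": "paper B"},
    {"catalogue_number": "XYZ 12", "locality": "elsewhere", "source": "paper C"},
]
drafts = draft_transcriptions(citations, untranscribed={"ATHU 3638"})
print(drafts["ATHU 3638"][0]["source"])  # prints "paper A"
```

Keeping all citations per specimen, rather than the first one found, reflects the point that the same label is often transcribed several times in the literature.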
Use case 1: Assisted label transcription
Label transcription: Don’t do the job twice! Most labels have already been transcribed in taxonomic literature.
Literature-assisted transcription:
• Transcription of specimen labels has been crowdsourced for the last 250 years
• Minimal need for data cleaning
• Specimen data from small collections around the world
• Specimen labels transcribed several times
Use case 2: Measuring NH collections impact
Natural History Collections
• Value in itself: traditional activities of repositories (preservation, curation)
• Value through utilisation: digitisation, openness, data extraction
• Establishing value through measuring the scientific and societal impact of collections
• McAlpine (1986): 12.7% of papers used collections & 44.4% made collections
Use case 2: Measuring NH collections impact
Specimen metadata (specimen identifiers) + born-digital literature + legacy literature markup → specimen citation metrics webservice → collection assessment
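The core of a specimen citation metrics service could be a simple aggregation. The sketch below assumes markup has already yielded (publication, specimen identifier) pairs and that the institution code is the part of the identifier before the first space. Both are hypothetical conventions, and "ABC" is a made-up institution code. It shows only the counting step, not a webservice.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def collection_citation_metrics(
    citations: Iterable[Tuple[str, str]]
) -> Dict[str, Dict[str, int]]:
    """Aggregate specimen citations per institution.

    `citations` holds (publication_id, specimen_id) pairs, where specimen_id
    is a normalised catalogue number such as 'ATHU 3638'. The institution
    code is assumed to be the part before the first space.
    """
    citation_counts: Counter = Counter()
    specimens: Dict[str, Set[str]] = defaultdict(set)
    publications: Dict[str, Set[str]] = defaultdict(set)
    # Deduplicate so a specimen mentioned twice in one paper counts once.
    for publication_id, specimen_id in set(citations):
        institution = specimen_id.split(" ", 1)[0]
        citation_counts[institution] += 1
        specimens[institution].add(specimen_id)
        publications[institution].add(publication_id)
    return {
        institution: {
            "citations": citation_counts[institution],
            "specimens_cited": len(specimens[institution]),
            "publications": len(publications[institution]),
        }
        for institution in citation_counts
    }


# Usage with placeholder identifiers:
metrics = collection_citation_metrics([
    ("paper 1", "ATHU 3638"),
    ("paper 1", "ATHU 3638"),  # duplicate mention in the same paper
    ("paper 2", "ATHU 3638"),
    ("paper 3", "ATHU 10"),
    ("paper 2", "ABC 1"),
])
print(metrics["ATHU"])  # {'citations': 3, 'specimens_cited': 2, 'publications': 3}
```

Reporting distinct specimens and publications alongside raw citation counts helps show which parts of a collection, including those held by smaller repositories, are actually used.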
Use case 2: Measuring NH collections impact
Tracking specimen citations in literature can highlight important collections:
• Promote the value of smaller repositories
• Steer digitisation efforts
• Help in collection gap analysis
• Attract more funding
Use case 2: Measuring NH collections impact
Some concerns:
• The use of persistent identifiers would help NH collection curators track the scientific impact of their collections, but tracking specimen records in literature means tracking references to physical objects.
• DOIs could be the easiest way, BUT we cannot assign DOIs to physical objects unless museums quickly proceed to create comprehensive collection data portals and assign unique identifiers (UIs) to all records.
Biodiversity literature mark-up: Beyond taxonomic names
Compelling use cases for Natural History Collections
Thank you
Workshop on mark-up of biodiversity literature, Berlin, 10-11 February 2014
@DimitrisKoureas