This document introduces Globally Unique Identifiers (GUIDs), essential for biodiversity informatics. GUIDs serve as short, persistent names for complex entities on the web, ensuring each name uniquely identifies one entity. It explores what GUIDs are, their properties (persistent, opaque, resolvable), and how to use them effectively, alongside discussing Life Science Identifiers (LSIDs) and their relevance in identifying biodiversity data. The guide offers examples, pros and cons, and recommendations, crucial for accurate data management within biodiversity organizations.
Globally Unique Identifiers in Biodiversity Informatics • Kevin Richards • Landcare Research NZ • TDWG 2008
Introduction GUID (Globally Unique IDentifier) • What, Why, Which, How • LSIDs • Issues
What are GUIDs • Globally Unique IDentifier • A short name for a complex entity on the web • Each name identifies only one entity • Examples: • UUID, e.g. 3E9D6B68-A08C-4F15-BC8A-1265F15D30E2 • DOI, e.g. doi:10.1006/jmbi.1998.2354 • Handle, e.g. hdl:123.456/abc • LSID, e.g. urn:lsid:indexfungorum.org:names:213645 • PURL, e.g. http://purl.oclc.org/abc/123
What is a GUID • Properties • Persistent • Opaque • Resolvable, sometimes - useful for locating information about the entity
Why use GUIDs • Data at Provider 1: BOOK: “The three little pigs”, 3 copies • Data at Provider 2: BOOK: “Three little pigs”, 2 copies • Data Consumer: BOOKS: “Three little pigs” … (2), “The three little pigs” … (3)
… but with GUIDs … • Data at Provider 1 (ID = P1): BOOK: “The three little pigs”, ID (e.g. ISBN) = A123, 3 copies • Data at Provider 2 (ID = P2): BOOK: “Three little pigs”, ID (e.g. ISBN) = A123, 2 copies • Data Consumer: BOOKS: ID A123: “The three little pigs” … (5) • BOOK Titles: ID A123: Provider P1: “The three little pigs”; ID A123: Provider P2: “Three little pigs”
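The consumer-side merge above can be sketched in Python. This is only an illustration under simple assumptions: the `Record` type and the `merge_by_guid` function are hypothetical names, and the data are the values shown on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    provider: str  # provider ID, e.g. "P1"
    guid: str      # shared identifier, e.g. an ISBN
    title: str     # title as spelled by that provider
    copies: int


def merge_by_guid(records):
    """Group records by GUID, summing copies and keeping each provider's title."""
    merged = defaultdict(lambda: {"copies": 0, "titles": {}})
    for rec in records:
        entry = merged[rec.guid]
        entry["copies"] += rec.copies
        entry["titles"][rec.provider] = rec.title
    return dict(merged)


records = [
    Record("P1", "A123", "The three little pigs", 3),
    Record("P2", "A123", "Three little pigs", 2),
]
merged = merge_by_guid(records)
print(merged["A123"]["copies"])  # total copies across both providers
print(merged["A123"]["titles"])  # title variants, keyed by provider
```

Because both providers use the same ID, the consumer ends up with one entry for the book while still recording how each provider spelled the title.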
Example in our domain • Consensus Id: urn:lsid:compositae.org:names:45240C9B-D419-4B6F-93A5-D0A6DEAB4C81 • Name: Anthemis gaudium-solis Velen.
Which GUID • GUID Subgroup Recommendations: • Use LSIDs for identifying biodiversity data • Reuse GUIDs where they already exist • GUID type • Existing assignments • See GUID Report - http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1 • Also Canberra LSID Workshop report: http://www.tdwg.org/fileadmin/subgroups/guid/LSID_policy_workshop_Report_Canberra.pdf
What is an LSID? • Life Science IDentifier • Developed by the Interoperable Informatics Infrastructure Consortium (I3C) and adopted as a specification by the Object Management Group (OMG) • Implemented by the team at IBM • Used for – data objects, datasets, images, files
LSID Format • urn:lsid:bioguid.org:taxon:1122:v1 • Prefix (urn) - indicates that this is a URN • URN type (lsid) - indicates that it’s an LSID-type URN • Authority (bioguid.org) - the authority who issued the LSID • Namespace (taxon) - internal to that authority • Object identifier (1122) - unique within that authority and namespace • Version (v1) - optional
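To make the format concrete, here is a minimal Python sketch that splits an LSID into the components listed above. The `Lsid` type and `parse_lsid` function are hypothetical names for illustration, not part of any LSID code stack, and the check is deliberately loose: it only verifies the colon-separated layout.

```python
from typing import NamedTuple, Optional


class Lsid(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    version: Optional[str]


def parse_lsid(text: str) -> Lsid:
    """Split an LSID string into its colon-separated components."""
    parts = text.split(":")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 colon-separated parts: {text!r}")
    prefix, urn_type, authority, namespace, object_id = parts[:5]
    # The "urn" scheme and the namespace identifier are case-insensitive (RFC 2141).
    if prefix.lower() != "urn" or urn_type.lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if not (authority and namespace and object_id):
        raise ValueError(f"empty component in LSID: {text!r}")
    # The version part is optional; treat a missing or empty one as None.
    version = (parts[5] or None) if len(parts) == 6 else None
    return Lsid(authority, namespace, object_id, version)


print(parse_lsid("urn:lsid:bioguid.org:taxon:1122:v1"))
print(parse_lsid("urn:lsid:indexfungorum.org:names:213645"))
```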
LSID Rules • Data doesn’t change (byte identical) • Always available for resolution • Hand over to another authority if necessary • At least some basic metadata
Pros of LSIDs • Not tied to physical addresses (as URLs are) • Comparison can be done without resolving the ID – eg for cases like “does object a = object b” • Do not require any central registration or central service • Quick to adopt • Encourage thought and planning before they are allocated
Cons of LSIDs However … • Requires DNS SRV record • Requires specialised software to resolve an LSID (not built in to most software) • The restriction - “LSID data cannot change” can be difficult
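The DNS SRV requirement can be illustrated with a small Python sketch. It assumes the usual LSID resolution convention, in which a client looks up an SRV record named `_lsid._tcp.<authority>` to find the authority's resolution service, and it assumes the authority is a domain name. The function name `lsid_srv_name` is hypothetical, and the sketch only builds the record name; it performs no DNS query.

```python
def lsid_srv_name(lsid: str) -> str:
    """Return the SRV record name a client would query for this LSID's authority."""
    parts = lsid.split(":")
    if len(parts) < 5 or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    authority = parts[2]
    # Assumed convention: service "_lsid", protocol "_tcp", then the authority domain.
    return f"_lsid._tcp.{authority}"


print(lsid_srv_name("urn:lsid:indexfungorum.org:names:213649"))
```

The provider therefore needs control over the DNS zone of the authority name it puts into its LSIDs, which is part of why this requirement counts as a cost.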
How • What data/objects to apply IDs to • Decide on • Authority • Namespace • Local IDs (new vs existing) • Issue LSIDs • Set up resolver
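The "issue LSIDs" step can be sketched as follows, assuming the authority and namespace have already been chosen. The function name `mint_lsid` and the authority `example.org` are hypothetical. An existing local ID is reused when supplied; otherwise a new UUID is generated as the object identifier, in the style of the compositae.org example earlier.

```python
import uuid
from typing import Optional


def mint_lsid(authority: str, namespace: str, local_id: Optional[str] = None) -> str:
    """Build an LSID, reusing an existing local ID or generating a new UUID."""
    object_id = local_id if local_id is not None else str(uuid.uuid4()).upper()
    for part in (authority, namespace, object_id):
        # Colons separate LSID components, so they cannot appear inside one.
        if not part or ":" in part:
            raise ValueError(f"invalid LSID component: {part!r}")
    return f"urn:lsid:{authority}:{namespace}:{object_id}"


# Existing local ID (for example a database key) reused as the object identifier.
print(mint_lsid("indexfungorum.org", "names", "213645"))

# No existing ID: a fresh UUID is generated. "example.org" is a placeholder authority.
print(mint_lsid("example.org", "names"))
```

Reusing existing keys keeps LSIDs short and traceable to local records, while UUIDs avoid exposing or depending on internal keys. Which is preferable depends on how stable the local IDs are.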
LSID Code • Current Code Stacks • Open Source (sourceforge.net) • Java, C++, Perl (IBM) • Microsoft .NET (Myself) • TAPIR LSID configuration
LSID Tools • IBM LSID Launchpad • Firefox LSID Browser • LSID Tester (Rod Page) • Web based resolver – http://lsid.tdwg.org/ • http://lsid.tdwg.org/urn:lsid... to get LSID metadata • http://lsid.tdwg.org/summary/urn:lsid... to get summary info of LSID object • Example LSID servers: • Index Fungorum - urn:lsid:indexfungorum.org:names:213649 • IPNI – urn:lsid:ipni.org:names:30000959-2:1.1.2.1 • uBio - urn:lsid:ubio.org:namebank:11815
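The two URL patterns of the web based resolver can be sketched in Python. The helper names `metadata_url` and `summary_url` are hypothetical. The sketch only builds the URLs following the patterns listed above; it makes no network request and does not check whether the service is currently reachable.

```python
TDWG_RESOLVER = "http://lsid.tdwg.org/"


def metadata_url(lsid: str) -> str:
    """URL pattern for retrieving LSID metadata via the web based resolver."""
    return TDWG_RESOLVER + lsid


def summary_url(lsid: str) -> str:
    """URL pattern for retrieving summary info of the LSID object."""
    return TDWG_RESOLVER + "summary/" + lsid


lsid = "urn:lsid:ubio.org:namebank:11815"
print(metadata_url(lsid))
print(summary_url(lsid))
```

A proxy of this kind lets ordinary web clients follow an LSID as a plain HTTP link, without LSID-specific resolution software.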
Issues to think about • Who assigns new LSIDs? • Who maintains LSID resolvers? • What to assign LSIDs to: • Physical or Digital • Granularity • Only objects that need to be resolved / identified externally • Is there any data, or only metadata?
Issues to think about • When to resolve LSIDs • Every time an LSID is encountered, or only when a client requests it? • TDWG standards for metadata • Which ones? • Consistent application
References • LSID Source Forge - http://lsids.sourceforge.net/ • LSID .NET Source Forge - http://sourceforge.net/projects/lsid-dotnet • LSID Tutorial - http://www-128.ibm.com/developerworks/opensource/library/os-lsid/ • LSID Specification - http://www.omg.org/cgi-bin/doc?dtc/04-05-01 • LSID Tester - http://linnaeus.zoology.gla.ac.uk/~rpage/lsid/tester/ • LSID Launchpad - http://www-124.ibm.com/developerworks/downloads/detail.php?group_id=124&what=rele&id=553 • GUID Subgroup - http://www.tdwg.org/activities/guid/ • GUID Subgroup Reports • http://wiki.gbif.org/guidwiki/wikka.php?wakka=GUID2Report&show_comments=1 • http://wiki.tdwg.org/twiki/pub/TIP/TipDocuments/GUID1Report.pdf • Firefox LSID developer site - http://lsid.mozdev.org/