Excellent XML – systems interoperability at the Wellcome Library

EIUG 11th Conference, Stirling University 1 & 2 September 2005 Margaret Savage-Jones m.savage-jones@wellcome.ac.uk. Excellent XML – systems interoperability at the Wellcome Library. Millennium - Innovative Interfaces Inc. http://catalogue.wellcome.ac.uk Includes online requesting

Presentation Transcript


  1. EIUG 11th Conference, Stirling University 1 & 2 September 2005 Margaret Savage-Jones m.savage-jones@wellcome.ac.uk Excellent XML – systems interoperability at the Wellcome Library

  2. Wellcome Library Systems • Millennium (Innovative Interfaces Inc.), http://catalogue.wellcome.ac.uk : includes online requesting from closed stack since mid 2003 • Calm archive system (DS Ltd), http://archives.wellcome.ac.uk : online access to archive & mss holdings • Miro/MedPhoto image system (System Simulation Ltd), http://medphoto.wellcome.ac.uk : online access to over 100,000 images, image retrieval & delivery

  3. Underlying protocol: OAI-PMH. Open Archives Initiative Protocol for Metadata Harvesting: a protocol for sharing and harvesting metadata between different OAI-compliant systems. Based on XML and HTTP. One system (CALM or MedPhoto) exposes metadata via an OAI repository. This metadata is harvested by the other system (Millennium) and then loaded.
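
To make the XML-over-HTTP exchange concrete, here is a minimal sketch of how a harvester might read the record headers out of an OAI-PMH `ListRecords` response. The sample response, its host name and its identifiers are invented for illustration and are not taken from the Wellcome repositories. Only the namespace URI and element names come from the OAI-PMH 2.0 specification.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical response: host and identifiers are made up for illustration.
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-06-01T19:27:47Z</responseDate>
  <request verb="ListRecords" metadataPrefix="marc21">http://repository.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:0001</identifier>
        <datestamp>2005-05-12</datestamp>
      </header>
      <metadata><!-- metadata payload omitted --></metadata>
    </record>
    <record>
      <header>
        <identifier>oai:repository.example.org:0002</identifier>
        <datestamp>2005-05-30</datestamp>
      </header>
      <metadata><!-- metadata payload omitted --></metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_headers(response_xml):
    """Return (identifier, datestamp) pairs for every record in the response."""
    root = ET.fromstring(response_xml)
    headers = []
    for header in root.iterfind("oai:ListRecords/oai:record/oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        datestamp = header.findtext("oai:datestamp", namespaces=OAI_NS)
        headers.append((identifier, datestamp))
    return headers


if __name__ == "__main__":
    for identifier, datestamp in list_headers(SAMPLE_RESPONSE):
        print(identifier, datestamp)
```

The datestamp in each header is what makes date-limited (selective) harvesting possible.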

  4. Motivation • With a MARC21 library system, an ISAD(G) archive system and a bespoke image repository, it was a strategic objective to make these systems interoperate • Phase II of the Closed Stack project: Western Manuscripts and Archives had to be requestable online by summer 2004 • XML Harvester was developed by Innovative with Michigan State University in 2001-02; Wellcome placed an order for XML Harvester in January 2003 • With CALM ver 4 it was possible to export EAD XML

  5. Benefits • Online requesting - Western MSS & Archives collections • One circulation system to manage and one set of circ stats • Same interface for all online requests from stack • Archives & manuscripts like other collections • Image sets for library objects displayed in Web OPAC • User can jump from one system to another • No need to rekey user search in other system • Selective harvesting for onward record updating

  6. Example: archive record (from Crick Coll.)

  7. Harvested archive record in Web OPAC

  8. Image of the archive item

  9. Encoded Archival Description (EAD) • Initially XML Harvester dealt only with EAD and needed encodinganalogs for parsing • It was developed with Michigan State University (MSU), whose EAD finding aids had MARC encodinganalogs; the Harvester parser read these tags • Encodinganalogs are attributes in XML records indicating the field, subfield, indicators etc. in another descriptive encoding system (e.g. MARC21) that are equivalent to the EAD tagged element

  10. Archive system metadata • Hierarchical tree structure with collection and component item level records, catalogued in General International Standard Archival Description, ISAD(G) • Field export from CALM as the default subset EAD DTD had some empty fields, so we had to export as “DServe Natural” XML, which includes field tags • Catalog.xml output with catalog.DTD

  11. Pilot – used “Haddad” catalogue XML • Used a small set of 87 XML Arabic records (a local variant of the 'MASTER' XML DTD) as a pilot to test XML Harvester • Used stylesheets to filter unwanted fields and add encodinganalogs, then put the 87 .xml files in a web server directory ready to be harvested

  12. Web crawler • Harvester reaches the XML files through port 80 • We added a page to the Millennium screens directory listing the files, with redirections to the web server folder • Harvester opened the page and scanned for 'HREF' strings, which directed it to the XML records (file.xml) • The XML Harvester parser read tags from encodinganalogs to create MARC21 records, writing them to a file for loading

  13. Redirection screen <html> <head> <title> Harvester Test</title> </head> <body> <em>Mss Files</em><br> <strong> Sample Screen # 2</strong> <PRE> Test to confirm if harvester can crawl files deposited on wtcalm01 </pre> <A HREF=http://wtcalm01.wellcome.ac.uk/xml/002.xml>002</A> <A HREF=http://wtcalm01.wellcome.ac.uk/xml/83.xml>83</A> <A HREF=http://wtcalm01.wellcome.ac.uk/xml/82.xml>82</A> </body> </html>
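
The crawl step can be sketched as follows: scan the redirection page for `HREF` values and collect those that point at `.xml` files. This is an illustration using Python's standard `html.parser`, not XML Harvester's actual implementation, which the slides do not describe. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# The redirection screen from the slide, reproduced as a test page.
PAGE = """<html> <head> <title> Harvester Test</title> </head> <body>
<em>Mss Files</em><br> <strong> Sample Screen # 2</strong>
<PRE> Test to confirm if harvester can crawl files deposited on wtcalm01 </pre>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/002.xml>002</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/83.xml>83</A>
<A HREF=http://wtcalm01.wellcome.ac.uk/xml/82.xml>82</A>
</body> </html>"""


class HrefCollector(HTMLParser):
    """Collects href values of anchor tags that point at .xml files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value and value.lower().endswith(".xml"):
                self.links.append(value)


def find_xml_links(page_html):
    """Return the .xml URLs linked from the page, in document order."""
    collector = HrefCollector()
    collector.feed(page_html)
    return collector.links


if __name__ == "__main__":
    for url in find_xml_links(PAGE):
        print(url)
```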

  14. Example – encodinganalogs for 856: <hyperlink> <url ENCODINGANALOG="85607$u"> <xsl:text>http://wisdom.welcome.ac.uk/xml/</xsl:text> <xsl:value-of select="substring-after(/?idno,'WMS Arabic')"/> <xsl:text>.html</xsl:text> </url> <text ENCODINGANALOG="85607$z">View full manuscript record</text> </hyperlink>

  15. Harvested MARC21 “Haddad” record

  16. Links: to PDF and Request button

  17. Lessons • Arabic records would be loaded only once, but records from CALM would need regular reharvesting/overlay • Need a more sophisticated approach than crawling a web directory: XML Harvester can harvest from an OAI repository and use datestamps in OAI to harvest records created or modified in a specified date range • XSLT could be used to transform records to MARC21 OAI without using encodinganalogs

  18. Archives OAI repository • Built on CALM server using the freeware University of Illinois Provider service tool (runs under Windows IIS) • Other requirements: Microsoft Windows 2000 Server; Microsoft IIS ver 4 or higher; Microsoft ASP; Microsoft XML Parser (MSXML) 4.0; Microsoft ActiveX Data Objects and an ODBC-compliant datasource, i.e. MS Access 97+ database; firewall access on port 80

  19. Key decisions • Metadata export: chose full CALM record XML DTD (not EAD) • Matchpoint: decided to load contents of Calm RefNo field to Millennium 001, indexed in 'o' • Also had to consider: hierarchical record level to harvest; navigation between the two systems; Millennium parameters

  20. Decision: Record level to harvest • A “Collection” could consist of more than 40 boxes • Must have a 1:1 record relationship to make requesting and retrieval work • Decision to exclude archives Collection records and use Component level records. Each of these represents 1 item (box, folder, piece) and links to a single bib record with an attached item for circulation in Millennium

  21. Decision: Navigation • Archivists wanted the archives (CALM) interface to offer the main search route for Western Archives & MSS • User is taken from the CALM record into Millennium to place their request, then back to their CALM record to continue browsing their hit list, so two links were needed • Forward: runs cgi script to search Millennium for the corresponding bib record • Back: 856 with URL link (can be inserted by Harvester)

  22. Example: Links • Forward: cgi script runs search of Millennium 'o' index for match on CALM RefNo value: http://catalogue.wellcome.ac.uk/search/o?SEARCH=PPCRI%2FA%2F1%2F2%2F8 • Back: RefNo PP/CRI/A/1/2/8 built into OAI record. URL linking to CALM web front end, with the RefNo value built into the search string: http://archives.wellcome.ac.uk/DServe/dserve.exe?&dsqIni=Dserve.ini&dsqApp=Archive&dsqCmd=show.tcl&dsqDb=Catalog&dsqPos=0&dsqSearch=((text)='PP/CRI/A/1/2/8')
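
The two links can be sketched as simple string builders. The function names are hypothetical, and the slides do not show the actual cgi script or the stylesheet that built the 856 URL. The forward URL on the slide searches for `PPCRI/A/1/2/8`, not `PP/CRI/A/1/2/8`, so the sketch takes the search term as given and does not guess how it is derived from the RefNo. The back link mirrors the slide's URL literally, without percent-encoding.

```python
from urllib.parse import quote

CATALOGUE_SEARCH = "http://catalogue.wellcome.ac.uk/search/o?SEARCH="
ARCHIVES_CGI = "http://archives.wellcome.ac.uk/DServe/dserve.exe"


def forward_link(search_term):
    """Web OPAC search of the 'o' index; slashes are percent-encoded as %2F."""
    return CATALOGUE_SEARCH + quote(search_term, safe="")


def back_link(refno):
    """URL back into the CALM web front end, with RefNo in the search string."""
    params = [
        ("dsqIni", "Dserve.ini"),
        ("dsqApp", "Archive"),
        ("dsqCmd", "show.tcl"),
        ("dsqDb", "Catalog"),
        ("dsqPos", "0"),
        ("dsqSearch", "((text)='%s')" % refno),
    ]
    # The slide's URL starts its query string with "?&", reproduced here as-is.
    return ARCHIVES_CGI + "?&" + "&".join("%s=%s" % pair for pair in params)


if __name__ == "__main__":
    print(forward_link("PPCRI/A/1/2/8"))
    print(back_link("PP/CRI/A/1/2/8"))
```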

  23. Calm XML export file: <?xml version="1.0" encoding="utf-8" ?> <record> <DScribeRecord> <RecordType>Component</RecordType> <IDENTITY /> <RefNo>MS4385/4404</RefNo> <AltRefNo>MS.4404</AltRefNo> <PreviousNumbers /> <Title>Notes and extracts on Chemistry, Volumetric Analysis, (etc.)</Title> <Date>c. 1865</Date> <Level>Item</Level> <Extent>1 volume</Extent> <UserText5>Bentley House</UserText5> <Location /> <UserText3>Western MSS series 3 - Requestable</UserText3> <UserWrapped9 /> <UserText6 /> <UserText7 />

  24. Mapping Calm XML to Marc21 • Field tags used: 001, 008, 245, 260, 500, 506, 655, 856, and 949 to make the item • Harvester inserts a 99x tag with a load identification code, e.g. CALM20040820225128 • Found that Component records do not have 'author', which is only held at Collection level, but this was not a problem • 'Mock' bib and item records were keyed to Millennium to demonstrate navigation and agree content with the team, and to act as a benchmark when harvested records were loaded
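
As an illustration of this kind of mapping, the sketch below reads a Calm `<DScribeRecord>` and produces (MARC tag, value) pairs. Only the RefNo to 001 mapping is stated in the slides. The Title to 245 and Date to 260 pairings are assumptions made for the example, and the real work was done with XSLT stylesheets, not Python.

```python
import xml.etree.ElementTree as ET

# (Calm element, MARC tag). RefNo -> 001 is from the slides;
# Title -> 245 and Date -> 260 are assumed for illustration only.
FIELD_MAP = [("RefNo", "001"), ("Title", "245"), ("Date", "260")]

SAMPLE = """<record><DScribeRecord>
  <RecordType>Component</RecordType>
  <RefNo>MS4385/4404</RefNo>
  <Title>Notes and extracts on Chemistry, Volumetric Analysis, (etc.)</Title>
  <Date>c. 1865</Date>
  <Level>Item</Level>
  <Location />
</DScribeRecord></record>"""


def map_component(xml_text):
    """Return a list of (tag, value) pairs for the mapped, non-empty elements."""
    root = ET.fromstring(xml_text)
    record = root.find("DScribeRecord")
    fields = []
    for element_name, tag in FIELD_MAP:
        value = record.findtext(element_name)
        if value and value.strip():  # skip absent or empty elements
            fields.append((tag, value.strip()))
    return fields


if __name__ == "__main__":
    for tag, value in map_component(SAMPLE):
        print(tag, value)
```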

  25. XSLT – eXtensible Stylesheet Language Transformations • Used XSLT to split the single XML output file into 48,000 component .xml records, using <DScribeRecord> as the record delimiter, and then transform them to MARC21 OAI records, which are listed to XML Harvester by our OAI repository • The OAI repository installed on the CALM staging server uses the University of Illinois Provider service tool (freeware)
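
The splitting step can be sketched in Python as a stand-in for the XSLT actually used. The `<export>` wrapper element and the numbered file names are hypothetical, since the slides do not show the root element of the full export or the naming scheme. Only the `<DScribeRecord>` delimiter comes from the Calm export shown earlier.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-record export; the <export> root is an assumption.
SAMPLE_EXPORT = (
    "<export>"
    "<DScribeRecord><RefNo>MS4385/4404</RefNo></DScribeRecord>"
    "<DScribeRecord><RefNo>PP/CRI/A/1/2/8</RefNo></DScribeRecord>"
    "</export>"
)


def split_export(export_xml):
    """Return {file name: XML text}, one entry per <DScribeRecord> element."""
    root = ET.fromstring(export_xml)
    files = {}
    for number, record in enumerate(root.iter("DScribeRecord"), start=1):
        record.tail = None  # drop whitespace that followed the element
        files["%05d.xml" % number] = ET.tostring(record, encoding="unicode")
    return files


if __name__ == "__main__":
    for name, text in split_export(SAMPLE_EXPORT).items():
        print(name, text)
```

Returning the pieces as a dictionary keeps the sketch free of file I/O; writing each value to disk under its key would give one .xml file per component record.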

  26. Millennium parameters • To cope with 'open' v 'closed' archive collections, new codes were added to archives records and mapped to new Millennium branch codes which would trigger Millcirc rules • New branch codes added to Request Rules, Determiner Table, WWWOPTIONS, Locations served • New MATTYPE to exclude Western Mss and archives from the Asian Mss scope

  27. Config file for archives record harvest
     @LOGLEVEL=CONFIG
     @DBNAME=CALM
     @URL=http://wtcalm02/oai/oai.asp
     @CREATEOVERLAYFROMURI=true
     @9XXMARCTAG=991
     @USEOAI=true
     @DATE=20000606000000
     @OAIFROMEMAIL=m.savage-jones@wellcome.ac.uk
     @SHOWMETADATA=true
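
For readers who want to inspect such a file programmatically, here is a small sketch that reads the `@KEY=VALUE` settings into a dictionary. It assumes one setting per line, which is how the slide appears to lay them out. The function name is hypothetical, and the sketch attaches no meaning to the keys because the slides do not document them.

```python
CONFIG_TEXT = """\
@LOGLEVEL=CONFIG
@DBNAME=CALM
@URL=http://wtcalm02/oai/oai.asp
@CREATEOVERLAYFROMURI=true
@9XXMARCTAG=991
@USEOAI=true
@DATE=20000606000000
@OAIFROMEMAIL=m.savage-jones@wellcome.ac.uk
@SHOWMETADATA=true
"""


def parse_harvester_config(text):
    """Parse '@KEY=VALUE' lines into a dict; other lines are ignored."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("@") or "=" not in line:
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = line[1:].partition("=")
        settings[key] = value
    return settings


if __name__ == "__main__":
    for key, value in parse_harvester_config(CONFIG_TEXT).items():
        print(key, "=", value)
```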

  28. Management interface for XML Harvester

  29. Archive record: Request link to Web OPAC

  30. Harvested archive record in Millennium

  31. Patron login screen to place request

  32. Confirmation of request

  33. Interoperation sought with image system • To integrate MedPhoto, a bespoke photo library system, and Millennium for seamless display and ordering of images • MedPhoto holds images and records for more than 60,000 items catalogued in Millennium: Iconographic collection, archives & manuscripts, rare books etc. • Specific need for the Millennium user to see images associated with library objects

  34. Media management interface

  35. Config file for image URL harvest
     @LOGLEVEL=CONFIG
     @DBNAME=MEDPHOTO
     @URL=http://aquarius.wellcome.ac.uk:6969/ixbin/hixserv
     @RECID_MARCTAG=001
     @CREATEOVERLAYFROMURI=true
     @9XXMARCTAG=991
     @USEOAI=true
     @REQUIRE_EADID=false
     @DATE=20000606000000
     @OAIFROMDATE=20050701000000
     @OAIUNTILDATE=20050731000000
     @OAIFROMEMAIL=m.savage-jones@wellcome.ac.uk
     @OAISET=bib

  36. Selective Harvesting – images • Harvest the full “bib” set and load to Millennium, populating 962s; then each month request a list of all new image URLs created since the last harvest with a Millennium .b number in their record • <http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv?verb=ListRecords&metadataPrefix=marc21&set=bib&from=2005-05-01&until=2005-05-31> (for records in May) • <http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv?verb=ListRecords&metadataPrefix=marc21&set=bib&from=2005-06-01&until=2005-06-30> (for records in June, and so on)
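
The monthly request can be generated instead of typed by hand. The sketch below builds the `ListRecords` URL for a given month, using the base URL, `metadataPrefix` and `set` values from the slide. The function names are hypothetical, and this is an illustration of OAI-PMH selective harvesting, not the routine used in production.

```python
import calendar
from datetime import date
from urllib.parse import urlencode

BASE_URL = "http://medphoto.wellcome.ac.uk:6969/ixbin/hixserv"


def month_window(year, month):
    """Return the first and last day of the given month."""
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def list_records_url(year, month, base_url=BASE_URL,
                     metadata_prefix="marc21", oai_set="bib"):
    """Build an OAI-PMH ListRecords URL limited to one calendar month."""
    start, end = month_window(year, month)
    query = urlencode([
        ("verb", "ListRecords"),
        ("metadataPrefix", metadata_prefix),
        ("set", oai_set),
        ("from", start.isoformat()),
        ("until", end.isoformat()),
    ])
    return base_url + "?" + query


if __name__ == "__main__":
    print(list_records_url(2005, 5))
    print(list_records_url(2005, 6))
```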

  37. Harvesting: Image OAI repository • OAI repository built by SSL on MedPhoto server • Metadata matchpoint: the .b bib record no. is the common element between Millennium and MedPhoto • XML Harvester selectively requests the record set “bib”, whose records all have .b nos, parses the returned list of MARC21 OAI records and creates a file of MARC records for loading • Matches on .b and overlays, inserting a 962 for each image • 962 |u holds the URL for the thumbnail and |e holds the 'launchpad' URL

  38. MARC21 record ready to load
     File Name: DONE-MEDPHOTO_20050601192747.marc (411,392 bytes)
     Offset: 256 Blocks: 1 - 2
     LEADER 00403nam a2200085uu 4500
     DIRECTORY 001000900000 035001500009 856008000024 962018500104 991002800289
     TAGS
     1 000 00403nam a2200085uu 4500@
     2 001 L0027751@
     3 035 |a.b12857890@
     4 856 4 1 |uhttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIDMIRO=L0027751|zView image@
     5 962 |a000:000:URL:b0000000:000000:0:0:0:0:0:0|tImage|vn|uhttp://medphoto.wellcome.ac.uk/ixbin/hixclient.exe?MIROPAC=L0027751|ehttp://medphoto.wellcome.ac.uk/ixbin/imageserv?MIRO=L0027751@
     6 991 |aMEDPHOTO{228}20050601192747@
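
The leader and directory in this dump can be decoded by hand or with a few lines of code. In the MARC 21 record structure, each directory entry is 12 characters: a 3-character tag, a 4-digit field length and a 5-digit starting position. Leader positions 0-4 hold the record length and positions 12-16 the base address of data. The sketch below decodes the values shown on the slide; the function name is hypothetical.

```python
LEADER = "00403nam a2200085uu 4500"
DIRECTORY = "001000900000 035001500009 856008000024 962018500104 991002800289"


def parse_directory(directory):
    """Return (tag, field_length, start_position) for each 12-char entry."""
    compact = "".join(directory.split())  # the dump separates entries with spaces
    entries = []
    for offset in range(0, len(compact), 12):
        entry = compact[offset:offset + 12]
        entries.append((entry[0:3], int(entry[3:7]), int(entry[7:12])))
    return entries


if __name__ == "__main__":
    record_length = int(LEADER[0:5])   # total record length in bytes
    base_address = int(LEADER[12:17])  # where the field data begins
    print("record length:", record_length, "base address:", base_address)
    for tag, length, start in parse_directory(DIRECTORY):
        print(tag, "length", length, "starts at", start)
```

The numbers are self-consistent: the base address is the 24-byte leader plus five 12-byte entries plus one terminator, and the last field ends one byte before the record length of 403.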

  39. Example: with |t default

  40. “Launch pad” • We saw an opportunity for further integration and used an intermediate screen, whose URL is delivered by the MedPhoto repository and loaded to 962 |e • User can hotlink from this “launch pad” into the image system to register, use a light box, email, download or order the image online before returning to Web OPAC

  41. What we used • XML Harvester product (III) • OAI repository software • VBScript – for file splitting operation • Instant Saxon (command line XSLT processor) • Microsoft MSXML core services (e.g. ver 5) • Media Management for 962 (or load URLs to 856) • Three OAI-PMH compliant library systems • Shared Record IDs as matchpoints • Some experience of working with stylesheets • Some experience of load tables and record loading

  42. Work in progress • Harvesting legacy catalogues/XML for other Asian MSS, e.g. Iskander and Jain project (with Oxford University) • Complete testing and batch loading of 60,000 thumbnail and “launchpad” URLs to 962s • Establish routines to manage updates for new, deleted or amended records, utilising OAI-PMH selective harvesting • Further automation of routines where practicable

  43. Wish List/Enhancements • Global edit for 962 tag • More documentation for XML Harvester • Access to underlying harvester parameters, e.g. for XSLT processor and XML parser • Automation of selective harvesting for maintenance

  44. Useful links • XML http://www.w3.org/XML • EAD http://www.loc.gov/ead/ • OAI software http://oai.grainger.uiuc.edu/projectinfo.htm • XSLT http://saxon.sourceforge.net/saxon6.4.3/instant.html • http://www.openarchives.org/OAI/openarchivesprotocol.html • http://www.openarchives.org/OAI/2.0/guidelines-marcxml.htm • OAI tutorial http://www.oaiforum.org/tutorial • OAI repository testing http://re.cs.uct.ac.za/

  45. Some example records • http://catalogue.wellcome.ac.uk/record=b1465521 • http://catalogue.wellcome.ac.uk/record=b1580232 • http://catalogue.wellcome.ac.uk/record=b1313568 • http://catalogue.wellcome.ac.uk/record=b1613633 • http://catalogue.wellcome.ac.uk/search/o?SEARCH=PPCRI%2FA%2F1%2F2%2F8
