
Metadata

Metadata. Andy Powell Technical Development and Research UKOLN University of Bath http://www.ukoln.ac.uk/ a.powell@ukoln.ac.uk. Metadata. What is metadata? an introduction The Dublin Core metadata for the Web Metadata management Models for dealing with Web-site metadata


Presentation Transcript


  1. Metadata Andy Powell Technical Development and Research UKOLN University of Bath http://www.ukoln.ac.uk/ a.powell@ukoln.ac.uk

  2. Metadata • What is metadata? • an introduction • The Dublin Core • metadata for the Web • Metadata management • Models for dealing with Web-site metadata • UKOLN metadata projects • overviews (and problems)

  3. What is metadata? • by definition: ..data about data.. ..data which provides information about a resource.. • by example: • title, author, subject classification, shelf mark • digital format, terms and conditions, location (URL)

  4. What is metadata? (2) • by usage: • Resource discovery • Searching, location • Authentication • Quality/rating • Semantic interoperability • Resource management • User interface • Grouping resources for printing • 3-D visualisations

  5. Range of formats • A spectrum from simple (robot generated) to rich (hand crafted) • Formats shown along the spectrum: Alta Vista, NetFirst, Lycos, Dublin Core, IAFA, SOIF, MARC, TEI headers, CIMI

  6. Where is metadata? • Embedded within resource • HTML <META> tags • Linked to resource • Remote database • distributed • union (centralised)

  7. Who creates metadata? • Publisher side • author, webmaster, institution • Service side • search service, third party creators • Range: robot generated to hand crafted

  8. Dublin Core • 15 element core metadata set • Primarily intended to aid resource discovery on the Web • Main usage currently embedded into HTML META tags • All elements optional and repeatable • Status? • Agreed syntax for embedding in HTML • Still discussion about the use of some of the elements http://www.ukoln.ac.uk/metadata/resources/dc.html

  9. Dublin Core History • 4 DC meetings • Dublin, Warwick, Dublin, Canberra • (DC-5 - Helsinki coming soon) • Mailing list discussions • meta2@lut.ac.uk • W3C interest • RDF (PICS-NG), MCF • Various projects • Still no significant interest yet from the big search engines :-(

  10. DC Elements - 1 • Title • Subject • intended to promote use of controlled vocabularies but in practice likely to be used for uncontrolled list of keywords • Description • abstract • Creator • Publisher

  11. DC Elements - 2 • Contributor • Date • the date ‘the resource was made available in its present form’. Agreed default format uses subset of ISO 8601, e.g. 1997-09-15 • Type • category of resource - document, image, sound, home page, novel, poem, etc. Still much discussion about the content of this element • Format • MIME type • Identifier
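
The date format above can be checked mechanically. A minimal sketch, assuming only the full YYYY-MM-DD form from the slide's example is accepted (the slide does not say which other ISO 8601 forms the agreed subset allows); the function name `parse_dc_date` is hypothetical.

```python
import re
from datetime import date

# Assumption: only the complete YYYY-MM-DD form is accepted here.
_DC_DATE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_dc_date(value):
    """Return a datetime.date for a YYYY-MM-DD string, else raise ValueError."""
    match = _DC_DATE.fullmatch(value)
    if match is None:
        raise ValueError(f"not in YYYY-MM-DD form: {value!r}")
    year, month, day = (int(part) for part in match.groups())
    # date() itself raises ValueError for impossible dates such as month 13.
    return date(year, month, day)


print(parse_dc_date("1997-09-15"))
```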

  12. DC Elements - 3 • Source • Language • language of the resource - NOT the metadata • Relation • no guidelines for usage currently • Coverage • separate working party looking at usage • Rights • rights management seen as too complex for DC. This will give a URL to some external information

  13. Simple Example <HTML><HEAD> <TITLE>UKOLN Home Page</TITLE> <META NAME="DC.title" CONTENT="UKOLN: UK Office for Library and Information Networking"> <META NAME="DC.subject" CONTENT="national centre, network information support, library community, awareness, research, information services, public library networking, bibliographic management, distributed library systems, metadata, resource discovery, conferences, lectures, workshops"> <META NAME="DC.description" CONTENT="UKOLN is a national centre for support in network information management in the library and information communities. It provides awareness, research and information services"> <META NAME="DC.creator" CONTENT="Stark, Isobel"> </HEAD> ...
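
Tags like those in the example above could be produced from a list of element/value pairs rather than typed by hand. This is only an illustrative sketch: the function name `dc_meta_tags` and the record layout are assumptions, not part of any tool described in the slides.

```python
from html import escape


def dc_meta_tags(record):
    """Render (element, value) pairs as Dublin Core <META> tags, one per line."""
    lines = []
    for element, value in record:
        # quote=True also escapes double quotes, so the attribute stays well formed.
        content = escape(value, quote=True)
        lines.append(f'<META NAME="DC.{element}" CONTENT="{content}">')
    return "\n".join(lines)


# Values taken from the slide; elements may repeat, so a list is used, not a dict.
record = [
    ("title", "UKOLN: UK Office for Library and Information Networking"),
    ("creator", "Stark, Isobel"),
]
print(dc_meta_tags(record))
```

A list of pairs is used because every Dublin Core element is optional and repeatable, which a plain dictionary would not represent.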

  14. Element qualifiers • Need to refine meaning in some cases • TYPE Refines meaning of element - sub-divides element namespace • SCHEME Element value taken from external schema, e.g. LCSH for DC.subject, Z39.53 for DC.language • LANGUAGE Language of element value (not of the resource being described!)

  15. Examples - TYPE • Original DC.creator tag <META NAME="DC.creator" CONTENT="Stark, Isobel"> • Non-personal author <META NAME="DC.creator.corporate" CONTENT="UKOLN Information Services Group"> • Author’s email address <META NAME="DC.creator.email" CONTENT="isg@ukoln.ac.uk">
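
Software reading these tags has to separate the element from its TYPE qualifier. A minimal sketch, assuming the dotted `DC.element.qualifier` naming shown in the examples above; `split_dc_name` is a hypothetical helper.

```python
def split_dc_name(name):
    """Split 'DC.creator.email' into ('creator', 'email').

    The qualifier is None when the name has no TYPE part.
    """
    parts = name.split(".")
    if len(parts) < 2 or parts[0].upper() != "DC":
        raise ValueError(f"not a Dublin Core name: {name!r}")
    element = parts[1].lower()
    # Anything after the element is treated as the qualifier.
    qualifier = ".".join(parts[2:]) or None
    return element, qualifier


for name in ("DC.creator", "DC.creator.corporate", "DC.creator.email"):
    print(name, split_dc_name(name))
```

A consumer that does not understand a qualifier can fall back to the bare element, which is the point of sub-dividing the element namespace this way.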

  16. Examples - SCHEME • Library of Congress Subject Heading <META NAME="DC.subject" CONTENT="(SCHEME=LCSH) Library information networks -- Great Britain"> <META NAME="DC.subject" CONTENT="(SCHEME=LCSH) Information technology -- higher education"> …or… <META NAME="DC.subject" SCHEME="LCSH" CONTENT="Library information networks -- Great Britain"> <META NAME="DC.subject" SCHEME="LCSH" CONTENT="Information technology -- higher education">
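
Because both SCHEME conventions above may be met in practice, a reader of the tags probably needs to accept either. A sketch under that assumption; `scheme_and_value` is a hypothetical helper, and giving the attribute priority over the prefix is a design choice made here, not something the slides specify.

```python
import re

# Matches the "(SCHEME=LCSH) value" convention embedded in CONTENT.
_SCHEME_PREFIX = re.compile(r"^\(SCHEME=([^)]+)\)\s*(.*)$", re.DOTALL)


def scheme_and_value(content, scheme_attr=None):
    """Return (scheme, value) for either way of writing the SCHEME qualifier."""
    if scheme_attr:
        # Separate SCHEME="..." attribute: CONTENT is already the bare value.
        return scheme_attr, content
    match = _SCHEME_PREFIX.match(content.strip())
    if match:
        return match.group(1).strip(), match.group(2)
    return None, content


print(scheme_and_value("(SCHEME=LCSH) Library information networks -- Great Britain"))
print(scheme_and_value("Library information networks -- Great Britain", "LCSH"))
```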

  17. Metadata Management Practical issues of using Dublin Core for Internet resource description... • UKOLN metadata system • Requirements • 3 models for metadata management • Implementation at UKOLN

  18. UKOLN metadata system requirements • Easy to use • Work with a variety of methods of creating HTML • Simple migration to future metadata formats • Separate metadata from resource

  19. Managing Dublin Core (1) HTML authoring tool • Embed by hand using HTML or text editor • Pros… • Simple • May be useful for training and familiarisation • Cons… • May not be possible with all editors • Maintenance problems • Easy to make errors

  20. DC-dot • A Web based tool for creating Dublin Core <meta> tags • Automatic generation of some tags based on content of the resource • Forms based editing of tags • Cut-and-paste output into HTML • Conversion to other formats… • SOIF, ROADS/WHOIS++, USMARC, GILS... http://www.ukoln.ac.uk/metadata/dcdot/

  21. Managing Dublin Core (2) Web-site management tool • Use Web-site management tool, for example NetObjects Fusion • Pros… • Use of Web-site management tools likely to increase • Object-oriented database approach • Cons… • Proprietary formats • Early days - too early to evaluate use for metadata yet?

  22. Managing Dublin Core (3) On the fly generation • Hold Dublin Core separately and embed on-the-fly using server-side include (SSI) • Pros… • Separates metadata from resource • Future migration fairly simple • Cons… • Performance • Lack of integration with HTML tools • Server specific

  23. UKOLN metadata system (1) • Embed on-the-fly • Apache SSI script • Store metadata using SOIF records • Use MS-Access as tool to create the records • Associate metadata with resource by co-locating them in the Web server filestore

  24. UKOLN metadata system (2) • intro.html (HTML editor): <html> <head> <title>…</title> <!--#exec cmd="getmeta" --> </head> ... • Apache syntax for calling server-side script: <!--#exec cmd="getmeta" --> • intro.html.soif (MS-Access database): @FILE { http://www.ukoln.ac. ... keywords{13}: xxx, yyy, zzz description{14}: blah blah b author{13}: Stark, Isobel ... }
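
The slides do not show the `getmeta` script itself, so the following is only a sketch of what such a step might do: read a SOIF record and print `<META>` tags for the server to include. Assumptions: each attribute is written as `name{size}:` followed by a tab or space and then exactly `size` characters of value (the count is treated as characters, which matches bytes only for ASCII); the `SOIF_TO_DC` mapping and the example.org URL are hypothetical.

```python
import re
from html import escape

_HEADER = re.compile(r"\s*@(\w+)\s*\{\s*(\S+)[ \t]*\n")
_ATTR = re.compile(r"([\w-]+)\{(\d+)\}:[\t ]")

# Hypothetical mapping from SOIF attribute names to Dublin Core elements.
SOIF_TO_DC = {"keywords": "subject", "description": "description", "author": "creator"}


def parse_soif(text):
    """Parse one SOIF record into (template_type, url, {attribute: value})."""
    header = _HEADER.match(text)
    if header is None:
        raise ValueError("missing SOIF header")
    attrs, pos = {}, header.end()
    while True:
        match = _ATTR.match(text, pos)
        if match is None:
            break  # closing brace reached, or input is malformed
        size = int(match.group(2))
        start = match.end()
        # The declared size, not the line ending, delimits the value.
        attrs[match.group(1)] = text[start:start + size]
        pos = start + size
        while pos < len(text) and text[pos] in "\r\n":
            pos += 1
    return header.group(1), header.group(2), attrs


def soif_to_meta(text):
    """Render the mapped attributes of a SOIF record as Dublin Core <META> tags."""
    _, _, attrs = parse_soif(text)
    return "\n".join(
        f'<META NAME="DC.{SOIF_TO_DC[name]}" CONTENT="{escape(value, quote=True)}">'
        for name, value in attrs.items()
        if name in SOIF_TO_DC
    )


sample = (
    "@FILE { http://www.example.org/intro.html\n"
    "keywords{13}:\txxx, yyy, zzz\n"
    "author{13}:\tStark, Isobel\n"
    "}\n"
)
print(soif_to_meta(sample))
```

Keeping the mapping in one table is what makes later migration to another metadata format a matter of changing the output step rather than every page.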

  25. UKOLN metadata system (3) MS-Access front end... Filename browser Text boxes Name choosers UKOLN specific metadata

  26. UKOLN metadata system (4) [Diagram: six numbered steps connecting a Web robot, the UKOLN Web server, intro.html (containing <!--#exec cmd="getmeta" -->), the SSI script and intro.html.soif]

  27. Issues • Performance • Interaction with Web caches • Dublin Core vs Alta Vista style metadata <META NAME="Description" CONTENT="blah, blah"> <META NAME="Keywords" CONTENT="xxx, yyy, zzz"> • Granularity • Which pages should have metadata?

  28. What's the point... …of embedding DC <meta> tags? • Alta Vista isn't going to look for them • But, worth doing... • within individual projects • within specific communities (e.g. eLib) • Improve local search facilities • e.g. load SOIF records into a Netscape Catalogue Server • Web-site management benefits

  29. UKOLN Metadata projects • ROADS • Software for Subject Service • DESIRE • European Web indexing • NewsAgent • Current awareness service for Library and Information Staff • BIBLINK • Information flow from publishers to National Bibliographic Agencies

  30. ROADS • Resource Organisation and Discovery in Subject-based Services • Web based tools for Subject Services • SOSIG, ADAM, OMNI, … • Manage and search Internet resource descriptions • ROADS templates (based on IAFA templates) • WHOIS++ http://www.ukoln.ac.uk/roads/

  31. ROADS - WHOIS++ (1) • Simple client-server search and retrieve protocol • Developed originally for ‘white pages’ applications • Offer search facilities across several Subject Services • Distribute a Subject Service across several physical servers • Query routing - centroids and CIP

  32. ROADS - WHOIS++ (2) • Centroid generated by ADAM contains… “you’ll find the string ‘mona’ in the ‘title’ attribute of at least one record in the ADAM database”. [Diagram: six numbered steps connecting a Web browser, a CGI-based WHOIS++ client and the SOSIG, OMNI and ADAM servers, with CIP sharing of centroids]
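
The centroid idea can be illustrated in a few lines: for each attribute, keep the set of distinct words that occur anywhere in the database, so another server can tell whether forwarding a query is worthwhile. This is a simplified sketch, not the WHOIS++ or CIP wire format; the function names and sample records are hypothetical.

```python
import re
from collections import defaultdict


def build_centroid(records):
    """Map each attribute name to the set of distinct lower-cased words it contains."""
    centroid = defaultdict(set)
    for record in records:
        for attribute, value in record.items():
            centroid[attribute].update(re.findall(r"\w+", value.lower()))
    return dict(centroid)


def might_match(centroid, attribute, word):
    """True if at least one record has `word` in `attribute`; False rules the server out."""
    return word.lower() in centroid.get(attribute, set())


# Hypothetical records standing in for a subject service database.
records = [
    {"title": "Mona Lisa", "description": "Painting by Leonardo da Vinci"},
    {"title": "The Scream"},
]
centroid = build_centroid(records)
print(might_match(centroid, "title", "mona"))
print(might_match(centroid, "title", "vinci"))
```

A positive answer only says that some record contains the word, so the query still has to be sent to that server; a negative answer lets the router skip it.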

  33. DESIRE European Web cataloguing • Subject Services • EuroSOSIG (Bristol), EELS (Lund), Arts (Koninklijke Bibliotheek) • Manually created ROADS templates • European Web Index • based on Nordic Web Index (NWI) • Robot generated, all resources • Multiple servers linked with Z39.50 • GILS http://www.nic.surfnet.nl/surfnet/projects/desire/desire.html

  34. DESIRE - current work (1) • Internationalisation of ROADS • Use of robots to: • aid manual cataloguing of resources • build indexes based on list of URLs in a ROADS database • Robot will use embedded Dublin Core if available

  35. DESIRE - current work (2) • Re-design of EWI robot - including: • support for Dublin Core • EWI records GILS-II compatible • Allow users to search across subject services and the EWI using Z39.50 • by converting ROADS records into GILS records • by building a WHOIS++ to Z39.50 gateway http://roads.ukoln.ac.uk/cgi-bin/egwcgi/egwirtcl/targets.egw

  36. NewsAgent Current awareness service for LIS... • Distributed database • servers at LITC, FD, UKOLN - Z39.50 • metadata (and some full-text) • based on DALI • Mixture of content streams • Variety of access methods • Web, e-mail and Z39.50 clients • user-configurable profiles http://www.ukoln.ac.uk/metadata/NewsAgent/

  37. NewsAgent - Content • Journals • Program, VINE, Journal of Librarianship and Information Science • News and briefing material • LA, IIS, UKOLN (Ariadne), BL, LITC • Web pages • E-mail lists and USENET news

  38. NewsAgent - Harvesting • Web crawler • looking for embedded Dublin Core • Limiting the harvest • simple heuristics • use of Dublin Core Relation element • E-mail parser http://www.ukoln.ac.uk/metadata/NewsAgent/dcusage.html
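
Looking for embedded Dublin Core in a fetched page can be sketched with the standard library HTML parser. This is an illustration only, not the NewsAgent crawler: fetching, harvest limits and use of the Relation element are left out, and the names `DCHarvester` and `harvest_dc` are hypothetical.

```python
from html.parser import HTMLParser


class DCHarvester(HTMLParser):
    """Collect (name, content) pairs from <META> tags whose name starts with 'DC.'."""

    def __init__(self):
        super().__init__()
        self.elements = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <META NAME=...> works too.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name") or ""
        if name.lower().startswith("dc."):
            self.elements.append((name, attributes.get("content") or ""))


def harvest_dc(html_text):
    """Return the Dublin Core elements embedded in an HTML document, in order."""
    parser = DCHarvester()
    parser.feed(html_text)
    parser.close()
    return parser.elements


page = (
    "<HTML><HEAD><TITLE>UKOLN Home Page</TITLE>"
    '<META NAME="DC.creator" CONTENT="Stark, Isobel">'
    '<META NAME="Keywords" CONTENT="xxx, yyy, zzz">'
    "</HEAD></HTML>"
)
print(harvest_dc(page))
```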

  39. BIBLINK Information flow between publishers • traditional • new - CD-ROM or Web (new to publishing) and National Bibliographic Agencies • British Library, UK • Biblioteca Nacional, Madrid, Spain • Bibliothèque Nationale de France, Paris • Koninklijke Bibliotheek, Den Haag, Netherlands • Nasjonalbiblioteket, Rana, Norway • Universitat Oberta de Catalunya, Barcelona, Spain http://www.ukoln.ac.uk/metadata/BIBLINK/

  40. BIBLINK - research • Scope • Electronic publications suitable for inclusion in National Bibliographies • Metadata • Dublin Core (with extensions!), SGML DTD • Identifiers • ISBN, ISSN, SICI, DOI, URN • Transmission • Simple e-mail or Web crawler • Authentication • MD5 hash assigned to each resource
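
The authentication bullet above names an MD5 hash per resource. How BIBLINK computes or transports it is not shown in the slides, so this is only a sketch of the general technique using Python's `hashlib`; the helper names are hypothetical.

```python
import hashlib


def md5_checksum(data: bytes) -> str:
    """Return the MD5 digest of a resource's bytes as a hex string."""
    return hashlib.md5(data).hexdigest()


def matches(data: bytes, expected_hex: str) -> bool:
    """Check received bytes against the checksum carried in the metadata."""
    return md5_checksum(data) == expected_hex.strip().lower()


resource = b"hello"
checksum = md5_checksum(resource)
print(checksum, matches(resource, checksum))
```

A receiving agency would recompute the digest over the bytes it fetched and compare it with the value in the record, which detects a resource that changed after it was described.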

  41. BIBLINK - data set • Minimum data set • Author, Title, Publisher, Place of Publication, Price, Extent (size), Keywords, Description, Edition/Version, Date of Publication, System Requirements, Format, Language, Terms and Conditions, Frequency, Identifier, Contributor, Checksum • Similar to DC but some don’t fit… <META NAME="BIBLINK.placePublication" CONTENT="Bath, UK"> <META NAME="BIBLINK.frequency" CONTENT="monthly"> • Issues over conversion to MARC

  42. BIBLINK - demonstrator • Publishers • Cataloguing in Publication (CIP) level records • NBAs/National Libraries • Enhanced records optionally returned to publishers • Conversion on to local MARC format using USEMARCON [Diagram labels: Dublin Core, E-mail, Dublin Core, UNIMARC, ??MARC]

  43. Conclusions • Think about metadata as a ‘process’ • Dublin Core syntax now stable enough to use • Use within projects initially • Choose metadata management model appropriate to your site • Consider long term maintenance and transition to other formats
