Adding Value to Open Scholarly Content
Presentation Transcript
Adding Value to Open Scholarly Content How Services and Search Expose The Value of the Perseus Digital Library
What is Perseus? • Mission • To increase accessibility to and interest in the humanities. • What makes our content interesting? • We give this content away yet still maintain a user base that finds value in its offerings.
Perseus’ Static & Dynamic Services • Perseus gives away its static content. • Perseus also makes its content dynamically accessible. • Allows for interconnections among Perseus’ objects. • This allows us to build up a network of associations between primary and secondary sources of information. • Named Entity Extraction • Morphological Analysis • The more content we have, the more associations between objects we can offer.
Text Services • Increasing the value of Perseus’ texts • The concepts behind the Canonical Text Services protocol (CTS) • CTS will allow us to interconnect our objects. • Intra-connecting: Making associations within our own content • Inter-connecting: Making associations between our content and external services/content. • The role of search • In a time when “scholarly content is increasingly being seen as a public resource,” what is the role of search engines in conceiving and delivering texts?
Goals • By the end of the talk we will see: • A Service for Referencing Text • CTS • The Value of Associations • CTS URNs, a syntax for intra- and inter-connecting texts • Perseus’ other sources of value • Perseus’ logical architecture
Concepts Behind CTS: Author to Edition • Hierarchical Ontology of Text Organization • An author’s works • Get me all works by Julius Caesar • urn:cts:latinLit:stoa0069 • A particular work of an author • Get me Caesar’s The Gallic War • urn:cts:latinLit:stoa0069:stoa002 • An edition or translation of a work • Get me a specific English translation of Caesar’s The Gallic War • urn:cts:latinLit:stoa0069:stoa002:1999_02_0001
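The author, work, and edition tiers above can be read straight off the colon-separated URN. The following is a minimal sketch, assuming the URN syntax exactly as written on this slide; the names `CtsUrn` and `parse_urn` are hypothetical and are not part of the CTS protocol or of any Perseus API.

```python
from typing import NamedTuple, Optional


class CtsUrn(NamedTuple):
    """Hypothetical container for the tiers shown on the slide."""
    namespace: str
    author: str
    work: Optional[str] = None
    edition: Optional[str] = None


def parse_urn(urn: str) -> CtsUrn:
    """Split a URN of the form urn:cts:NAMESPACE:AUTHOR[:WORK[:EDITION]].

    Assumption: only the author-to-edition tiers are handled here;
    passage references are out of scope for this sketch.
    """
    parts = urn.split(":")
    if parts[:2] != ["urn", "cts"] or not 4 <= len(parts) <= 6:
        raise ValueError(f"not an author-to-edition CTS URN: {urn!r}")
    return CtsUrn(*parts[2:])


author = parse_urn("urn:cts:latinLit:stoa0069")
work = parse_urn("urn:cts:latinLit:stoa0069:stoa002")
edition = parse_urn("urn:cts:latinLit:stoa0069:stoa002:1999_02_0001")
print(author)
print(work)
print(edition)
```

Each longer URN keeps the fields of the shorter one and fills in one more tier, which is what makes the scheme hierarchical.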
Concepts Behind CTS: Edition to Character • A logical component of text from an edition or translation in terms of its citation scheme • Get me Book 1, Chapter 1 of Caesar’s Gallic War from this English translation • urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.1 • A paragraph, quotation, or single character within a text • “All Gaul is divided into three parts” • urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.25:All:0-parts:0 • A range of text • Give me Book 1, Chapter 37 through Book 2, Chapter 5 of Caesar’s Gallic War • urn:cts:latinLit:stoa0069:stoa002:1.37-2.5
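A passage reference such as 1.1 or 1.37-2.5 can be treated as a point or a range in the text's citation scheme. The sketch below assumes purely numeric book.chapter citations as in the examples on this slide; it does not handle the word-level form (All:0-parts:0), and the function names are hypothetical.

```python
from typing import Tuple

Citation = Tuple[int, ...]


def parse_citation(text: str) -> Citation:
    """Turn '1.37' into (1, 37). Assumes numeric citation levels only."""
    return tuple(int(level) for level in text.split("."))


def parse_passage(ref: str) -> Tuple[Citation, Citation]:
    """Turn '1.37-2.5' into ((1, 37), (2, 5)); a single citation
    such as '1.1' becomes a range that starts and ends at itself."""
    start, dash, end = ref.partition("-")
    first = parse_citation(start)
    last = parse_citation(end) if dash else first
    if last < first:
        raise ValueError(f"range ends before it starts: {ref!r}")
    return first, last


def in_range(citation: Citation, passage: Tuple[Citation, Citation]) -> bool:
    """True when the citation falls inside the inclusive range."""
    first, last = passage
    return first <= citation <= last


gallic_war_range = parse_passage("1.37-2.5")
print(gallic_war_range)
print(in_range((2, 1), gallic_war_range))
print(in_range((2, 6), gallic_war_range))
```

Comparing citations as tuples gives the expected ordering across book boundaries, so Book 2, Chapter 1 falls inside the range even though 1 is less than 37.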
CTS And Content Delivery • CTS URNs can be thought of as a syntax for a “new and emerging content delivery mechanism” • Through URNs we can “break down the content into component parts, each of which can be manipulated…separately” • CTS also adds value to the raw data/content we give away. • Logical referencing • Enables associations
Intra-Connecting Content: The Role of Index Services • Associations between data add value. • Google PageRank • Index services let us construct associations with semantic precision. • Named entity disambiguation • Citations • Morphological Information • Associations add context and increase understanding of the underlying content. • Occurrence of Gaul in a text to its definition • Occurrence of Gaul on this slide to previous examples.
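One way to picture an index service is as a two-way lookup between passages and the things they mention. This is a minimal sketch under that assumption, not a description of Perseus' actual index services; the class `EntityIndex` and the entity identifier `place:gaul` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class EntityIndex:
    """Hypothetical in-memory index linking entities and passages."""

    def __init__(self) -> None:
        self._by_entity: Dict[str, Set[str]] = defaultdict(set)
        self._by_passage: Dict[str, Set[str]] = defaultdict(set)

    def add(self, entity_id: str, passage_urn: str) -> None:
        """Record that an entity occurs in a passage."""
        self._by_entity[entity_id].add(passage_urn)
        self._by_passage[passage_urn].add(entity_id)

    def passages_for(self, entity_id: str) -> List[str]:
        """All passages in which the entity occurs."""
        return sorted(self._by_entity.get(entity_id, ()))

    def entities_in(self, passage_urn: str) -> List[str]:
        """All entities recorded for the passage."""
        return sorted(self._by_passage.get(passage_urn, ()))


index = EntityIndex()
passage = "urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.1"
index.add("place:gaul", passage)  # hypothetical entity identifier
print(index.passages_for("place:gaul"))
print(index.entities_in(passage))
```

Because the passage side of the association is a URN, the same index entry can point from an occurrence of Gaul to a definition or to other passages that mention it.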
Inter-Connecting Content: The Role of Search Engines • Perseus can increase the value of its content even further by connecting its highly-structured data with external services (like search engines) providing less-structured data • We’ve seen this idea before… • Google Earth: Search and display results • Longitude and latitude (Geographic coordinates) • CTS-aware searching: Search and display results • CTS URNs (textual coordinates)
CTS URNs and Search • What Perseus is doing now (experimental): • Using Google Base and CTS URNs to find Perseus’ highly-structured content with semantic precision. • Search texts at any tier of the hierarchical structure • by expanding or truncating the URN. • Examples: • Get me all works by Julius Caesar visible to this search. • Get me Caesar’s The Gallic War • Get me a Perseus-edition English translation of Caesar’s Gallic War • Get me Book 1, Chapter 1 of Caesar’s The Gallic War from the English Translation
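Searching at a given tier amounts to matching on a URN prefix: a shorter URN covers everything beneath it. The sketch below illustrates that idea only and says nothing about how the Google Base experiment is implemented; the function names and the small catalogue are hypothetical, and the second catalogue entry is an invented identifier used purely for contrast.

```python
from typing import List


def components(urn: str) -> List[str]:
    """Split a URN into its colon-separated parts (slide syntax assumed)."""
    parts = urn.split(":")
    if parts[:2] != ["urn", "cts"] or len(parts) < 4:
        raise ValueError(f"not a CTS URN: {urn!r}")
    return parts


def truncate(urn: str, tiers: int) -> str:
    """Keep the namespace plus the first `tiers` levels
    (1 = author, 2 = work, 3 = edition, 4 = passage)."""
    return ":".join(components(urn)[: 3 + tiers])


def covers(query: str, candidate: str) -> bool:
    """True when the query URN is a whole-component prefix of the candidate."""
    q, c = components(query), components(candidate)
    return c[: len(q)] == q


def search(query: str, catalogue: List[str]) -> List[str]:
    """Return every catalogue URN at or below the queried tier."""
    return [urn for urn in catalogue if covers(query, urn)]


catalogue = [
    "urn:cts:latinLit:stoa0069:stoa002:1999_02_0001:1.1",
    "urn:cts:latinLit:example0001:example001",  # invented entry for contrast
]
full = catalogue[0]
print(truncate(full, 1))  # author tier
print(truncate(full, 2))  # work tier
print(search(truncate(full, 1), catalogue))
```

Matching whole components rather than raw string prefixes keeps an identifier such as stoa006 from accidentally matching stoa0069.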
CTS as a Value-Added Service • We have a standard mechanism for referencing and retrieving texts • We have a mechanism for tracking our audience. • A syntax for aggregation of content (Shore) • A well-defined API implementing an open standard (Shore) • Handles multi-lingual content • Provides a syntax for datasets of aligned texts. • A notation for semantically precise associations.
Perseus’ Logical Architecture: Identifying Sources of Value • DATA LAYER: TEI-XML texts, databases, raw data. Perseus gives away this raw data under the Creative Commons License. • Perseus as a data source to the community • Perseus understands how to create this data and can help others to do so as well • DOMAIN LAYER: The objects that encapsulate the data and add a set of behaviors. • The knowledge and experience gained while creating this layer, and coming to understand the objects of the domain. • Working in the domain of Classical texts provides Perseus with a unique perspective on the nature of text that others may find useful.
Perseus’ Logical Architecture: Identifying Sources of Value • SERVICE LAYER: The service layer provides an API implementing a series of protocols for each of the types of data Perseus serves. • Others are free to repurpose Perseus’ content through an API that encodes domain knowledge. • The community using the API becomes a source of information and value • DISPLAY LAYER: The user interface. Think widgets, HTML web pages, PDFs, etc. • Convenience & Ease of Use • Expertise: The UI reflects the knowledge about the content gained when building the other layers.
Closing Points • The idea: • Perseus can give away its static data because it adds value through providing semantically rich associations, adding context to the content. • An Example Service: • The Canonical Text Services protocol offers a new way to conceive of, reference, and deliver texts • Associations Add Value: • Perseus’ value stems from these associations; the value is not inherent in the raw data but comes from creating relationships among the data. • Search engines give Perseus the opportunity to create semantically precise associations from less-structured, external content to highly-structured Perseus content. This is accomplished by augmenting search queries with the ‘textual coordinates’ of the CTS URN. • Perseus Offers More Than Services: • In giving away our raw data, we hope to encourage others to create their own associations, increasing our value as a data provider and as service developers. • For the majority of users, however, our value stems from providing highly structured texts with rich associations in a simple user interface.
Resources • People • Blossom, John. “Shoreviews. Content Industry Outlook 2007: Reality Checks.” Shore Communications Inc. 8 Feb. 2007. • Crane, Gregory. Conversations and being in his general vicinity. 2004-Present. • Interconnecting primary and secondary sources • The Perseus Digital Library • Smith, Neel. Conversations and being in his general vicinity. 2000-Present. • “An Architecture for a distributed library incorporating open-source critical editions.” OSCE position paper. • Weaver, Michael. Conversations and being in his general vicinity. 1982-Present. • Slide layout based on his HTML design • Logical layers of an application as a model for business processes (www.dynamicinsight.com) • Relevant Links • CTS: http://digitalclassicist.xwiki.com/xwiki/bin/view/osce/paperReg • Google Base: http://base.google.com • Perseus: http://www.perseus.tufts.edu/