This presentation, given at the EDLProject workshop on 22-23 November 2007, covers the TELPlus work on improving subject access through semantic alignment of controlled vocabularies. It places that work among the other TELPlus access tasks: reviewing state-of-the-art semantic search engines, integrating services with the TEL portal, and providing user personalisation services. The subject access task converts subject vocabularies to a standard representation language (SKOS) to resolve syntactic heterogeneity and prepare for dealing with semantic heterogeneity, then aligns the vocabularies and deploys the resulting links in the TEL framework. TELPlus also extends TEL to Bulgaria and Romania.
Controlled Vocabularies in TELPlus Antoine ISAAC Vrije Universiteit Amsterdam EDLProject Workshop 22-23 November 2007
Agenda • TELPlus Context • Improving subject access • 3 sub-tasks • Services for TEL
TELPlus Context • Started October 2007 • Running 27 months • Content WPs • OCRing previously digitised material • Improving the usability of TEL through OAI-PMH compliance • Improving Access • Integrating services with TEL portal • User personalisation services • Extending TEL to Bulgaria & Romania
WP3 – Improving Access • Task 1: Indexing for usability • Review/test state-of-the-art semantic search engines • On content of documents • Task 2: Improving subject access • Task 3: FRBR aggregation, search and browsing • Create/exploit FRBR metadata repositories • Task 4: Focus on users • Focus groups on prototypes
WP 3 Task 2 – Improving Subject Access • Improving subject access via semantic alignment between subjects • Search through collections • Using metadata • In a controlled setting • Paving the way for enhanced usages • Advanced treatments mentioned in TELplus need conceptual structures and links between these structures • E.g. clustering
WP 3 Task 2 – Improving Subject Access • Improving subject access via semantic alignment between subjects • Reference: MACS project • Manually-built semantic equivalences between Rameau, SWD & LCSH headings • Here: an experiment on deploying automatic alignment techniques • Determining possible strategies • Assessing feasibility and usefulness • MACS context
WP3.2 Sub-tasks • 3.2.1. Converting the subjects to standard representation language • Semantic web format (SKOS) • 3.2.2. Aligning the vocabularies • Semantic correspondences between subjects • 3.2.3. Deploying the alignment knowledge obtained into TEL framework • E.g. using links to reformulate queries from one subject list to the other
Converting subjects to standard representation language Goal: solving syntactic heterogeneity between vocabularies • Enabling the use of standard tools • E.g. for query (re)formulation • Paving the way for dealing with semantic heterogeneity • Definitions of concepts expressed according to a common model
Converting subjects to standard representation language Approach: Semantic Web and SKOS • Semantic Web • Knowledge objects as web resources (URIs) • Description by linking resources (RDF) • Description using shared formal vocabularies (ontologies) • SKOS • A standard Semantic Web model (ontology) • For knowledge organization systems (thesauri, subject heading lists…)
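SKOS: Example (graph figure, read as triples) • http://www.iconclass.nl/ rdf:type skos:ConceptScheme • http://www.iconclass.nl/s_11F rdf:type skos:Concept • http://www.iconclass.nl/s_11F skos:inScheme http://www.iconclass.nl/ • http://www.iconclass.nl/s_11F skos:prefLabel “the Virgin Mary”@en • http://www.iconclass.nl/s_11F skos:prefLabel “la Vierge Marie”@fr • http://www.iconclass.nl/s_11F skos:broader http://www.iconclass.nl/s_11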
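The Iconclass example can also be written out as plain RDF triples. The following is a minimal sketch using only the Python standard library; it assumes the standard RDF and SKOS namespace URIs, and the helper names (`uri`, `lit`, `to_ntriples`) are illustrative, not part of any library.

```python
# Standard RDF and SKOS namespace URIs.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Resources taken from the slide.
SCHEME = "http://www.iconclass.nl/"
CONCEPT = "http://www.iconclass.nl/s_11F"
PARENT = "http://www.iconclass.nl/s_11"


def uri(value):
    """Format a URI as an N-Triples term (illustrative helper)."""
    return f"<{value}>"


def lit(text, lang):
    """Format a language-tagged literal as an N-Triples term."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


triples = [
    (uri(SCHEME), uri(RDF + "type"), uri(SKOS + "ConceptScheme")),
    (uri(CONCEPT), uri(RDF + "type"), uri(SKOS + "Concept")),
    (uri(CONCEPT), uri(SKOS + "inScheme"), uri(SCHEME)),
    (uri(CONCEPT), uri(SKOS + "prefLabel"), lit("the Virgin Mary", "en")),
    (uri(CONCEPT), uri(SKOS + "prefLabel"), lit("la Vierge Marie", "fr")),
    (uri(CONCEPT), uri(SKOS + "broader"), uri(PARENT)),
]


def to_ntriples(rows):
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in rows)


print(to_ntriples(triples))
```

Each line of the output is one statement about a web resource, which is what allows standard Semantic Web tools to process the vocabulary.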
Converting subjects to standard representation language - Process • Getting processable versions from owners • E.g. XML • Analyzing the models • Converting to SKOS
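The three steps above can be sketched as follows. This is only an illustration: the XML layout (`heading`, `label`, `broader` elements), the `BASE` namespace and the function name `convert_to_skos` are all assumptions, since each vocabulary owner's export format has to be analysed separately.

```python
import xml.etree.ElementTree as ET

# Hypothetical export format; a real owner's XML needs its own analysis.
SAMPLE_XML = """
<headings>
  <heading id="h1">
    <label lang="en">Literature</label>
  </heading>
  <heading id="h2">
    <label lang="en">Dutch literature</label>
    <broader ref="h1"/>
  </heading>
</headings>
"""

BASE = "http://example.org/shl/"  # placeholder namespace, not a real vocabulary


def convert_to_skos(xml_text, base=BASE):
    """Map each <heading> to a skos:Concept with labels and broader links."""
    root = ET.fromstring(xml_text)
    triples = []
    for heading in root.findall("heading"):
        subject = base + heading.get("id")
        triples.append((subject, "rdf:type", "skos:Concept"))
        triples.append((subject, "skos:inScheme", base))
        for label in heading.findall("label"):
            text = (label.text or "").strip()
            triples.append((subject, "skos:prefLabel", (text, label.get("lang"))))
        for broader in heading.findall("broader"):
            triples.append((subject, "skos:broader", base + broader.get("ref")))
    return triples


for triple in convert_to_skos(SAMPLE_XML):
    print(triple)
```

Predicates are kept as prefixed names for readability; a real conversion would expand them to full URIs and handle variant labels, notes and other model-specific fields.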
Vocabulary Alignment • Specifying required alignment format (links) • Type of mapping links: equivalence, broader • Cardinality: one-to-one, one-to-many • Taking application context (TEL) into account
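One possible way to pin down such an alignment format is sketched below. The `Mapping` class, the relation names and the heading identifiers are assumptions made for illustration; the slides do not prescribe a concrete data structure.

```python
from collections import defaultdict
from dataclasses import dataclass

# Link types named on the slide.
RELATIONS = {"equivalence", "broader"}


@dataclass(frozen=True)
class Mapping:
    """One alignment link from a source heading to a target heading."""
    source: str
    target: str
    relation: str

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unknown relation: {self.relation}")


def cardinality_by_source(mappings):
    """Classify each source heading by how many distinct targets it maps to.

    Seen from the source side only: a strict one-to-one check would also
    verify that no two sources share a target.
    """
    targets = defaultdict(set)
    for m in mappings:
        targets[m.source].add(m.target)
    return {
        source: "one-to-one" if len(found) == 1 else "one-to-many"
        for source, found in targets.items()
    }


# Invented heading identifiers, for illustration only.
links = [
    Mapping("shl1:calendars", "shl2:calendar", "equivalence"),
    Mapping("shl1:dutch-literature", "shl2:dutch", "broader"),
    Mapping("shl1:dutch-literature", "shl2:literature", "broader"),
]
print(cardinality_by_source(links))
```

Fixing the allowed link types and cardinalities up front makes it possible to check whether a given alignment tool's output fits what the TEL application can actually use.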
Vocabulary Alignment • Specifying required alignment format (links) • Selecting (& running) alignment techniques/tools • Inspired by semantic web approaches
Vocabulary Alignment Techniques • Similar to ontology alignment problem • Existing approaches for (semi-) automatic ontology alignment • Using techniques from linguistics, computer science, statistics • Problem: performances do not allow 100% automatic alignment • Problem: multilingual case • Some techniques cannot be used
Potential Technique: Using Background Knowledge • Using a shared conceptual reference to find links • Figure: “Publication” (SHL 1) and “Calendar” (SHL 2) connected through background knowledge
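A toy sketch of the idea: both headings are first anchored to concepts of a shared reference hierarchy, and a link is then derived from the position of the anchors in that hierarchy. The hierarchy, the anchors and the function names below are invented for illustration and are not taken from any real background resource.

```python
# Toy background knowledge: child concept -> parent concept (invented data).
BACKGROUND_PARENT = {
    "calendar": "publication",
    "almanac": "publication",
    "publication": None,
}

# Assumed anchoring of headings to background concepts.
SHL1_ANCHORS = {"Publication": "publication"}
SHL2_ANCHORS = {"Calendar": "calendar"}


def ancestors(concept, parent=BACKGROUND_PARENT):
    """Return all ancestors of a concept, nearest first."""
    found = []
    current = parent.get(concept)
    while current is not None and current not in found:
        found.append(current)
        current = parent.get(current)
    return found


def derive_link(anchor1, anchor2):
    """Relation of heading 1 to heading 2, derived from their anchors."""
    if anchor1 == anchor2:
        return "equivalence"
    if anchor1 in ancestors(anchor2):
        return "broader"   # heading 1 is broader than heading 2
    if anchor2 in ancestors(anchor1):
        return "narrower"  # heading 1 is narrower than heading 2
    return None


for h1, a1 in SHL1_ANCHORS.items():
    for h2, a2 in SHL2_ANCHORS.items():
        print(h1, derive_link(a1, a2), h2)
```

In practice the hard part is the anchoring step itself, especially in the multilingual case, where labels from different languages must be matched to the same background concept.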
Potential Technique: Statistical Alignment • Object information (book indexing) • Figure: “Dutch Literature” (SHL 1) and “Dutch” (SHL 2) connected through dually-indexed books
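A minimal sketch of one statistical measure that could be used here: the Jaccard overlap between the sets of books indexed with each heading. The choice of Jaccard, the threshold, the book identifiers and the extra headings are assumptions for illustration; the slides do not name a specific measure.

```python
def jaccard(books1, books2):
    """Share of books carrying both headings among books carrying either."""
    union = books1 | books2
    return len(books1 & books2) / len(union) if union else 0.0


# Invented dually-indexed collection: heading -> set of book identifiers.
SHL1_INDEX = {
    "Dutch Literature": {"b1", "b2", "b3"},
    "Calendars": {"b4"},
}
SHL2_INDEX = {
    "Dutch": {"b1", "b2", "b3", "b5"},
    "Almanacs": {"b4", "b6"},
}


def candidate_links(index1, index2, threshold=0.5):
    """Return (heading1, heading2, score) for pairs at or above threshold."""
    links = []
    for h1, books1 in index1.items():
        for h2, books2 in index2.items():
            score = jaccard(books1, books2)
            if score >= threshold:
                links.append((h1, h2, score))
    return sorted(links, key=lambda link: link[2], reverse=True)


for h1, h2, score in candidate_links(SHL1_INDEX, SHL2_INDEX):
    print(f"{h1} <-> {h2}: {score:.2f}")
```

Because the score depends only on shared books and not on labels, this kind of technique remains usable when the two subject heading lists are in different languages.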
Vocabulary Alignment • Specifying required alignment format (links) • Selection (& running) of tool/method • Evaluation (& cleaning) • Considering application
Evaluation of Alignments • MACS has produced mappings! • Possible gold standard • But: has MACS produced all mappings? • Which proportion of the SHLs is covered? • Taking into account all indexing strings? • Are MACS mappings the only interesting ones? • “Serendipity” mappings • Concepts that are not equivalent but could bring useful results when added to queries • Compensating for indexing variability
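If manual mappings are used as a gold standard, the comparison could be sketched as below. The mapping pairs and vocabulary are invented, and the function names are illustrative. As the slide notes, a candidate missing from the gold standard is not necessarily wrong, so the precision computed this way is only a lower bound.

```python
def precision_recall(candidate, gold):
    """Compare two sets of (source, target) pairs."""
    correct = candidate & gold
    precision = len(correct) / len(candidate) if candidate else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


def coverage(gold, vocabulary):
    """Proportion of the vocabulary's headings that have a gold mapping."""
    mapped = {source for source, _ in gold}
    return len(mapped & vocabulary) / len(vocabulary) if vocabulary else 0.0


# Invented data for illustration.
gold = {("a", "x"), ("b", "y")}
candidate = {("a", "x"), ("b", "z"), ("c", "w")}
vocabulary = {"a", "b", "c", "d"}

p, r = precision_recall(candidate, gold)
print(f"precision={p:.2f} recall={r:.2f} coverage={coverage(gold, vocabulary):.2f}")
```

The coverage figure addresses the question of which proportion of a subject heading list the gold standard covers; a low value would limit how much the precision and recall numbers can be trusted.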
Evaluation of Alignments • Several scenarios for using and evaluating alignments • Concept-based search • Retrieving books indexed by SHL1 using SHL2 concepts • Re-indexing • Integration of one SHL into the other • SHL Merging • Free-text search • Matching user search terms to either SHL1 or SHL2 concepts • Navigation • Browsing several collections using one SHL structure
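The concept-based search scenario can be sketched as a two-step lookup: reformulate the SHL2 query into SHL1 concepts through the alignment, then search the SHL1-indexed collection. The mapping table, the collection and the function names are invented for illustration.

```python
# Assumed alignment: SHL2 concept -> SHL1 concepts it maps to.
MAPPINGS = {
    "Dutch": ["Dutch Literature"],
    "Almanacs": ["Calendars"],
}

# Invented collection indexed with SHL1 headings only.
COLLECTION = {
    "book-1": {"Dutch Literature"},
    "book-2": {"Calendars"},
    "book-3": {"Dutch Literature", "Calendars"},
}


def reformulate(query_concepts, mappings=MAPPINGS):
    """Translate SHL2 query concepts into SHL1 concepts via the alignment."""
    targets = set()
    for concept in query_concepts:
        targets.update(mappings.get(concept, ()))
    return targets


def search(query_concepts, collection=COLLECTION):
    """Return the books indexed with at least one reformulated concept."""
    wanted = reformulate(query_concepts)
    return sorted(book for book, subjects in collection.items() if subjects & wanted)


print(search({"Dutch"}))
```

Concepts without a mapping are silently dropped in this sketch; a real deployment would have to decide whether to fall back to free-text search in that case.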
Evaluation of Alignments • Several settings for a single scenario • Fully automatic reformulation vs assisted reformulation (candidates) • Different evaluation measures • Good mappings vs acceptable ones • Number of candidates for reformulation • Semantic closeness to original query
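The difference between the two settings can be illustrated with a small sketch: fully automatic reformulation commits to a single mapping, while assisted reformulation offers a short ranked list for the user to choose from. The scores, threshold and list length are invented assumptions.

```python
# Invented scored candidates for one query concept: (target concept, score).
CANDIDATES = [
    ("Dutch Literature", 0.75),
    ("Dutch language", 0.40),
    ("Netherlands", 0.20),
]


def automatic_reformulation(candidates, min_score=0.5):
    """Pick the single best target, or None if no candidate is good enough."""
    if not candidates:
        return None
    target, score = max(candidates, key=lambda c: c[1])
    return target if score >= min_score else None


def assisted_reformulation(candidates, max_candidates=2):
    """Return the top-ranked targets for the user to choose from."""
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    return [target for target, _ in ranked[:max_candidates]]


print(automatic_reformulation(CANDIDATES))
print(assisted_reformulation(CANDIDATES))
```

The automatic setting needs mappings that are good, whereas the assisted setting can tolerate merely acceptable ones, at the cost of showing the user more candidates.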
Vocabulary Alignment • Specifying required alignment format (links) • Selection (& running) of tool/method • Evaluation (& cleaning) • Assessment of the approach • Efforts required, quality, extendibility
Deploying the alignment knowledge obtained into TEL framework • Observing integration of MACS data into TEL • Conceptual input for alignment requirements • Integration of the obtained alignment in TEL • Assessment of the alignment integration • Technical aspects, usage aspects
Reminder • Alignment is a difficult problem • Application-specific alignment pretty much unexplored in Semantic Web research • More a feasibility study than a complete solution to the problem • Practical goal: investigate how automatic techniques could help MACS-like initiatives • Manual mapping is labour-intensive
WP4 – Integrating services with the European Library portal Theo van Veen (KB) Tasks: • Identifying services that are going to give the user the greatest return • Creating new services • Integrating services within TEL …
WP4 – Some Services Mentioned Preliminary inventory: no official commitment! Services based on controlled vocabularies: • Thesaurus and name authority service • Providing terms linked to query terms • Semantic enrichment service • Users can annotate search results with terms • Distance between terms and related terms • Adding more value from controlled vocabularies and alignments between them