Using Pivots to Explore Heterogeneous Collections: A Case Study in Musicology • Daniel Alexander Smith • 8 December 2009
musicSpace http://mspace.fm/projects/musicspace • IAM Group, School of Electronics and Computer Science • Music, School of Humanities
Outline • How musicologists use data • Limitations of existing approaches • Our data extraction and integration methodology • Interface walkthrough
musicSpace Tasks • Triage data partners' sources • Extract information • Map data sources to schemas/ontologies • Produce interface over aggregated data • Customise interface based on feedback
Intractable research questions • Which scribes have created manuscripts of a composer’s works, and which other composers’ works have they inscribed? • Which poets have had their poems set to music by Schubert, which of these musical settings were only published posthumously, and where can I find recordings of them? • Which electroacoustic works were published within five years of their premiere?
Why they are intractable (1) • Need to consult several sources • Metadata from one source cannot be used to guide searches of another source • Solution: Integrate sources
Why they are intractable (2) • They are multi-part queries, and need to be broken down with results collated manually • Requires pen and paper! • Solution: Optimally interactive UI
Why they are intractable (3) • Insufficient granularity of metadata and/or search options • Solution: Increase granularity
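As an illustration of why granularity matters, the question about electroacoustic works published within five years of their premiere can only be answered when premiere and publication dates are held as separate fields. The sketch below assumes a hypothetical `Work` record with such fields; the record layout, function name and sample entries are placeholders, not musicSpace's schema or data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Work:
    # Hypothetical record layout, assumed for illustration only.
    title: str
    genre: str
    premiere_year: Optional[int]
    publication_year: Optional[int]


def published_within(works: List[Work], genre: str, max_years: int) -> List[str]:
    """Titles of works in `genre` published at most `max_years` after their premiere."""
    titles = []
    for work in works:
        if work.genre != genre:
            continue
        # Without both dates the question cannot be answered for this work.
        if work.premiere_year is None or work.publication_year is None:
            continue
        gap = work.publication_year - work.premiere_year
        if 0 <= gap <= max_years:
            titles.append(work.title)
    return titles


# Placeholder entries, not real catalogue data.
WORKS = [
    Work("Work 1", "electroacoustic", 1970, 1973),
    Work("Work 2", "electroacoustic", 1970, 1980),
    Work("Work 3", "electroacoustic", 1970, None),
    Work("Work 4", "opera", 1970, 1971),
]

print(published_within(WORKS, "electroacoustic", 5))
```

A source that stores only a single free-text date field would leave every record in the position of "Work 3": present, but unusable for this query.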
Previous work • Comb-e-chem modelled chemistry data • We use a similar approach • Translated this work to the arts • Musicology modelled using Semantic Web technologies
Musicology Data Sources • Disparate data • How to pull them together and view on demand
Data and Info Management problems • Sources allow searching, but not over everything • Data export (typically MARC) shows extra fields, e.g. characters in an opera, or document types hidden amongst the metadata • Sometimes viewable on the original site, but not searchable • Offering extracted metadata is already a benefit, even with one source
Grove Extraction Example • More complicated, as Grove is a full-text encyclopaedia • Some digitisation via Grove Music Online • Weak semantic metadata extraction • Thus we performed some data entry
Integration • Domain Expert + Technologist partnership • This will be the case for some time to come • Technology should automate tasks to make the domain expert’s job less onerous
Metadata mapping • Domain experts devise single schema • Provide mappings of fields in a particular data source to that unified schema • Enables an interface across all sources
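The mapping step described above can be sketched as a per-source lookup table that renames fields to the unified schema. Every source name and field name below is hypothetical and chosen only for illustration; the slides do not give the actual musicSpace schema.

```python
# Hypothetical unified schema; not the schema devised by the project's domain experts.
UNIFIED_FIELDS = ("composer", "title", "copyist")

# One mapping per data source: source field name -> unified field name.
# Source names and field names are assumptions for this sketch.
MAPPINGS = {
    "library_catalogue": {"author": "composer", "work_title": "title", "scribe": "copyist"},
    "manuscript_index": {"comp": "composer", "name": "title", "hand": "copyist"},
}


def to_unified(source, record):
    """Rename the fields of one source record to the unified schema."""
    mapping = MAPPINGS[source]
    unified = {field: None for field in UNIFIED_FIELDS}
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is not None:  # fields with no mapping are silently dropped
            unified[target] = value
    return unified


record = {"author": "Composer A", "work_title": "Work 1", "shelfmark": "X1"}
print(to_unified("library_catalogue", record))
```

In this sketch the `shelfmark` field is lost because the unified schema has no place for it, which is the kind of gap that forces schema and mapping changes when a new source arrives.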
Downside • New source comes online with information not covered by unified schema • Have to make changes to all mappings to ensure accurate coverage
New Approach: Pivoting • Marking up a single source, versus pushing all to a single schema • Use a pivot instead to situate metadata for integration • Essentially means that the interface does the heavy lifting of integration • Reduced effort by domain experts
Interface Video • Find a composer • See all copyists of their manuscripts • Choose a copyist and see which other composers that copyist has worked on
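The walkthrough above amounts to following a relation in one direction and then back in the other. A minimal sketch, assuming manuscript records reduced to (composer, copyist) pairs; the function names and placeholder records are illustrative and do not describe how the musicSpace interface is implemented.

```python
from collections import defaultdict

# Placeholder (composer, copyist) pairs, one per manuscript; not real data.
MANUSCRIPTS = [
    ("Composer A", "Copyist X"),
    ("Composer A", "Copyist Y"),
    ("Composer B", "Copyist X"),
    ("Composer C", "Copyist Z"),
]


def build_indexes(records):
    """Index the same records in both directions so either column can act as the pivot."""
    by_composer = defaultdict(set)
    by_copyist = defaultdict(set)
    for composer, copyist in records:
        by_composer[composer].add(copyist)
        by_copyist[copyist].add(composer)
    return by_composer, by_copyist


def pivot(composer, copyist, by_composer, by_copyist):
    """Other composers whose works `copyist` has copied, starting from `composer`."""
    if copyist not in by_composer.get(composer, set()):
        raise ValueError(f"{copyist} has no manuscripts of works by {composer}")
    return sorted(by_copyist[copyist] - {composer})


by_composer, by_copyist = build_indexes(MANUSCRIPTS)
print(sorted(by_composer["Composer A"]))  # step 2: copyists of the chosen composer
print(pivot("Composer A", "Copyist X", by_composer, by_copyist))  # step 3: pivot on one copyist
```

Indexing both directions up front means each selection in the interface is a single lookup, rather than a fresh search whose results have to be collated by hand.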
Thank you • http://ecs.soton.ac.uk/projects/musicspace • ds@ecs.soton.ac.uk