MESMUSES methodology

MESMUSES methodology: lessons learned and open issues…
Alain Michard
Florence, June 2003

MESMUSES broad vision
• Just like several other projects
• The Semantic Web (SW) is all about semantic interoperability
• Sharing machine-readable terminologies and classification schemes
• Science and culture are collective and international
• Semantic Web methodology should be highly relevant for managing and sharing scientific and cultural information

Some key S&T issues in the Project
• Model: is RDFS / OWL-Lite adequate?
• Schema authoring: method and tools needed!
• Metadata: where does it come from?
• Automatic indexing: experiments with a categorizer

The basic SW model
[Diagram: three layers. Schema: classes Person, Owner, Artist, Artefact, Artwork, Dwelling, House, linked by the properties Lives-in, Produces and Creates. Surrogates: instances linked by Lives-in and Creates. Real-world entities: the things the surrogates stand for.]
Example record shown on the slide:
Type: printed text, monograph
Author(s): Zola, Émile (1840-1902)
Title(s): L'assommoir [Texte imprimé] / par Emile Zola
Edition: 50th ed.
Publication: Paris: G. Charpentier, 1878
Physical description: 111-569 p.
Record no.: FRBNF35963044
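To make the schema / surrogate split concrete, here is a minimal Python sketch that stores both layers as subject-predicate-object triples and answers a class query through the hierarchy. The class and property names come from the slide; the subclass links, the `ex:` identifiers, the `recordNumber` property and the helper functions are assumptions made for illustration, not the project's actual data model.

```python
# Schema layer: class hierarchy and one property signature.
# Assumption: the subclass links are inferred from the diagram labels.
SCHEMA = {
    ("Artist", "subClassOf", "Person"),
    ("Artwork", "subClassOf", "Artefact"),
    ("House", "subClassOf", "Dwelling"),
    ("creates", "domain", "Artist"),
    ("creates", "range", "Artwork"),
}

# Surrogate layer: descriptions standing for real-world entities.
# Assumption: the "ex:" identifiers and "recordNumber" are hypothetical.
SURROGATES = {
    ("ex:zola", "type", "Artist"),
    ("ex:assommoir", "type", "Artwork"),
    ("ex:zola", "creates", "ex:assommoir"),
    ("ex:assommoir", "recordNumber", "FRBNF35963044"),
}


def superclasses(cls):
    """Return cls together with every class reachable via subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in SCHEMA:
            if predicate == "subClassOf" and subject == current and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def instances_of(cls):
    """Return surrogates typed with cls or with any of its subclasses."""
    return {
        subject
        for subject, predicate, obj in SURROGATES
        if predicate == "type" and cls in superclasses(obj)
    }
```

Asking for `instances_of("Person")` returns the surrogate typed as `Artist`, which is the kind of query that a shared schema makes possible across collections.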

Model and Schema Language
• Typed attributes are needed
  • XML-Schema types
  • Derived types (e.g. Celsius temperature, Gregorian date, etc.)
  • Enumerated types, thesauri
• Time-stamping
• Cardinality constraints
• Explicit transitivity of properties (e.g. geographic inclusion)
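The transitivity requirement can be illustrated with a short Python sketch: if a property such as geographic inclusion is declared transitive, the implied statements can be derived by computing a transitive closure. The `located_in` pairs and the function name are assumptions chosen for illustration; the slide does not prescribe an algorithm.

```python
def transitive_closure(pairs):
    """Return all (a, c) pairs implied by chaining (a, b) and (b, c)."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over snapshots so the set can grow inside the loops.
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


# Hypothetical "located in" statements entered by an indexer.
located_in = {
    ("Florence", "Tuscany"),
    ("Tuscany", "Italy"),
    ("Italy", "Europe"),
}

implied = transitive_closure(located_in)
```

With the property marked as transitive, a search for resources about Europe can also retrieve those indexed only with Florence, without the indexer stating that link explicitly.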

Schema authoring issues (1)
• Find the right level of abstraction
  • Is "Glucid" (carbohydrate) a class or an instance?
  • Or is it sometimes a class and sometimes an instance?
• Avoid the knowledge-representation ("KR") attitude and practices!
  • It's all about indexing resources with shared terminologies, not about representing human knowledge!

Schema authoring issues (2)
[Diagram: example schema. Classes: Process (Processus), Complex process, Elementary process, System, Structure, Organism, Cell, Apparatus (Appareil), Organ, Molecule, Tissue, Major theme (Grande Thématique), GTANS. Properties: ISA, is-made-of, consumes, transforms, is-regulated-by, produces, involves, eliminates, triggers, requires, is-performed-by, is-documented-by, is-explained-by.]

Schema authoring issues (3)

Schema authoring issues (4)
• Authoring tools are badly needed
  • Graphical representation of the schema
  • Zooming on sub-graphs (hierarchies)
  • Versioning
  • Consider using a UML authoring environment?
• Established methodology and tutorials are needed

Creating Surrogates
• Data extraction and fusion from structured sources
  • R-DB, XML-DB, LDAP
• Updating
  • When?
  • Should not create duplicates!
• Detect cross-references
  • Authority lists
  • Thesauri
  • Lexical distance
  • ???
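As one way to picture the "lexical distance" option for avoiding duplicates, the Python sketch below matches an incoming name against an authority list using Levenshtein edit distance after accent and case normalization. The threshold of 2, the function names and the sample authority list are assumptions for illustration; the slides do not say which distance measure the project used.

```python
import unicodedata


def normalize(name):
    """Lower-case a name and strip accents so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower().strip()


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def find_match(name, authority_list, max_distance=2):
    """Return the closest authority entry within max_distance, else None."""
    key = normalize(name)
    best = None
    for entry in authority_list:
        distance = levenshtein(key, normalize(entry))
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (entry, distance)
    return best[0] if best else None


# Hypothetical authority list of author headings.
AUTHORITY = ["Zola, Émile", "Hugo, Victor"]
```

A new surrogate would only be created when `find_match` returns `None`; otherwise the incoming record is attached to the existing entry. A fixed threshold is crude, and short names would probably need a stricter one.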

Automatic Categorization
• Automatic indexing
  • By extracting metadata from resources
  • By automatic categorization
• Define hierarchies of "concepts" inside the schema
• Seeding with representative documents
• Machine learning to create categorizers
• Pros: enriched search functionality
• Cons: hierarchies of categories are static
  • Adding a category may change the categorizers of the others
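The seeding step can be sketched in a few lines of Python. The example below builds one term-frequency profile per category from seed documents and assigns a new text to the category with the highest cosine similarity. This is a deliberately simple stand-in: the slides do not name the learning method or the categorizer used in the project, and the categories, seed texts and function names are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Split on whitespace and keep purely alphabetic, lower-cased tokens."""
    return [word for word in text.lower().split() if word.isalpha()]


def train(seeds):
    """Build one term-frequency profile per category from its seed documents."""
    return {
        category: Counter(token for doc in docs for token in tokenize(doc))
        for category, docs in seeds.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def categorize(text, profiles):
    """Return the category whose profile is most similar to the text."""
    vector = Counter(tokenize(text))
    return max(profiles, key=lambda category: cosine(vector, profiles[category]))


# Hypothetical seed documents for two schema concepts.
SEEDS = {
    "Cell": ["the cell membrane surrounds the cell nucleus"],
    "Organ": ["the heart is an organ that pumps blood"],
}
PROFILES = train(SEEDS)
```

Because `categorize` always picks the best-scoring profile among those available, adding a category can change where existing documents land, which is one way to read the drawback noted on the slide.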

Bottom line…
• RDFS schema authoring may be more difficult than E-R modelling
• Debates on syntactic features are irrelevant
• Should be grounded on real-world implementations and testbeds
• A new query language (e.g. RQL) is not high priority
• We have not addressed the "logical rules" layer
• Semantic Web vs. Community Webs
