Ed Barkmeyer, NIST Ontolog Forum, April, 2007. Information Models as a Basis for Ontologies. Outline. Overview of information modeling Features of “information modeling” Comparison to features of OWL Information modeling methodology Conclusions. History. Linked record models (1968)
Ed Barkmeyer, NIST
Ontolog Forum, April 2007
Information Models as a Basis for Ontologies
Next Generation Info Models
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
History
• Linked record models (1968)
• CODASYL standard (1974), Navigational Data Model (1980)
• E. F. Codd: Relational Algebra (1970)
• Peter Chen: Entity Attribute Relationship Models (1976)
• ISO TR 8002: 1984, the conceptual schema and the information base
• 1980s information modeling technologies
  • IDEF1-X, SDM, NIAM/ORM, SSADM, EXPRESS, etc.
• 1990s object modeling technologies (UML)
• Frame-based logics (1975-1995)
• Description logics (1985-present): DAML, OWL
Differences in Nature
• Navigational and relational models
  • relate data to data
  • relational normal forms model functions of keys
• Information models
  • relate things (entities) to other things
  • relate things to information about them
  • use classifiers to collect properties
• Ontologies
  • relate things to things
  • relate things to information about them
  • use information to classify things
Differences in Purpose
• Data models
  • support software implementations of business processes
  • organize information for access
  • describe instances
• Information models
  • support sets of business processes
  • organize information for comprehension
  • support design of databases and messages
  • use classifications to describe instances
• Ontologies
  • support retrieval of information using inferencing
  • organize information for relevance
  • describe subjects and categories by classifications
Differences in Concept
• Information models
  • universe is things used by the business processes
  • classification/axioms are as used by the business: business rules, not accepted scientific truth
  • distinguish the conceptual schema (invariants, quantified assertions) from the information base (current assertions about individual things)
• Ontologies
  • universe is all things that may be encountered in a domain
  • classification/axioms are accepted truth in the domain
  • primarily quantified assertions with a few ground facts
  • distinguished from an information base for some practical uses
Common Ideas
• Universe is a set of things of interest
• Classification enables understanding of the universe
• Axioms (invariants, necessities), but with a different concept of truth
• Ground facts = axiomatic truths about instances
  • conceptual schema is “nearly monotonic”; current/transient facts are restricted to the information base
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
Information Modeling: Classifiers
• Entity type classifies things in the universe
  • a template for capturing (current) information about things
  • a model of the state of a thing
  • identity is distinct from state
  • domain of properties
• Value type classifies information about things
  • instance is an information unit, a data element
  • can be a structure of component data elements
  • identity is state (state is invariant)
  • only range of properties (its properties proceed from its identity)
• Data type represents value types
  • instance is a computational data value
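As a rough programming analogy (an assumption, not something the talk proposes), the entity/value distinction resembles two kinds of Python classes. A value type compares by state, so two equal values are the same information unit. An entity type keeps an identity separate from its state. The `Weight` and `Vehicle` classes are hypothetical.

```python
from dataclasses import dataclass

# Value type: identity is state. frozen=True makes instances immutable, and the
# generated __eq__/__hash__ compare field values, so equal state = same value.
@dataclass(frozen=True)
class Weight:
    amount: float
    unit: str

# Entity type: identity is distinct from state. eq=False keeps object identity
# for ==, so two entities with identical state remain two different things.
@dataclass(eq=False)
class Vehicle:
    color: str
    gross_weight: Weight

w1 = Weight(1500.0, "kg")
w2 = Weight(1500.0, "kg")
v1 = Vehicle("blue", w1)
v2 = Vehicle("blue", w1)

print(w1 == w2)  # True: same state, hence the same value
print(v1 == v2)  # False: same state, but two distinct entities

# The entity's state may change; it is still the same thing.
v1.gross_weight = Weight(1600.0, "kg")
```

The analogy is imperfect. Python objects have a built-in identity, whereas an information model must supply one through keys.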
Information Modeling: Subtypes
• Subtype relationships among classifiers
  • S is a subtype (subclass) of E iff every s in class S is also an instance of E
  • multiple supertypes: S is a subtype of E1, ..., En
• Exclusion relationships
  • if t is an instance of E then t is not an instance of D
• Covering relationships
  • E is covered by S1, ..., Sn iff e in E implies there exists at least 1 k such that e is in Sk
  • Mutually exclusive coverings are “partitions”
  • “abstract type” = a type that is covered by some set of subtypes
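The subtype, exclusion and covering constraints can be checked against a finite snapshot of class extensions. This minimal sketch assumes each class is given as a Python set of instance names (hypothetical data). It tests the constraints for that snapshot only, whereas the model asserts them over the whole universe.

```python
from itertools import combinations

def is_subtype(s: set, e: set) -> bool:
    # S is a subtype of E iff every instance of S is also an instance of E
    return s <= e

def excludes(e: set, d: set) -> bool:
    # no instance of E is an instance of D
    return e.isdisjoint(d)

def covers(e: set, subtypes: list) -> bool:
    # every instance of E is an instance of at least one Sk
    return all(any(x in s for s in subtypes) for x in e)

def is_partition(e: set, subtypes: list) -> bool:
    # a mutually exclusive covering
    return covers(e, subtypes) and all(
        excludes(a, b) for a, b in combinations(subtypes, 2))

vessel = {"Maria", "Nina", "Pinta"}
sailing_vessel = {"Maria", "Nina"}
motor_vessel = {"Pinta"}

print(is_subtype(sailing_vessel, vessel))                  # True
print(is_partition(vessel, [sailing_vessel, motor_vessel]))  # True
print(covers(vessel, [sailing_vessel]))                    # False
```

In this snapshot `vessel` would qualify as an “abstract type”, since its subtypes cover it.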
Information Modeling: Class definition
• Union (“choice”, “select”) types
  • Class E is the union of classes F and G and ...: E(x) == F(x) OR G(x)
  • Union types are “abstract” by construction
• Intersection
  • Class E is the intersection of classes F and G: E(x) == F(x) AND G(x)
• Relative complement
  • if S is a subtype of E, C is the relative complement iff C = E – S
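The three class constructors can be sketched by treating each classifier as a Python predicate. This is an illustrative encoding only. The `extension` helper and the sample classes are hypothetical.

```python
from typing import Callable

Pred = Callable[[str], bool]

def extension(instances: set) -> Pred:
    # a classifier defined by listing its instances
    return lambda x: x in instances

def union(*classes: Pred) -> Pred:
    # E(x) == F(x) OR G(x) OR ...
    return lambda x: any(c(x) for c in classes)

def intersection(*classes: Pred) -> Pred:
    # E(x) == F(x) AND G(x) AND ...
    return lambda x: all(c(x) for c in classes)

def relative_complement(e: Pred, s: Pred) -> Pred:
    # C = E - S, meaningful when S is a subtype of E
    return lambda x: e(x) and not s(x)

person = extension({"ann", "bob", "cy", "dee"})
employee = extension({"ann", "bob"})
student = extension({"bob", "cy"})

campus_member = union(employee, student)
working_student = intersection(employee, student)
non_student = relative_complement(person, student)

people = ["ann", "bob", "cy", "dee"]
print([x for x in people if campus_member(x)])    # ['ann', 'bob', 'cy']
print([x for x in people if working_student(x)])  # ['bob']
print([x for x in people if non_student(x)])      # ['ann', 'dee']
```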
Classification
• Entity classes can represent roles or states of things
  • no notion of intrinsic properties
  • models contain intrinsic classifiers, e.g., maximal superclasses, but languages don’t identify them
• A thing can be an instance of multiple entity types
  • the entity types need not be explicitly related
• Default relationship among subtypes is “overlaps”
  • a thing can be an instance of both
• A thing can change classification over time
  • “thing is an instance of class” is just part of the state of the thing
• Most of these concepts are not supported by object models
Aside: Value Types
• Value type = conceptual classifier for information unit
• Categories
  • name (referencer, supports equal/unequal)
    • enumerated lists
    • codes/identifiers taken from registries
    • strings intended to identify things
  • quantity
    • includes numbers and values with “dimensions”
    • quantitative name (names that support quantitative operations)
    • ordinal, date, time, time period, temperature, etc.
  • truth value
  • text (structured and unstructured)
    • a body of information interpreted by a specific agent
Information Modeling: Properties
• Attributes (datatype properties)
  • domain is entity, range is value
• Relationships (object properties, associations)
  • domain is entity, range is entity
• Inverse relationship
  • same relationship, nominal domain and range reversed
  • different “reading” (spelling of the relationship name)
• Multiplicity/cardinality of attributes and relationships
  • one entity can have the same property (type) 0, 1, n, or an unbounded number of times
  • distinguish a set of instances of the same property from a property whose range is a set
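A small sketch of inverses and cardinality constraints. It assumes a relationship is stored as a set of (domain, range) pairs, so the inverse is the same set of facts with each pair reversed. The `officer_of` data and the helper names are hypothetical.

```python
from collections import Counter
from typing import Optional

# A relationship as a set of (domain, range) pairs: officer-of(person, ship)
officer_of = {("ann", "S1"), ("bob", "S1"), ("cy", "S2")}

def inverse(rel: set) -> set:
    # same facts, nominal domain and range reversed (a different "reading")
    return {(r, d) for d, r in rel}

def cardinality_violations(rel: set, domain: set, lo: int,
                           hi: Optional[int] = None) -> list:
    # each domain instance must have lo..hi values; hi=None means unbounded
    counts = Counter(d for d, _ in rel)
    return sorted(d for d in domain
                  if counts[d] < lo or (hi is not None and counts[d] > hi))

has_officer = inverse(officer_of)  # has-officer(ship, person)
print(cardinality_violations(officer_of, {"ann", "bob", "cy"}, 1, 1))  # []
print(cardinality_violations(has_officer, {"S1", "S2"}, 2))            # ['S2']
```

The same facts carry different constraints under each reading. Here every person is an officer of exactly one ship, while each ship is assumed to need at least two officers.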
Property domains
• Domain and range of a property must each be a single class
• Name of a property is implicitly qualified by the domain
• Ad hoc supertypes (“union types”) may be created to be the domain or range
  • enumerate the entity types constituting the domain, or
  • enumerate the entity types constituting the range, or
  • (rarely) enumerate the value types constituting the range
• Mutable and immutable properties
  • a property P(e, v) is “mutable” if the value v associated with a given e may change over time
  • P(e, v) is “immutable” if P(e, x) implies x = v over all time
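Immutability can be tested against a recorded history of assertions. This sketch assumes a hypothetical log of (time, e, v) observations of P(e, v). A finite log can prove a property is *not* immutable, but it only gives evidence that it is, since the definition quantifies over all time.

```python
def is_immutable(history: list) -> bool:
    # history: (time, e, v) observations of P(e, v)
    # immutable: once P(e, v) holds, P(e, x) implies x = v for every later time
    first_value = {}
    for _, e, v in history:
        if first_value.setdefault(e, v) != v:
            return False
    return True

serial_number = [(1, "pump7", "SN-001"), (2, "pump7", "SN-001")]
location = [(1, "pump7", "bay A"), (2, "pump7", "bay C")]

print(is_immutable(serial_number))  # True
print(is_immutable(location))       # False: location is mutable
```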
Property Relationships
• Property implies property
  • (there exists v such that P(d,v)) implies (there exists x such that Q(d,x))
• Property excludes property
  • (there exists v such that P(d,v)) implies NOT (there exists x such that Q(d,x))
• Properties P1, ..., Pn cover entity type E
  • for every instance e of E there exists some i such that there exists v such that Pi(e,v)
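Each of these constraints depends only on *whether* d has some value of a property, not on which value. The sketch below therefore reduces a property to its set of holders, assuming properties are stored as (d, v) pairs. The crew data and property names are hypothetical.

```python
def holders(prop: set) -> set:
    # {d : there exists v such that P(d, v)}
    return {d for d, _ in prop}

def prop_implies(p: set, q: set) -> bool:
    return holders(p) <= holders(q)

def prop_excludes(p: set, q: set) -> bool:
    return holders(p).isdisjoint(holders(q))

def props_cover(entity_type: set, props: list) -> bool:
    # every instance of E has at least one of the properties
    return all(any(e in holders(p) for p in props) for e in entity_type)

crew = {"ann", "bob", "cy"}
officer_of = {("ann", "S1"), ("bob", "S1")}
reports_to = {("ann", "bob"), ("bob", "dee"), ("cy", "ann")}
rating_of = {("cy", "able seaman")}

print(prop_implies(officer_of, reports_to))        # True
print(prop_excludes(officer_of, rating_of))        # True
print(props_cover(crew, [officer_of, rating_of]))  # True
```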
Relationship Relationships
• Relationship implies/subsets relationship (pairwise)
  • P(x,y) implies Q(x,y)
  • every pair (x,y) that satisfies P also satisfies Q
• Relationship excludes relationship (pairwise)
  • P(x,y) implies NOT Q(x,y)
• Relationship refines/subtypes relationship
  • property P is a specialization of property Q
  • every instance of P is an instance of Q
  • not just implication
Examples
• Property implies property
  • x is an officer of ship S implies there exists officer y such that x reports to y
• Property excludes property
  • x is an employee of G implies NOT x is eligible for prize p
• Relationship implies/subsets relationship (pairwise)
  • x is an officer of ship S implies x has a cabin on S
• Relationship excludes relationship (pairwise)
  • x is an officer of ship S implies NOT x is a passenger on S
• Relationship refines/subtypes relationship
  • x is captain of ship S refines x is officer of ship S
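The pairwise relationship constraints in the ship examples reduce to set inclusion and disjointness when each relationship is held as a set of (x, S) pairs. That representation and the crew data are assumptions for illustration.

```python
officer_of = {("ann", "S1"), ("bob", "S1")}
has_cabin_on = {("ann", "S1"), ("bob", "S1"), ("cy", "S1")}
passenger_on = {("cy", "S1")}
captain_of = {("ann", "S1")}

def rel_implies(p: set, q: set) -> bool:
    # P(x,y) implies Q(x,y): every pair satisfying P also satisfies Q
    return p <= q

def rel_excludes(p: set, q: set) -> bool:
    # P(x,y) implies NOT Q(x,y)
    return p.isdisjoint(q)

print(rel_implies(officer_of, has_cabin_on))   # True
print(rel_excludes(officer_of, passenger_on))  # True
print(rel_implies(captain_of, officer_of))     # True, but only as implication
```

The last check shows the limit of a data-level test. The pairs confirm that captain-of implies officer-of, but “refines” claims more: each captaincy *is* an officership. That is a modeling assertion the pairs alone cannot distinguish from plain implication.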
Qualifying Properties
• Qualifying property
  • a property whose existence or value determines membership in a given subtype
  • existence: if there exists y such that Q(d,y) then d is an instance of S
  • value: if Q(d, ‘red’) then d is an instance of S
  • functional value: let y = Q(d); if Greater(y, 1) then d is an instance of S
  • the domain (D) of property Q must be a supertype of S; Q may be optional (cardinality 0..<something>) on D
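The three kinds of qualification can be sketched as functions that build a subtype predicate from a property. This assumes Q is single-valued and optional on D, modeled as a Python dict in which a missing key means “no value”. The properties and instances are hypothetical.

```python
from typing import Any, Callable

color = {"car1": "red", "car2": "blue", "truck1": "red"}
trailer_count = {"car1": 0, "truck1": 2}

def by_existence(q: dict) -> Callable[[Any], bool]:
    # if there exists y such that Q(d, y) then d is an instance of S
    return lambda d: d in q

def by_value(q: dict, value: Any) -> Callable[[Any], bool]:
    # if Q(d, value) then d is an instance of S
    return lambda d: d in q and q[d] == value

def by_function(q: dict, test: Callable[[Any], bool]) -> Callable[[Any], bool]:
    # let y = Q(d); if test(y) then d is an instance of S
    return lambda d: d in q and test(q[d])

things = ["car1", "car2", "truck1"]
has_trailer_data = by_existence(trailer_count)
red_thing = by_value(color, "red")
multi_trailer = by_function(trailer_count, lambda y: y > 1)  # Greater(y, 1)

print([d for d in things if has_trailer_data(d)])  # ['car1', 'truck1']
print([d for d in things if red_thing(d)])         # ['car1', 'truck1']
print([d for d in things if multi_trailer(d)])     # ['truck1']
```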
Derived Properties
• Derived property: a property created by “joining” relationships
  • represented by a “path through the semantic network”
• Example:
  • vehicle and model are entity types
  • weight is a value type (a quantity)
  • attribute: model-has-gross-weight(model, weight)
  • relationship: vehicle-has-model(vehicle, model)
  • derived property: vehicle-has-gross-weight(vehicle, weight) = vehicle.vehicle-has-model[model].model-has-gross-weight[weight] = { (vehicle, weight) : (exists m) (and vehicle-has-model(vehicle,m) model-has-gross-weight(m,weight)) }
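The vehicle example is a relational join on the shared model. A minimal sketch, assuming both properties are held as sets of pairs; the vehicle and model identifiers and the weights (read as kilograms) are made-up data.

```python
# vehicle-has-model(vehicle, model) and model-has-gross-weight(model, weight)
vehicle_has_model = {("V1", "M-A"), ("V2", "M-A"), ("V3", "M-B")}
model_has_gross_weight = {("M-A", 1500), ("M-B", 2100)}

def join(r: set, s: set) -> set:
    # { (x, z) : exists m such that r(x, m) and s(m, z) }
    return {(x, z) for x, m in r for m2, z in s if m == m2}

vehicle_has_gross_weight = join(vehicle_has_model, model_has_gross_weight)
print(sorted(vehicle_has_gross_weight))
# [('V1', 1500), ('V2', 1500), ('V3', 2100)]
```

The derived property is computed rather than stored, so it stays consistent if a model's gross weight is corrected.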
Information Modeling: Identifiers
• Identifiers/keys distinguish instances of an entity class
• simple key: a property whose inverse is “functional”
  • for each v in the range, there exists at most 1 d in the domain such that P(d,v)
  • almost always an attribute (value type)
• relative uniqueness
  • property P is unique within property Q
  • for each p in the range of P and each q in the range of Q, there exists at most 1 d in the domain such that P(d,p) AND Q(d,q)
  • p is usually a value, and q is usually an entity such that for each d there exists exactly 1 q such that Q(d,q)
  • selection of a key for q gives rise to a “composite key” for d by “concatenating” (making a tuple of) the keys
• a key property must apply to all things in the class
• a given entity class may have multiple identifier/key properties
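Both key conditions can be checked over (d, value) pairs. In this hypothetical course-section data, a section number is not a simple key because numbers repeat across courses. It is unique within section-of-course, which yields a composite key. Course codes such as "CS101" are assumed to serve as the course's own key.

```python
from collections import defaultdict

def is_simple_key(p: set) -> bool:
    # the inverse of P is functional: each value v identifies at most one d
    owners = defaultdict(set)
    for d, v in p:
        owners[v].add(d)
    return all(len(ds) == 1 for ds in owners.values())

def is_unique_within(p: set, q: set) -> bool:
    # at most one d with P(d, pv) AND Q(d, qv) for each pair (pv, qv)
    q_values = defaultdict(set)
    for d, qv in q:
        q_values[d].add(qv)
    holders = defaultdict(set)
    for d, pv in p:
        for qv in q_values[d]:
            holders[(pv, qv)].add(d)
    return all(len(ds) == 1 for ds in holders.values())

section_number = {("sec1", "01"), ("sec2", "02"), ("sec3", "01")}
section_of_course = {("sec1", "CS101"), ("sec2", "CS101"), ("sec3", "MA201")}

print(is_simple_key(section_number))                        # False: "01" reused
print(is_unique_within(section_number, section_of_course))  # True

# composite key for a section: (key of its course, its own number)
composite_key = {d: (c, n) for d, c in section_of_course
                 for d2, n in section_number if d == d2}
```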
Dependencies
• Entity type E is “dependent on” property P(e,x) iff (for all e) E(e) implies (exists x) P(e,x)
  • that is, e cannot exist unless x exists
  • a meta-property of a relationship between instances
  • sometimes modeled as “dependent on class X”
  • in IDEF1-X, E is a “weak entity type” and P “supports” E
• not all “mandatory” properties are dependencies
  • dependency is an “intrinsic” property
  • dependency is an invariant property: the x never changes
• Example
  • course-has-section(course, section) has inverse section-of-course(section, course)
  • section is dependent on section-of-course: the section cannot meaningfully exist without the course
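A dependency combines two checks: the dependent instance always has the property, and the value never changes. This sketch assumes a hypothetical sequence of information-base snapshots, each pairing the current sections with a section-to-course mapping.

```python
def dependency_violations(snapshots: list) -> list:
    """snapshots: successive states of the information base, each a pair
    (sections, course_of) where course_of maps section -> course."""
    first_course = {}
    problems = []
    for t, (sections, course_of) in enumerate(snapshots):
        for s in sorted(sections):
            if s not in course_of:
                # the dependent entity exists without the thing it depends on
                problems.append((t, s, "exists without a course"))
            elif first_course.setdefault(s, course_of[s]) != course_of[s]:
                # dependency is invariant: x must never change
                problems.append((t, s, "course changed"))
    return problems

history = [
    ({"sec1"}, {"sec1": "CS101"}),
    ({"sec1", "sec2"}, {"sec1": "MA201", "sec2": "CS101"}),
    ({"sec3"}, {}),
]
print(dependency_violations(history))
# [(1, 'sec1', 'course changed'), (2, 'sec3', 'exists without a course')]
```

A merely mandatory property would pass the first check but not need the second, which is the distinction the slide draws.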
Aggregates
• Entity type E “aggregates” property P(e,m) iff every instance e of E is a “collection” and P(e,m) is the relationship of e to its members
  • aggregate is a metaproperty of E that is based on P
  • P is a “logical” or “virtual” “part of” relationship
• Problem: e is only instantaneously a “set”
  • the identity of e does not change if a member is deleted
  • no axiom is associated with this metaproperty
• Example:
  • Entity type Convoy, with property convoy-includes-ship(c,s)
  • Convoy aggregates convoy-includes-ship
  • by extension, Convoy “is aggregation of” Ship
Composition
• Entity type E “is composed by” properties Pi(e,ci) iff
  • each instance e of E is constructed from the ci such that Pi(e,ci)
  • each Pi relates an instance e to one (or more) of its components
  • for each i, there are at least n distinct ci such that Pi(e,ci), where n is the minimum cardinality of Pi (otherwise e is not an instance of E)
  • for each ci such that Pi(e,ci), if Pi(x,ci) then x = e (a ci belongs to at most one e)
  • some models make the ci dependent on the inverse of Pi
• “composite” is a metaproperty of E that is based on the Pi
  • each Pi is a “physical” “part of” relationship
• Example
  • entity type Book is composed by book-has-chapter(b, c)
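Unlike aggregation, which carries no axiom, composition imposes two checkable constraints: a minimum number of components per whole, and at most one whole per component. The sketch below checks both for one Pi, assuming the pairs are held as a set. The book and chapter data are hypothetical.

```python
from collections import Counter

def composition_violations(wholes: set, has_part: set, min_parts: int = 1) -> list:
    """has_part: (whole, component) pairs for one Pi, e.g. book-has-chapter."""
    problems = []
    per_whole = Counter(w for w, _ in has_part)
    for w in sorted(wholes):
        # at least n components, n = minimum cardinality of Pi
        if per_whole[w] < min_parts:
            problems.append(f"{w} has {per_whole[w]} component(s), minimum is {min_parts}")
    per_part = Counter(c for _, c in has_part)
    for c in sorted(per_part):
        # a component belongs to at most one whole
        if per_part[c] > 1:
            problems.append(f"{c} belongs to more than one whole")
    return problems

books = {"B1", "B2", "B3"}
book_has_chapter = {("B1", "ch-a"), ("B1", "ch-b"), ("B2", "ch-b")}
print(composition_violations(books, book_has_chapter))
# ['B3 has 0 component(s), minimum is 1', 'ch-b belongs to more than one whole']
```

The same data as convoy-includes-ship would raise no violation. A ship may join or leave a convoy, and nothing stops it from being listed in two convoys unless a separate rule says so.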
Validity Rules
• Validity rule = arbitrary first-order logic expression involving instances, classifiers and properties that must hold in a “valid” information base
• Languages have limitations on expressibility
  • instance references
  • existentials
  • “special functions”
  • nature of comparisons
• NOT inferencing rules
  • cannot conclude that x should be classified as an instance of E
  • a conclusion E(x) means the information base is invalid if NOT E(x)
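The difference between a validity rule and an inferencing rule shows up directly in code. A validity rule reports a violation and leaves the information base alone. An inference rule would add the missing fact instead. This is a minimal sketch with hypothetical rule names and data, assuming the information base is a dict of relationship sets.

```python
from typing import Callable

Rule = Callable[[dict], list]

def captains_are_officers(base: dict) -> list:
    # captain_of(x, s) implies officer_of(x, s)
    return sorted(base["captain_of"] - base["officer_of"])

def officers_not_passengers(base: dict) -> list:
    # officer_of(x, s) implies NOT passenger_on(x, s)
    return sorted(base["officer_of"] & base["passenger_on"])

def validate(base: dict, rules: dict) -> dict:
    # The base is valid iff no rule reports a violation.
    # Nothing is added to the base: rules check, they do not infer.
    report = {}
    for name, rule in rules.items():
        violations = rule(base)
        if violations:
            report[name] = violations
    return report

base = {
    "captain_of": {("ann", "S1"), ("dan", "S2")},
    "officer_of": {("ann", "S1"), ("bob", "S1")},
    "passenger_on": {("bob", "S1")},
}
print(validate(base, {"captains are officers": captains_are_officers,
                      "officers are not passengers": officers_not_passengers}))
# {'captains are officers': [('dan', 'S2')], 'officers are not passengers': [('bob', 'S1')]}
```

An OWL-style reasoner given “captain-of refines officer-of” would instead conclude that dan is an officer of S2.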
Aside: Object Modeling
• Ad hoc models of state
  • properties needed for some set of software applications
• Objective is to design software programs
• Object templates (class models)
  • Attributes, Relationships (associations, pointers)
  • Superclasses and “inheritance”
  • Validity rules
  • ‘Operations’ = actions on the object state
• No real association to process
• No keys, no qualifiers
Some Known Issues
• Diverse keys for union types
  • identity of individuals determined by type and type-specific keys
• Variance of cardinality constraints over time/state
  • can be stated as validity rules (only)
• Intermediate states (transactions)
  • validity rules don’t apply while the information base is in transition during certain times in a process
• Localization of properties
  • subtype A always has property P, subtypes B and C never do
  • model property P local to A?
  • or model optional property P on common supertype S, and use its existence to define (“qualify”) subtype A
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
OWL Features – Classification
(information modeling feature: OWL support; Y = yes, N = no)
• Classification
  • Entity type: Class
  • Value type: Class
    • enumeration: Y (all values from)
    • name: N (datatype string)
    • text: N (datatype string)
    • quantities: N (numeric datatypes)
    • truth values: Y
  • Data type: Y
  • Multiple classification: Y
  • Default overlap: Y
  • Classification change: not applicable
OWL Features – Type relationships
• Type relationships
  • subtype: Y
  • multiple supertypes: Y
  • exclusion: Y
  • covering: Y
  • relative complement: Complement, Difference
  • choice/union: Y
  • intersection: Y
OWL Features – Properties
• Properties
  • Attributes: Datatype property
  • Relationships: Object property
  • Inverse: Y
  • Multiplicity/Cardinality: Y
  • Set of property instances: Y
  • Single domain, range: Y
  • Mutable property: not applicable
OWL Features – Metaproperties
• Property relationships
  • Property implies property: Y
  • Property excludes property: Y
  • Properties cover entity type: N?
  • Relationship implies relationship: Y
  • Relationship excludes relationship: Y
  • Relationship refines relationship: N (only implies)
• Derived properties: some
• Identifiers: inverse functional property
• Dependencies: N
• “Part of”, Aggregates, Composites: N
OWL Features – Definitions and Rules
• Qualifying properties: Class definition
  • based on presence: Y
  • based on value equal: Y
  • based on function of value: N
• Validity rules: N
• Inferencing rules: N
OWL as Info Modeling Language
• OWL has all the major features
• OWL is formally defined
  • other information modeling languages have formal models ascribed to them after the fact (not standard interpretations)
• OWL has formal classification inferencing
  • but it is not much stronger than languages like ORM
  • not even strong in “datatype reasoning”
• OWL needs:
  • Identifier/Key metaproperties – identification of individuals
  • Relative uniqueness rules
  • Validity rules
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
Information Analysis Approach
• Interview
  • obtain initial information from the experts
• Formalize
  • formally capture what the experts said
• Design
  • reorganize the formal model to provide insight
• Review
  • walk the experts through the designed model
  • examine one or more use cases
  • solicit questions, concerns, variants
• Revise
  • correct the design to accommodate the clarifications
Information Analysis Method
• Identify the processes to be supported
• Identify the principal business classifications of things used/modified by the processes
• Identify the properties of those things that are used/modified by the processes
• Identify types, specializations and generalizations that collect uses and properties
• Determine type-to-type relationships
• Associate properties with the classifications
• Determine cardinality constraints
• Distinguish entity types from value types
• Identify the keys for individuals
• Specify validity rules
(slide labels: Interview, Formalize, Design)
Process Modeling
• Business Process Modeling
  • Activities and control flows
  • Decision points and rules
  • Process decomposition
  • Data/Message/Material flows
  • Information as ‘documents’
• Languages: BPMN, ARIS, METIS, ...
Binding process to information
• Actions of process on entities
  • creating an entity instance
  • creating a relationship instance between entity instances, usually as a property having a “domain” (or “subject”) and a “range” (or “object”)
  • changing one or more properties of an entity instance or relationship
  • destroying an entity instance
  • destroying a relationship instance
  • using a property of an entity instance
Relating Process to Info Requirements
• USE defines an information requirement
• All other actions define EVENTS
• Process models can/should represent the impact of events
• Uses and events can be aggregated or decomposed
  • Entity/Class level (UML)
  • Specific instance
  • Aspect (a collection of properties)
  • Property
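One way to apply the USE/EVENT split is to tag each step of a process model with one of the six action kinds and partition the result. This sketch assumes a hypothetical list of (activity, action, target) triples. Targets here are named at property or entity level, one of the granularities listed above.

```python
from enum import Enum

class Action(Enum):
    CREATE_ENTITY = "create entity instance"
    CREATE_RELATIONSHIP = "create relationship instance"
    CHANGE_PROPERTY = "change property"
    DESTROY_ENTITY = "destroy entity instance"
    DESTROY_RELATIONSHIP = "destroy relationship instance"
    USE_PROPERTY = "use property"

def split_requirements(steps: list) -> tuple:
    """steps: (activity, action, target) triples read off a process model.
    USE actions define information requirements; every other action is an event."""
    requirements, events = set(), []
    for activity, action, target in steps:
        if action is Action.USE_PROPERTY:
            requirements.add(target)
        else:
            events.append((activity, action, target))
    return requirements, events

steps = [
    ("assign officer", Action.CREATE_RELATIONSHIP, "officer-of"),
    ("plan watch", Action.USE_PROPERTY, "officer-of"),
    ("plan watch", Action.USE_PROPERTY, "rank"),
    ("decommission ship", Action.DESTROY_ENTITY, "Ship"),
]
requirements, events = split_requirements(steps)
print(sorted(requirements))       # ['officer-of', 'rank']
print([a for a, _, _ in events])  # ['assign officer', 'decommission ship']
```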
Conclusions
• Emphasis on supported processes as driver
  • scopes the model in breadth and depth
  • orthogonal to semantic web concerns
• Model for understanding
  • model must be meaningful to the domain experts
  • correct formal interpretation is important
  • implementation is a separate engineering activity
• OWL language is strong
  • formal logic basis
  • almost all known features (necessary and optional)
  • identifiers are a critical concern
  • validity rules will be required