Explore the evolution of information models, from historical linked record models to modern ontology-based systems. Understand how data models, information models, and ontologies differ in nature, purpose, and concept. Delve into classification, subtypes, class definitions, properties, and information modeling methodology.
Information Models as a Basis for Ontologies
Ed Barkmeyer, NIST
Ontolog Forum, April 2007
Next Generation Info Models
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
History
• Linked record models (1968)
• CODASYL standard (1974), Navigational Data Model (1980)
• E. F. Codd: Relational Algebra (1970)
• Peter Chen: Entity Attribute Relationship Models (1976)
• ISO TR 8002: 1984, the Conceptual schema and the information base
• 1980s information modeling technologies
  • IDEF1-X, SDM, NIAM/ORM, SSADM, EXPRESS, etc.
• 1990s object modeling technologies (UML)
• Frame-based logics (1975-1995)
• Description logics (1985-present): DAML, OWL
Differences in Nature
• Navigational and relational models
  • relate data to data
  • relational normal forms model functions of keys
• Information models
  • relate things (entities) to other things
  • relate things to information about them
  • use classifiers to collect properties
• Ontologies
  • relate things to things
  • relate things to information about them
  • use information to classify things
Differences in Purpose
• Data models
  • support software implementations of business processes
  • organize information for access
  • describe instances
• Information models
  • support sets of business processes
  • organize information for comprehension
  • support design of databases and messages
  • use classifications to describe instances
• Ontologies
  • support retrieval of information using inferencing
  • organize information for relevance
  • describe subjects and categories by classifications
Differences in Concept
• Information models
  • universe is things used by the business processes
  • classification/axioms are as used by the business: business rules, not accepted scientific truth
  • distinguish conceptual schema = invariants, quantified assertions from the information base = current assertions about individual things
• Ontologies
  • universe is all things that may be encountered in a domain
  • classification/axioms are accepted truth in the domain
  • primarily quantified assertions with a few ground facts
  • distinguished from an information base for some practical uses
Common Ideas
• Universe is a set of things of interest
• Classification enables understanding of the universe
• Axioms (invariants, necessities), but with a different concept of truth
• Ground facts = axiomatic truths about instances
  • conceptual schema is “nearly monotonic”: current/transient facts restricted to the information base
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
Information Modeling: Classifiers
• Entity type classifies things in the universe
  • a template for capturing (current) information about things
  • a model of the state of a thing
  • identity is distinct from state
  • domain of properties
• Value type classifies information about things
  • instance is an information unit, a data element
  • can be a structure of component data elements
  • identity is state (state is invariant)
  • only range of properties (its properties proceed from its identity)
• Data type represents Value types
  • instance is a computational data value
Information Modeling: Subtypes
• Subtype relationships among classifiers
  • S is a subtype (subclass) of E iff every s in class S is also an instance of E
  • multiple supertypes: S is a subtype of E1, ..., En
• Exclusion relationships
  • if t is an instance of E then t is not an instance of D
• Covering relationships
  • E is covered by S1, ..., Sn iff e in E implies there exists at least 1 k such that e is in Sk
• Mutually exclusive coverings are “partitions”
• “abstract type” = a type that is covered by some set of subtypes
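The subtype, exclusion, covering, and partition definitions on this slide can be checked mechanically if each classifier is represented by its extent (the set of its current instances). The following Python sketch is an illustration only; the function names and the vessel data are hypothetical and do not come from the talk.

```python
from itertools import combinations

def is_subtype(s, e):
    """S is a subtype of E iff every instance of S is also an instance of E."""
    return s <= e

def are_exclusive(e, d):
    """E and D exclude each other iff no thing is an instance of both."""
    return e.isdisjoint(d)

def covers(e, subtypes):
    """E is covered by S1..Sn iff every e in E is in at least one Sk."""
    return e <= set().union(*subtypes)

def is_partition(e, subtypes):
    """A partition is a covering whose subtypes are mutually exclusive."""
    return covers(e, subtypes) and all(
        are_exclusive(a, b) for a, b in combinations(subtypes, 2)
    )

# Hypothetical extents, invented for illustration.
vessel = {"v1", "v2", "v3"}
cargo_ship = {"v1", "v2"}
warship = {"v3"}
flagship = {"v2", "v3"}  # overlaps cargo_ship, so it cannot form a partition with it

print(is_partition(vessel, [cargo_ship, warship]))   # exclusive covering
print(is_partition(vessel, [cargo_ship, flagship]))  # covering, but overlapping
```

Note that these checks only test one state of the information base; in a conceptual schema the same statements are invariants over every state.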
Information Modeling: Class definition
• Union (“choice”, “select”) types
  • Class E is the union of classes F and G and ...: E(x) == F(x) OR G(x)
  • Union types are “abstract” by construction
• Intersection
  • Class E is the intersection of classes F and G: E(x) == F(x) AND G(x)
• Relative complement
  • if S is a subtype of E, C is the relative complement iff C = E – S
Classification
• Entity classes can represent roles or states of things
  • no notion of intrinsic properties
  • models contain intrinsic classifiers, e.g., maximal superclasses, but languages don’t identify them
• A thing can be an instance of multiple entity types
  • the entity types need not be explicitly related
• Default relationship among subtypes is “overlaps”
  • a thing can be an instance of both
• A thing can change classification over time
  • “thing is instance of class” is just part of the state of the thing
• Most of these concepts are not supported by object models
Aside: Value Types
• Value type = conceptual classifier for information unit
• Categories
  • name (referencer, supports equal/unequal)
    • enumerated lists
    • codes/identifiers taken from registries
    • strings intended to identify things
  • quantity
    • includes numbers and values with “dimensions”
  • quantitative name (names that support quantitative operations)
    • ordinal, date, time, time period, temperature, etc.
  • truth value
  • text (structured and unstructured)
    • a body of information interpreted by a specific agent
Information Modeling: Properties
• Attributes (data type properties)
  • domain is entity, range is value
• Relationships (object properties, associations)
  • domain is entity, range is entity
• Inverse relationship
  • same relationship, nominal domain and range reversed
  • different “reading” (spelling of the relationship name)
• Multiplicity/cardinality of attributes and relationships
  • one entity can have the same property (type) 0, 1, n, unbounded times
  • distinguish set of the same property from property whose range is a set
Property Domains
• Domain and range of a property must be a single class
• Name of a property implicitly qualified by the domain
• Ad hoc supertypes (“union type”) may be created to be domain or range
  • enumerate the entity types constituting the domain, or
  • enumerate the entity types constituting the range, or
  • (rarely) enumerate the value types constituting the range
• Mutable and immutable properties
  • a property P(e,v) is “mutable” if the value v associated with a given e may change over time
  • P(e,v) is “immutable” if P(e,x) implies x = v over all time
Property Relationships
• Property implies property
  • (there exists v such that P(d,v)) implies (there exists x such that Q(d,x))
• Property excludes property
  • (there exists v such that P(d,v)) implies NOT (there exists x such that Q(d,x))
• Properties P1, ..., Pn cover entity type
  • For every instance e of E there exists some i such that there exists v such that Pi(e,v)
Relationship Relationships
• Relationship implies/subsets relationship (pairwise)
  • P(x,y) implies Q(x,y)
  • every pair (x,y) that satisfies P also satisfies Q
• Relationship excludes relationship (pairwise)
  • P(x,y) implies NOT Q(x,y)
• Relationship refines/subtypes relationship
  • property P is a specialization of property Q
  • every instance of P is an instance of Q
  • not just implication
Examples
• Property implies property
  • x is an officer of ship S implies there exists officer y such that x reports to y
• Property excludes property
  • x is employee of G implies NOT x is eligible for prize p
• Relationship implies/subsets relationship (pairwise)
  • x is an officer of ship S implies x has cabin on S
• Relationship excludes relationship (pairwise)
  • x is an officer of ship S implies NOT x is passenger on S
• Relationship refines/subtypes relationship
  • x is captain of ship S refines x is officer of ship S
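The difference between the property-level constraints (which only ask whether some value exists) and the pairwise relationship constraints (which compare the same pair in both relationships) can be made concrete with relations stored as sets of pairs. This is a minimal Python sketch under that assumption; the helper names and the ship data are hypothetical, loosely following the examples above.

```python
def has_property(rel, d):
    """True if there exists v such that rel(d, v)."""
    return any(x == d for x, _ in rel)

def property_implies(p, q, domain):
    """(exists v: P(d,v)) implies (exists x: Q(d,x)), for every d in the domain."""
    return all(has_property(q, d) for d in domain if has_property(p, d))

def property_excludes(p, q, domain):
    """(exists v: P(d,v)) implies NOT (exists x: Q(d,x)), for every d in the domain."""
    return all(not has_property(q, d) for d in domain if has_property(p, d))

def relationship_implies(p, q):
    """Pairwise: every pair (x, y) that satisfies P also satisfies Q."""
    return p <= q

def relationship_excludes(p, q):
    """Pairwise: no pair (x, y) satisfies both P and Q."""
    return p.isdisjoint(q)

# Hypothetical information base, invented for illustration.
people = {"ann", "bob", "cy"}
officer_of = {("ann", "S1"), ("bob", "S1")}
reports_to = {("bob", "ann"), ("ann", "fleet-admiral")}
has_cabin_on = {("ann", "S1"), ("bob", "S1"), ("cy", "S1")}
passenger_on = {("cy", "S1")}

print(property_implies(officer_of, reports_to, people))
print(relationship_implies(officer_of, has_cabin_on))
print(relationship_excludes(officer_of, passenger_on))
```

Refinement is deliberately left out of the sketch: as the slides note, it is more than implication, so it cannot be established by comparing extents alone.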
Qualifying Properties
• Qualifying property
  • a property whose existence or value determines membership in a given subtype
  • existence: If there exists y such that Q(d,y) then d is an instance of S
  • value: If Q(d, ‘red’) then d is an instance of S
  • functional value: Let y = Q(d); if Greater(y, 1) then d is an instance of S
  • the domain (D) of property Q must be a supertype of S; Q may be optional (cardinality 0..<something>) on D
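The three kinds of qualification (by existence, by value, by a function of the value) might be sketched in Python as follows. The vehicle data and the function names are assumptions made for illustration; the functional property is modeled as a dict because each d has at most one value.

```python
def qualifies_by_existence(q, d):
    """d is an instance of S if there exists y such that Q(d, y)."""
    return any(x == d for x, _ in q)

def qualifies_by_value(q, d, value):
    """d is an instance of S if Q(d, value) holds for the given value."""
    return (d, value) in q

def qualifies_by_function(q, d, test):
    """Let y = Q(d), with Q functional (a dict d -> y); d is in S if test(y) holds."""
    return d in q and test(q[d])

# Hypothetical properties on a supertype Vehicle; each is optional on Vehicle.
has_trailer = {("truck1", "trailer9")}               # existence qualifies a towing vehicle
has_colour = {("car1", "red"), ("car2", "blue")}     # the value 'red' qualifies a red vehicle
axle_count = {"car1": 2, "truck1": 3, "unicycle1": 1}  # y > 1 qualifies a multi-axle vehicle

print(qualifies_by_existence(has_trailer, "truck1"))
print(qualifies_by_value(has_colour, "car1", "red"))
print(qualifies_by_function(axle_count, "unicycle1", lambda y: y > 1))
```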
Derived Properties
• Derived Property: a property created by “joining” relationships
  • represented by a “path through the semantic network”
• Example:
  • vehicle and model are entity types
  • weight is a value type (a quantity)
  • attribute: model-has-gross-weight(model, weight)
  • relationship: vehicle-has-model(vehicle, model)
  • derived property: vehicle-has-gross-weight(vehicle, weight)
    = vehicle.vehicle-has-model[model].model-has-gross-weight[weight]
    = { (vehicle, weight) : (exists m) (and vehicle-has-model(vehicle,m) model-has-gross-weight(m,weight)) }
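The set-builder form of the derived property is a relational join on the shared model. A small Python sketch of that join, with made-up vehicle identifiers, model names, and weights, could look like this:

```python
def derive(r1, r2):
    """Join two relations: {(a, c) : there exists b with r1(a, b) and r2(b, c)}."""
    return {(a, c) for a, b in r1 for b2, c in r2 if b == b2}

# Hypothetical ground facts, invented for illustration.
vehicle_has_model = {
    ("vin-001", "model-A"),
    ("vin-002", "model-A"),
    ("vin-003", "model-B"),
}
model_has_gross_weight = {("model-A", 1500), ("model-B", 2100)}

# Derived property: follow the path vehicle -> model -> weight.
vehicle_has_gross_weight = derive(vehicle_has_model, model_has_gross_weight)
print(sorted(vehicle_has_gross_weight))
```

Because the derived property is computed from the two base properties, it never needs to be stored or kept consistent separately.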
Information Modeling: Identifiers
• Identifiers/keys distinguish instances of an entity class
• simple key: a property whose inverse is “functional”
  • for each v in the range, there exists at most 1 d in the domain such that P(d,v)
  • almost always an attribute (value type)
• relative uniqueness
  • property P is unique within property Q
  • for each p in the range of P and each q in the range of Q, there exists at most 1 d in the domain such that P(d,p) AND Q(d,q)
  • p is usually a value, and q is usually an entity such that for each d there exists exactly 1 q such that Q(d,q)
  • selection of a key for q gives rise to a “composite key” for d by “concatenating” (making a tuple of) the keys
• a key property must apply to all things in the class
• a given entity class may have multiple identifier/key properties
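As an illustration of simple keys and relative uniqueness, the Python sketch below checks both conditions over properties stored as sets of (instance, value) pairs. The course-section data and the function names are hypothetical.

```python
from collections import Counter

def is_simple_key(p, instances):
    """P is a key: it applies to every instance, and each value has at most one owner."""
    owners = {d for d, _ in p}
    value_counts = Counter(v for _, v in p)
    return instances <= owners and all(n == 1 for n in value_counts.values())

def is_unique_within(p, q):
    """P is unique within Q: each (p-value, q-value) pair identifies at most one d."""
    combos = Counter((pv, qv) for d, pv in p for d2, qv in q if d == d2)
    return all(n == 1 for n in combos.values())

# Hypothetical information base, invented for illustration.
sections = {"sec1", "sec2", "sec3"}
section_id = {("sec1", "A17"), ("sec2", "B22"), ("sec3", "C03")}
section_number = {("sec1", 1), ("sec2", 2), ("sec3", 1)}
section_of_course = {("sec1", "MATH101"), ("sec2", "MATH101"), ("sec3", "PHYS201")}

print(is_simple_key(section_id, sections))                  # a registry-style identifier
print(is_simple_key(section_number, sections))              # number 1 is used twice
print(is_unique_within(section_number, section_of_course))  # unique per course
```

Here the section number is not a key on its own, but combined with a key for the course it yields the composite key (course, number) described above.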
Dependencies
• Entity type E is “dependent on” property P(e,x) iff (exists e) E(e) implies (exists x) P(e,x)
  • that is, the e cannot exist unless the x exists
  • a meta-property of a relationship between instances
  • sometimes modeled as “dependent on class X”
  • in IDEF1-X, E is a “weak entity type” and P “supports” E
• not all “mandatory” properties are dependencies
  • dependency is an “intrinsic” property
  • dependency is an invariant property: the x never changes
• Example
  • course-has-section(course, section) has inverse section-of-course(section, course)
  • section is dependent on section-of-course: the section cannot meaningfully exist without the course
Aggregates
• Entity type E “aggregates” property P(e,m) iff every instance e of E is a “collection” and P(e,m) is the relationship of e to its members
  • aggregate is a metaproperty of E that is based on P
  • P is a “logical” or “virtual” “part of” relationship
• Problem: e is only instantaneously a “set”
  • the identity of e does not change if a member is deleted
  • no axiom is associated with this metaproperty
• Example:
  • Entity type Convoy, with property convoy-includes-ship(c,s)
  • Convoy aggregates convoy-includes-ship
  • by extension, Convoy “is aggregation of” Ship
Composition
• Entity type E “is composed by” properties Pi(e,ci) iff
  • each instance e of E is constructed from the ci such that Pi(e,ci)
  • each Pi relates an instance e to one (or more) of its components
  • for each i, there are n distinct ci such that Pi(e,ci), where n is the minimum cardinality of Pi (otherwise e is not an instance of E)
  • for each ci such that Pi(e,ci), if Pi(x,ci) then x = e (a ci belongs to at most one e)
• some models make the ci dependent on the inverse of Pi
• “composite” is a metaproperty of E that is based on the Pi
• each Pi is a “physical” “part of” relationship
• Example
  • entity type Book is composed by book-has-chapter(b, c)
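Two of the composition conditions are easy to test on an information base: every whole has at least the minimum number of components, and no component belongs to more than one whole. The Python sketch below checks those two conditions for a single composing property; the function name, the `min_parts` parameter, and the book data are assumptions for illustration.

```python
from collections import Counter

def is_valid_composition(part_of, wholes, min_parts=1):
    """Check a composing property given as (whole, component) pairs."""
    owners = {}
    for whole, part in part_of:
        # setdefault records the first owner seen; a different owner is a violation.
        if owners.setdefault(part, whole) != whole:
            return False
    counts = Counter(whole for whole, _ in part_of)
    # Every whole needs at least min_parts components (Counter gives 0 if none).
    return all(counts[w] >= min_parts for w in wholes)

# Hypothetical information base, invented for illustration.
books = {"b1", "b2"}
book_has_chapter = {("b1", "c1"), ("b1", "c2"), ("b2", "c3")}

print(is_valid_composition(book_has_chapter, books))
print(is_valid_composition(book_has_chapter | {("b2", "c1")}, books))  # c1 shared
```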
Validity Rules
• Validity Rule = arbitrary first-order logic expression involving instances, classifiers and properties that must hold in a “valid” information base
• Languages have limitations on expressibility
  • instance references
  • existentials
  • “special functions”
  • nature of comparisons
• NOT inferencing rules
  • cannot conclude x should be classified as an instance of E: conclusion E(x) means invalid information base if NOT E(x)
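The contrast with inferencing can be sketched in a few lines of Python: a validity rule is evaluated against the information base and reports a violation, but it never adds the missing classification. The rule name, the class names, and the dictionary layout are hypothetical.

```python
def check_validity(info_base, rules):
    """Return the names of violated rules; the information base is not modified."""
    return [name for name, rule in rules.items() if not rule(info_base)]

# Hypothetical information base: class name -> set of current instances.
info_base = {
    "Officer": {"ann", "bob"},
    "CrewMember": {"ann"},
}

# Hypothetical rule: Officer(x) implies CrewMember(x).
rules = {
    "officers-are-crew": lambda ib: ib["Officer"] <= ib["CrewMember"],
}

violations = check_validity(info_base, rules)
print(violations)               # bob is an Officer but not a CrewMember
print(info_base["CrewMember"])  # unchanged: bob was not inferred to be crew
```

An inferencing rule with the same logical form would instead conclude that bob is a CrewMember; the validity reading treats the same state as an invalid information base.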
Aside: Object Modeling
• Ad hoc models of state
  • properties needed for some set of software applications
• Object is to design software programs
• Object templates (class models)
  • Attributes, Relationships (associations, pointers)
  • Superclasses and “inheritance”
  • Validity rules
  • ‘Operations’ = actions on the object state
• No real association to process
• No keys, no qualifiers
Some Known Issues
• Diverse keys for union types
  • identity of individuals determined by type and type-specific keys
• Variance of cardinality constraints over time/state
  • can be stated as validity rules (only)
• Intermediate states (transactions)
  • validity rules don’t apply while the info base is in transition during certain times in a process
• Localization of properties
  • subtype A always has property P, subtypes B and C never do
  • model property P local to A?
  • model optional property P to common supertype S, and use its existence to define (“qualify”) subtype A
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
OWL Features: Classification
• Classification
  • Entity type: Class
  • Value type: Class
    • enumeration: Y (all values from)
    • name: N (datatype string)
    • text: N (datatype string)
    • quantities: N (numeric datatypes)
    • truth values: Y
  • Data type: Y
  • Multiple classification: Y
  • Default overlap: Y
  • Classification change: not applicable
OWL Features: Type relationships
• Type relationships
  • subtype: Y
  • multiple supertypes: Y
  • exclusion: Y
  • covering: Y
  • relative complement: Complement, Difference
  • choice/union: Y
  • intersection: Y
OWL Features: Properties
• Properties
  • Attributes: Datatype property
  • Relationships: Object property
  • Inverse: Y
  • Multiplicity/Cardinality: Y
  • Set of property instances: Y
  • Single domain, range: Y
  • Mutable property: not applicable
OWL Features: Metaproperties
• Property relationships
  • Property implies property: Y
  • Property excludes property: Y
  • Properties cover entity type: N?
  • Relationship implies relationship: Y
  • Relationship excludes relationship: Y
  • Relationship refines relationship: N (only implies)
• Derived properties: some
• Identifiers: functional property
• Dependencies: N
• “Part of”, Aggregates, Composites: N
OWL Features: Definitions and Rules
• Qualifying properties: Class definition
  • based on presence: Y
  • based on value equal: Y
  • based on function of value: N
• Validity rules: N
• Inferencing rules: N
OWL as Info Modeling Language
• OWL has all the major features
• OWL is formally defined
  • other information modeling languages have formal models ascribed to them after the fact (not standard interpretations)
• OWL has formal classification inferencing
  • but it is not much stronger than languages like ORM
  • not even strong in “datatype reasoning”
• OWL needs:
  • Identifier/Key metaproperties: identification of individuals
  • Relative uniqueness rules
  • Validity rules
Outline
• Overview of information modeling
• Features of “information modeling”
• Comparison to features of OWL
• Information modeling methodology
• Conclusions
Information Analysis Approach
• Interview
  • obtain initial information from the experts
• Formalize
  • formally capture what the experts said
• Design
  • reorganize the formal model to provide insight
• Review
  • walk the experts through the designed model
  • examine one or more use cases
  • solicit questions, concerns, variants
• Revise
  • correct the design to accommodate the clarifications
Information Analysis Method
• Identify the processes to be supported
• Identify the principal business classifications of things used/modified by the processes
• Identify the properties of those things that are used/modified by the processes
• Identify types, specializations and generalizations that collect uses and properties
• Determine type-to-type relationships
• Associate properties with the classifications
• Determine cardinality constraints
• Distinguish entity types from value types
• Identify the keys for individuals
• Specify validity rules
(Phase labels on the slide: Interview, Formalize, Design)
Process Modeling
• Business Process Modeling
  • Activities and control flows
  • Decision points and rules
  • Process decomposition
  • Data/Message/Material flows
  • Information as ‘documents’
• Languages: BPMN, ARIS, METIS, ...
Binding Process to Information
• Actions of process on entities
  • creating an entity instance
  • creating a relationship instance between entity instances, usually as a property having a “domain” (or “subject”) and a “range” (or “object”)
  • changing one or more properties of an entity instance or relationship
  • destroying an entity instance
  • destroying a relationship instance
  • using a property of an entity instance
Relating Process to Info Requirements
• USE defines an information requirement
• All other actions define EVENTS
• Process models can/should represent impact of events
• Use and Events can be aggregated or decomposed
  • Entity/Class level (UML)
  • Specific instance
  • Aspect (a collection of properties)
  • Property
Conclusions
• Emphasis on supported processes as driver
  • scopes the model in breadth and depth
  • orthogonal to semantic web concerns
• Model for understanding
  • model must be meaningful to the domain experts
  • correct formal interpretation is important
  • implementation is a separate engineering activity
• OWL language is strong
  • formal logic basis
  • almost all known features (necessary and optional)
  • identifiers are a critical concern
  • validity rules will be required