Learn to model entities with RDF from tabular data, assign distinct URIs, use prefixes, and apply semantic web principles for unique identifiers and property descriptions. Understand RDF/XML and N3 formats, inferencing, consistency, and type propagation rules.
 
                
Modeling with RDF
RDF and Tabular Data • Consider the Product database relation • Each row describes a single entity • The type of each entity is the table name (Product) • Give each row a distinct URIref • The table gives us a locally unique identifier: the primary key (ID)
To get a globally unique identifier, use the URI for the database itself • Say, http://www.acme.mfg.com • Use the prefix mfg: • For an identifier for a row, concatenate the table name (“Product”) with the unique key • Express it in the mfg: namespace mfg:Product1, mfg:Product2, … • The column labels correspond to properties • For globally unique identifiers, use same namespace as for individuals • For local names, concatenate table name and column label mfg:Product_ModelNo, mfg:Product_Division, … • There’s 1 triple per table cell • Plus 1 triple per row for type info
In N3 (1st and 3rd rows only)
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix mfg: <http://www.acme.mfg.com/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
mfg:Product1 rdf:type mfg:Product .
mfg:Product1 mfg:Product_ID "1"^^xsd:integer .
mfg:Product1 mfg:Product_ModelNo "ZX-3" .
mfg:Product1 mfg:Product_Division "Manufacturing support" .
mfg:Product1 mfg:Product_ProductLine "Paper machine" .
mfg:Product1 mfg:Product_SKU "FB3524" .
mfg:Product3 rdf:type mfg:Product .
mfg:Product3 mfg:Product_ID "3"^^xsd:integer .
mfg:Product3 mfg:Product_ModelNo "B-1430" .
mfg:Product3 mfg:Product_Division "Control engineering" .
mfg:Product3 mfg:Product_ProductLine "Feedback line" .
mfg:Product3 mfg:Product_SKU "KS4520" .
In RDF/XML (1st and 3rd rows)
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:mfg="http://www.acme.mfg.com/"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <mfg:Product rdf:about="http://www.acme.mfg.com/Product1">
    <mfg:Product_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">1</mfg:Product_ID>
    <mfg:Product_ModelNo>ZX-3</mfg:Product_ModelNo>
    <mfg:Product_Division>Manufacturing support</mfg:Product_Division>
    <mfg:Product_ProductLine>Paper machine</mfg:Product_ProductLine>
    <mfg:Product_SKU>FB3524</mfg:Product_SKU>
  </mfg:Product>
  <mfg:Product rdf:about="http://www.acme.mfg.com/Product3">
    <mfg:Product_ID rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">3</mfg:Product_ID>
    <mfg:Product_ModelNo>B-1430</mfg:Product_ModelNo>
    <mfg:Product_Division>Control engineering</mfg:Product_Division>
    <mfg:Product_ProductLine>Feedback line</mfg:Product_ProductLine>
    <mfg:Product_SKU>KS4520</mfg:Product_SKU>
  </mfg:Product>
</rdf:RDF>
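The row-to-triple mapping described above can be sketched in Python. This is only an illustration under stated assumptions: the function name rows_to_triples and the dictionary layout of the rows are hypothetical, and the two rows are copied from the N3 listing rather than from the full table.

```python
# Hypothetical helper: turn rows of a relational table into RDF-style triples.
BASE = "http://www.acme.mfg.com/"  # namespace bound to the mfg: prefix
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def rows_to_triples(table, key, rows, base=BASE):
    """Return one type triple per row plus one triple per table cell."""
    triples = []
    for row in rows:
        # Subject: table name concatenated with the primary key, e.g. mfg:Product1
        subject = f"{base}{table}{row[key]}"
        triples.append((subject, RDF_TYPE, f"{base}{table}"))
        for column, value in row.items():
            # Predicate: table name + "_" + column label, e.g. mfg:Product_SKU
            predicate = f"{base}{table}_{column}"
            triples.append((subject, predicate, value))
    return triples


# Rows 1 and 3 of the Product table, as given in the N3 listing.
products = [
    {"ID": 1, "ModelNo": "ZX-3", "Division": "Manufacturing support",
     "ProductLine": "Paper machine", "SKU": "FB3524"},
    {"ID": 3, "ModelNo": "B-1430", "Division": "Control engineering",
     "ProductLine": "Feedback line", "SKU": "KS4520"},
]

triples = rows_to_triples("Product", "ID", products)
for triple in triples:
    print(triple)
```

Two rows with five columns each give 12 triples: ten cell triples plus two type triples.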
The triples could be spread across the Web • Use a common base • Some of the property values (objects) could be resources, identified by URIrefs, instead of literals • Especially Product_Division and Product_ProductLine • Foreign keys (unique identifiers in other relations → unique URIrefs)
RDF and Inferencing • In the Semantic Web, consistency constraints on the data can be expressed in the data itself • Lets us model smart data: can describe how they should be used • The Semantic Web stack includes a series of layers on top of the RDF layer to describe consistency constraints on the data • The key is the notion of inferencing • Given some stated info, we can determine related info that can be considered as if stated • E.g., given X is married to Y, conclude that Y is married to X • Or conclude that the following is an inconsistent set of statements: X is disjoint from Y, z ∈ X, z ∈ Y • This may seem trivial, but it’s exactly this that’s missing in dumb data
Type Propagation Rule (an Inference Pattern) • If A rdfs:subClassOf B and x rdf:type A, then x rdf:type B • This statement is the entire definition of subClassOf in RDFS • Consistent with informal notion of broader term in a thesaurus or taxonomy • Our data lives in the Web • To make them more useful, they should behave in a consistent way when combined with data from multiple sources • Basing the meaning of constraint terms on inferencing provides a robust solution to understanding the meaning of novel combinations of terms • E.g., to determine how multiple inheritance works in RDFS, apply the Type Propagation Rule twice: If C is a subclass of A and a subclass of B, then, if x is of type C, then x is of type A and x is of type B
Asserted triples are those asserted in the original RDF store (possibly merged from multiple sources) • Inferred triples are the additional triples inferred by one of the inference rules governing a given inference engine • An inference engine could infer a triple already asserted • We consider it asserted • No matter how a triple is available, an inference engine treats it in the same way
Example • Given the asserted triples
ex:PassengerVehicle rdfs:subClassOf ex:MotorVehicle .
ex:Van rdfs:subClassOf ex:MotorVehicle .
ex:MiniVan rdfs:subClassOf ex:Van .
ex:MiniVan rdfs:subClassOf ex:PassengerVehicle .
ex:myVehicle rdf:type ex:MiniVan .
ex:yourVehicle rdf:type ex:Van .
• The following are some of the inferred triples
ex:myVehicle rdf:type ex:Van .
ex:myVehicle rdf:type ex:PassengerVehicle .
ex:yourVehicle rdf:type ex:MotorVehicle .
• But, e.g., the following triple can’t be inferred
ex:yourVehicle rdf:type ex:PassengerVehicle .
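The vehicle example can be reproduced with a few lines of Python that apply the Type Propagation Rule repeatedly until nothing new appears. This is a minimal sketch, not a real inference engine: triples are plain tuples of prefixed names, and the function name type_propagation is an assumption made for illustration.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def type_propagation(asserted):
    """Apply the Type Propagation Rule until no new triples appear."""
    triples = set(asserted)
    while True:
        # If A subClassOf B and x type A, then x type B.
        new = {
            (x, TYPE, b)
            for (a, p, b) in triples if p == SUBCLASS
            for (x, q, c) in triples if q == TYPE and c == a
        } - triples
        if not new:
            return triples
        triples |= new


asserted = {
    ("ex:PassengerVehicle", SUBCLASS, "ex:MotorVehicle"),
    ("ex:Van", SUBCLASS, "ex:MotorVehicle"),
    ("ex:MiniVan", SUBCLASS, "ex:Van"),
    ("ex:MiniVan", SUBCLASS, "ex:PassengerVehicle"),
    ("ex:myVehicle", TYPE, "ex:MiniVan"),
    ("ex:yourVehicle", TYPE, "ex:Van"),
}

closure = type_propagation(asserted)
inferred = closure - asserted  # only the triples that were not asserted
for triple in sorted(inferred):
    print(triple)
```

Besides the three inferred triples listed in the example, the sketch also derives that ex:myVehicle is an ex:MotorVehicle; it never derives that ex:yourVehicle is an ex:PassengerVehicle.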
The situation is more subtle when info is merged from multiple sources each with its own inference engine • Most RDF implementations allow new triples to be asserted into the triple store • So an application can assert inferred triples • If those triples are serialized and shared on the Web, another application could merge them with other sources and draw further inferences • Here the distinction between asserted and inferred might be too coarse to be useful
Suppose 1 info source provides a list of members of the class PassengerVehicle and another provides a list of members of class Van • To find a list of all MotorVehicles, we use the inference rule for subClassOf • We have a merged graph • The inferencing determines that both myVehicle and yourVehicle are members of the class MotorVehicle
Two fundamental components in this data integration example • A model that expresses the relationship between the 2 data sources • Here the model consists of a single class having both the classes to be integrated as subclasses • This is represented by the single concept of subClassOf • The notion of inference • Inferencing applies the model to the 2 data sources to produce a single, integrated answer • When data seem disconnected, it’s often because some apparently simple consistency is conspicuously absent
RDFS and Meaning • Modeling in RDF is about graphs—it creates a graph structure to represent data • Modeling in RDFS is about sets • RDFS provides a way to talk about the vocabulary used in an RDF graph • It lets an info modeler express, relative to particular data modeling and integration needs, • how individuals are related • how properties used to define individuals are related to sets of individuals • etc.
RDFS is like other schema languages in providing info about how we describe data • But it differs from other schema languages in important ways • Consider document modeling systems • A schema language here (XML Schema) lets us express the set of allowed formats for a document • Can then determine (automatically) whether a given document conforms to the schema • A database schema provides type and key info for relations • A relation itself has nothing indicating the meaning of info in a given column or which column is used to index the relation
For OO programming systems, the class structure again helps organize the data • It not only describes the data but also determines, according to the inheritance policy of the language, • what methods are available for a given instance and • how they are implemented • This contrasts with databases and XML: It doesn’t interpret the data but instead gives a systematic way to describe info and transformations for that info • In all cases, the schema is info about the data
The schema in RDF should help provide meaning to the data • The mechanism by which it does this is inference • Inference lets us know more about a set of data than what’s explicitly expressed in the data • It thus explicates the meaning of the original data • The additional info is based in a systematic way on patterns in the original data • RDFS provides detailed axioms that express exactly what inferences can be drawn from given data patterns
Most modeling systems have a clear distinction between the data and its schema • But many modern systems model the schema in the same form as the data, e.g., • The introspection API of Java represents the object models as objects • An XML Schema document is a well-formed XML document • For RDF, the schema language was defined in RDF from the start • All schema info in RDFS is defined with RDF triples • It’s thus easy to give a formal description of the semantics of RDFS: Give inference rules that work over patterns of triples
In RDF, everything is expressed in triples • The meaning of asserted triples is expressed in inferred triples • The structures that drive these inferences are also triples • So this process can continue as far as it needs to: The schema info that provides context for info on the Semantic Web can itself be distributed on the Semantic Web
Can explicate the meaning of each RDFS resource by indicating what triples can be inferred in certain circumstances (defined by some pattern of triples) • Illustrate this with one of the most fundamental RDFS terms: rdfs:subClassOf • In RDFS, class membership is given meaning only when there are several classes and a known relation between them • That something’s a member of a class means it’s a member of any superclass • Cf. the Type Propagation Rule • The various rules taken together give a way to express a variety of relations between classes and properties • Such a collection of classes and properties gives a rudimentary definition of a vocabulary for RDF data • I.e., it defines a set of elements and the relationships of those sets to the properties describing the elements
Note that we distinguish two senses of “is” with :Kaneda rdf:type :AllStarPlayer . :AllStarPlayer rdfs:subClassOf :MajorLeaguePlayer . • RDFS provides a meaning for rdfs:subClassOf here: The type info for Kaneda propagates in the expected way, giving :Kaneda rdf:type :MajorLeaguePlayer .
This very simple interpretation of the subclass relationship makes it a workhorse of RDFS modeling • It’s like IF/THEN of programming languages: IF something is a member of the subclass THEN it’s a member of the superclass • But nothing in RDFS corresponds to the ELSE clause • You can’t infer things from the lack of asserted membership
The RDFS Inference Patterns • RDFS consists of a small number of inference patterns • Each provides type info on individuals in a variety of circumstances
Relationship Propagation through rdfs:subPropertyOf • The mechanism RDFS provides for talking about the properties that link individuals is based on an inference pattern defined using resource (keyword) rdfs:subPropertyOf • Terminology includes verbs as well as nouns • Many of the same requirements for mapping nouns from one source to another apply to relationships • Relationship brother is more specific than sibling • If someone is my brother, then he is also my sibling • The rule: For properties P and R, we have If P rdfs:subPropertyOf R, then, if A P B, then infer A R B
Example • A large firm engages a number of people in various capacities and has a variety of ways to administer these relationships • Some are directly employed by the firm while others are contractors • Among the contractors, some are directly contracted to the company on a freelance basis, others on a long-term retainer, and still others contract through an intermediate firm • All these people can be said to work for the firm • To see how to model this in RDFS, first consider the inferences we should be able to draw and under what circumstances • Name the relationships between a person and the firm :contractsTo, :freeLancesTo, :indirectlyContractsTo, :isEmployedBy, :worksFor
If we assert any of these properties about a person, we’d like to infer that he :worksFor the firm • And there are intermediate conclusions we can draw, e.g., • Both a freelancer and an indirect contractor contract to the firm and indeed work for it • All these relationships are expressed using rdfs:subPropertyOf • First introduce the properties: :freeLancesTo a rdf:Property . :contractsTo a rdf:Property . :indirectlyContractsTo a rdf:Property . :isEmployedBy a rdf:Property . :worksFor a rdf:Property . • Now the subproperty relationships: :freeLancesTo rdfs:subPropertyOf :contractsTo . (1) :indirectlyContractsTo rdfs:subPropertyOf :contractsTo . (2) :isEmployedBy rdfs:subPropertyOf :worksFor . (3) :contractsTo rdfs:subPropertyOf :worksFor . (4)
To see what inferences can be drawn, we need some instance data • First, some classes: :Firm a rdfs:Class . :Employee a rdfs:Class . • Then some individuals: :TheFirm a :Firm . :Goldman a :Employee . :Spence a :Employee . :Long a :Employee .
Property hierarchy (figure): :worksFor has subproperties :isEmployedBy (3) and :contractsTo (4) • :contractsTo has subproperties :indirectlyContractsTo (2) and :freeLancesTo (1)
• Suppose we have the following triples:
:Goldman :isEmployedBy :TheFirm . (a)
:Spence :freeLancesTo :TheFirm . (b)
:Long :indirectlyContractsTo :TheFirm . (c)
• We can draw the following inferences:
:Goldman :worksFor :TheFirm . From (a) and (3)
:Spence :contractsTo :TheFirm . From (b) and (1)
:Long :contractsTo :TheFirm . From (c) and (2)
:Spence :worksFor :TheFirm . From (b), (1), and (4)
:Long :worksFor :TheFirm . From (c), (2), and (4)
rdfs:subPropertyOf lets us describe a hierarchy of properties • Whenever a property in the tree holds between 2 entities, so does every property above it
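The firm example can be checked mechanically. The sketch below applies the subproperty rule (if P rdfs:subPropertyOf R and A P B, then A R B) until a fixed point is reached. It is an illustration only: the tuple representation and the name subproperty_closure are assumptions, and the rdf:Property and class declarations are left out because this rule does not use them.

```python
SUBPROP = "rdfs:subPropertyOf"


def subproperty_closure(asserted):
    """Close a set of triples under the rdfs:subPropertyOf rule."""
    triples = set(asserted)
    changed = True
    while changed:
        # If P subPropertyOf R and A P B, then A R B.
        new = {
            (a, r, b)
            for (p, s, r) in triples if s == SUBPROP
            for (a, q, b) in triples if q == p
        } - triples
        triples |= new
        changed = bool(new)
    return triples


asserted = {
    (":freeLancesTo", SUBPROP, ":contractsTo"),           # (1)
    (":indirectlyContractsTo", SUBPROP, ":contractsTo"),  # (2)
    (":isEmployedBy", SUBPROP, ":worksFor"),              # (3)
    (":contractsTo", SUBPROP, ":worksFor"),               # (4)
    (":Goldman", ":isEmployedBy", ":TheFirm"),            # (a)
    (":Spence", ":freeLancesTo", ":TheFirm"),             # (b)
    (":Long", ":indirectlyContractsTo", ":TheFirm"),      # (c)
}

inferred = subproperty_closure(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

The result is exactly the five inferred triples of the example. Goldman is not inferred to contract to the firm, because :isEmployedBy sits on a different branch of the hierarchy.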
Typing Data by Usage—rdfs:domain and rdfs:range • Besides saying how 2 properties relate to each other, we’d also like to represent how a property is used relative to the defined classes • Recall the forms (where P is a property and D and R classes) P rdfs:domain D . P rdfs:range R . • Informally: Relation P relates values from class D to values from class R • D is the class of the subject of a triple using P, R is the class of the object • The rules: For property P and classes D and R, we have If P rdfs:domain D and x P y, then x rdf:type D . If P rdfs:range R and x P y, then y rdf:type R .
Using P in a way apparently inconsistent with this declaration isn’t an error • Rather, RDFS infers the type info needed to bring P into compliance with its domain and range declarations • In RDFS, can’t assert that a given individual isn’t a member of a given class • (Contrast with OWL) • In fact, no notion of an incorrect or inconsistent inference • So, unlike with XML Schema, an RDF Schema never proclaims an input invalid • It just infers appropriate type info • In this way, RDFS behaves more like a database schema: Declares what joins are possible but makes no statement about the validity of the joined data
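A small Python sketch can make the "infer, never reject" behavior concrete. Everything in the data is hypothetical and chosen only for illustration (:teaches, :Teacher, :Course, :Fido, :Dog, :Algebra), as is the function name domain_range_inferences. Note that a use of :teaches that looks odd is not flagged; the rules simply add type info.

```python
def domain_range_inferences(triples):
    """Return the rdf:type triples entailed by rdfs:domain and rdfs:range."""
    inferred = set()
    for (p, decl, cls) in triples:
        if decl not in ("rdfs:domain", "rdfs:range"):
            continue
        for (x, q, y) in triples:
            if q == p:
                # rdfs:domain types the subject, rdfs:range types the object.
                target = x if decl == "rdfs:domain" else y
                inferred.add((target, "rdf:type", cls))
    return inferred - set(triples)


# Hypothetical data: a dog is stated to teach algebra.
data = [
    (":teaches", "rdfs:domain", ":Teacher"),
    (":teaches", "rdfs:range", ":Course"),
    (":Fido", "rdf:type", ":Dog"),
    (":Fido", ":teaches", ":Algebra"),
]

result = domain_range_inferences(data)
for triple in sorted(result):
    print(triple)
```

Nothing is rejected here: :Fido simply ends up being both a :Dog and a :Teacher, and :Algebra becomes a :Course.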
Combination of Domain and Range with rdfs:subClassOf • The inference patterns can interact in interesting ways • Can see this with the 3 patterns seen so far • Show the interaction between rdfs:subClassOf and rdfs:domain, starting with an example
Given the following class info :Woman a rdfs:Class . :MarriedWoman a rdfs:Class . :MarriedWoman rdfs:subClassOf :Woman . (1) • And the property info :maidenName a rdf:Property . :maidenName rdfs:domain :MarriedWoman . (2) • And the triple :Karen :maidenName "Stephens" . • We infer by (2) :Karen a :MarriedWoman . • But also, by (1) and (2), :Karen a :Woman .
In general, for any resource X, Given X :maidenName Y, infer X a :Woman • But this is exactly the definition of domain, i.e., we just saw that :maidenName rdfs:domain :Woman . • Until now, we’ve applied the inference pattern when a triple using rdfs:domain was asserted or inferred • Now we’re inferring an rdfs:domain triple when we prove that the inference pattern holds • I.e., we view the inference pattern as the definition of what it means for rdfs:domain to hold • Generalizing, we have the new inference pattern: Where P is a property and D and C classes, If P rdfs:domain D and D rdfs:subClassOf C, then P rdfs:domain C
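The interaction can be traced in Python by combining three patterns in one fixed-point loop: domain typing, type propagation, and the derived domain pattern just stated. This is a sketch under simplifying assumptions: triples are plain tuples, the name rdfs_closure is hypothetical, and the class and property declarations from the example are omitted because none of the three rules needs them.

```python
def rdfs_closure(asserted):
    """Close triples under three of the patterns discussed in the text."""
    triples = set(asserted)
    while True:
        new = set()
        for (s, p, o) in triples:
            for (s2, p2, o2) in triples:
                # Domain typing: P domain D and x P y  =>  x type D
                if p == "rdfs:domain" and p2 == s:
                    new.add((s2, "rdf:type", o))
                # Type propagation: A subClassOf B and x type A  =>  x type B
                if p == "rdfs:subClassOf" and p2 == "rdf:type" and o2 == s:
                    new.add((s2, "rdf:type", o))
                # Derived pattern: P domain D and D subClassOf C  =>  P domain C
                if p == "rdfs:domain" and p2 == "rdfs:subClassOf" and s2 == o:
                    new.add((s, "rdfs:domain", o2))
        if new <= triples:
            return triples
        triples |= new


asserted = {
    (":MarriedWoman", "rdfs:subClassOf", ":Woman"),     # (1)
    (":maidenName", "rdfs:domain", ":MarriedWoman"),    # (2)
    (":Karen", ":maidenName", "Stephens"),
}

inferred = rdfs_closure(asserted) - asserted
for triple in sorted(inferred):
    print(triple)
```

The run yields both type triples for :Karen and the inferred triple :maidenName rdfs:domain :Woman.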
The same conclusion holds for rdfs:range, by the same argument • So, • whenever we have domain or range info about a property and a triple involving that property, • we can draw conclusions about the type of any element based just on that triple • The definitions of rdfs:domain and rdfs:range are the most common problem areas in RDFS for modelers experienced in another data modeling paradigm
Relation to OOP • There’s an “obvious” correspondence • between an rdfs:Class and a class in OOP and • between an RDFS property and an OOP variable, so that the assertion P rdfs:domain C . would correspond to defining the variable for P in class C • From this “obvious” mapping comes an expectation that these definitions inherit in the same way as variable definitions do in OOP • But in RDFS there’s no notion of inheritance per se • The only mechanism is inference
The RDFS inference rule closest to the OO notion of inheritance is the Type Propagation Rule • But, using the “obvious” interpretation, we assert that the variable maidenName is defined in MarriedWoman • Then infer that it is in the class higher in the tree, Woman • From the OOP point of view, this is like inheritance up the tree • The fallacy in the conclusion comes from the “obvious” mapping of rdfs:domain as defining a variable relative to a class • It’s never accurate in the Semantic Web to say that a property is “defined for a class” • A property is defined independently of any class • The RDFS relations specify which inferences can be correctly made about it in particular contexts
RDFS Modeling Combinations and Patterns • The effect of the few, simple RDFS inference rules can be quite subtle in the context of shared info in the Semantic Web
Set Intersection • No explicit RDFS modeling construct for set intersection (or union) • But, when we want to model intersection (or union), we needn’t always model it explicitly • Often need only certain inferences supported by these notions • Sometimes these inferences are available through design patterns that combine the RDFS primitives in specific ways
Regarding intersection, an inference we might want to draw is that If x is in C, then it is in both A and B • The relationship being expressed is C ⊆ A ∩ B • This inference is supported by making C a common subclass of A & B: C rdfs:subClassOf A . C rdfs:subClassOf B . • Given the triple x a C . by the Type Propagation Rule, we infer (as desired) the triples x a A . x a B . • Can’t draw the inference in the other direction • From membership in A and B, we can’t infer membership in C • We can’t express A ∩ B ⊆ C
Example: Hospital Skills • Suppose we’re describing hospital staff • A surgeon is a member of the hospital staff and a qualified physician • Surgeon ⊆ Staff ∩ Physician • But not every staff physician is a surgeon: the inclusion goes one way • From our assertions, we want to be able to infer, given Kildare is a surgeon, he’s also a staff member and a physician • Assert :Surgeon rdfs:subClassOf :Staff . :Surgeon rdfs:subClassOf :Physician . :Kildare a :Surgeon . • Infer by Type Propagation Rule :Kildare a :Staff . :Kildare a :Physician . • Can’t make the inference the other way • A psychiatrist is also both a staff member and a physician
Property Intersection • Even though it’s unfamiliar to think of a property as a set, we can still use intersection and union to describe the functionality supported for properties • RDFS again can’t support these exactly • But we can approximate these notions with proper use of rdfs:subPropertyOf • One inference is based on one property being an intersection of 2 others • P ⊆ R ∩ S • Given x P y, we want to infer both x R y and x S y
Example: Patients in Hospital Rooms • When a patient is logged into a room, we know that • He is on the duty roster for that room • His insurance is billed for that room • Assert :loggedIn rdfs:subPropertyOf :billedFor . :loggedIn rdfs:subPropertyOf :assignedTo . :Marcus :loggedIn :Room101 . • Infer :Marcus :billedFor :Room101 . :Marcus :assignedTo :Room101 . • Can’t make the inference in the other direction • If Marcus is billed for Room 101 and assigned to Room 101, no RDFS rules apply
Set Union • Express A ∪ B ⊆ C by making C a common superclass of A and B: A rdfs:subClassOf C . B rdfs:subClassOf C . • Then x a A or x a B implies x a C
Summary • C ⊆ A ∩ B by making C a subclass of A and of B • C ⊇ A ∪ B by making both A and B subclasses of C
Example: All-Stars • In determining candidates for the All Stars, a league’s rules could state that • MVPs are candidates • Top scorers are candidates • Assert :MVP rdfs:subClassOf :AllStarCandidate . :TopScorer rdfs:subClassOf :AllStarCandidate . • Given :Reilly a :MVP . :Kaneda a :TopScorer . we may infer :Reilly a :AllStarCandidate . :Kaneda a :AllStarCandidate . • We can’t draw the inference the other way • Given someone is a candidate, can’t infer he’s an MVP or top scorer
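Both class patterns, and the fact that each works in one direction only, can be seen in a short Python sketch. The class and individual names come from the hospital and All-Stars examples, except :Brackett, a hypothetical psychiatrist added to show the missing reverse inference. The helper name types_of and the pair-based data layout are assumptions.

```python
def types_of(individual, type_facts, subclass_of):
    """All classes of an individual under the Type Propagation Rule."""
    found = set()
    frontier = [c for (x, c) in type_facts if x == individual]
    while frontier:
        cls = frontier.pop()
        if cls not in found:
            found.add(cls)
            # Follow every asserted superclass of the class just reached.
            frontier.extend(sup for (sub, sup) in subclass_of if sub == cls)
    return found


subclass_of = [
    (":Surgeon", ":Staff"),             # intersection pattern:
    (":Surgeon", ":Physician"),         #   Surgeon is within Staff and Physician
    (":MVP", ":AllStarCandidate"),      # union pattern:
    (":TopScorer", ":AllStarCandidate"),  # MVP and TopScorer are within AllStarCandidate
]

type_facts = [
    (":Kildare", ":Surgeon"),
    (":Reilly", ":MVP"),
    (":Kaneda", ":TopScorer"),
    (":Brackett", ":Staff"),       # hypothetical psychiatrist
    (":Brackett", ":Physician"),
]

for name in (":Kildare", ":Brackett", ":Reilly", ":Kaneda"):
    print(name, sorted(types_of(name, type_facts, subclass_of)))
```

:Kildare picks up :Staff and :Physician, but :Brackett, who is both, is never classified as a :Surgeon; likewise nothing lets us work back from :AllStarCandidate to :MVP or :TopScorer.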
Property Union • Use rdfs:subPropertyOf to combine properties from diverse sources as we use rdfs:subClassOf to combine classes as a union • If 2 sources use properties P and Q similarly, a single amalgamated property R can be defined as P rdfs:subPropertyOf R . Q rdfs:subPropertyOf R . • Then (for resources x and y), from x P y or x Q y infer x R y
Example: Merging Library Records • One library uses property borrows to indicate that a patron has borrowed a book • Another library uses checkedOut for the same relationship • If we’re sure the 2 mean exactly the same, we make them equivalent: Library1:borrows rdfs:subPropertyOf Library2:checkedOut . Library2:checkedOut rdfs:subPropertyOf Library1:borrows . • Any relationship expressed by one library is inferred to hold for the other
If we are not sure they’re exactly the same but have an application that wants to treat them the same, use the Union pattern to create a common super-property: Library1:borrows rdfs:subPropertyOf :hasPossession . Library2:checkedOut rdfs:subPropertyOf :hasPossession . • All patrons & books from both libraries are related by :hasPossession • Info from both sources is merged
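The difference between the two library-merging choices can be compared side by side in Python. The property names are those of the example; the patrons and books (lib1:Ann, lib1:Dune, lib2:Bob, lib2:Emma) are hypothetical, and apply_subproperties is an illustrative helper rather than any library's API.

```python
def apply_subproperties(schema, facts):
    """Close facts under: P subPropertyOf R and x P y  =>  x R y."""
    facts = set(facts)
    while True:
        new = {
            (x, r, y)
            for (p, r) in schema
            for (x, q, y) in facts if q == p
        } - facts
        if not new:
            return facts
        facts |= new


# Hypothetical instance data: one loan recorded by each library.
loans = {
    ("lib1:Ann", "Library1:borrows", "lib1:Dune"),
    ("lib2:Bob", "Library2:checkedOut", "lib2:Emma"),
}

# Equivalence: each property is a subproperty of the other.
equivalent = [
    ("Library1:borrows", "Library2:checkedOut"),
    ("Library2:checkedOut", "Library1:borrows"),
]

# Union pattern: both are subproperties of a common super-property.
union = [
    ("Library1:borrows", ":hasPossession"),
    ("Library2:checkedOut", ":hasPossession"),
]

merged_equiv = apply_subproperties(equivalent, loans)
merged_union = apply_subproperties(union, loans)
print(sorted(merged_equiv))
print(sorted(merged_union))
```

With the equivalence, each loan becomes visible under both libraries' property names. With the union pattern, each loan gains only the :hasPossession relationship and the original properties stay distinct. The mutual subproperty cycle does not loop forever, because the rule stops once no new triple is produced.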
Property Transfer • In modeling info from multiple sources, a common requirement is: If 2 entities are related by some relationship in one source, the same entities should be related by a corresponding relationship in the other source • Given property P in one source and property Q in another, say that all uses of P are to be considered uses of Q with P rdfs:subPropertyOf Q . • Now, from x P y infer x Q y • Perhaps strange to have a design pattern consisting of a single triple • But this use of rdfs:subPropertyOf is so pervasive it merits being distinguished
Example: Terminology Reconciliation • A growing number of standard info representation schemes are being published in RDFS • Info developed in advance of a standard must be retargeted to be compliant • E.g., suppose a given legacy bibliography system uses the term author for the creator of a book • The Dublin Core term for this is dc:creator • Make the system’s data conform to Dublin Core with a single triple :author rdfs:subPropertyOf dc:creator . • Now any resource with a value for the author property is inferred to have the same value for dc:creator
The work is done by the RDFS inference engine • Not by an off-line editing process • So legacy applications using the author property can carry on as before • Newer, DC-compliant applications can use the inferred data in a standard way
Challenges • Look at modeling scenarios that can be addressed with the patterns we’ve seen
Term Reconciliation • Resolve terms used by different agents who want to use their descriptions together in a federated application
Challenge 2 • Enforce the assertion that any member of one class is automatically treated as a member of another