Tracking Referents (based on OIC, December 1, 2006). Barry SMITH and Werner CEUSTERS, Center of Excellence in Bioinformatics and Life Sciences, University at Buffalo, NY, USA. http://www.org.buffalo.edu/RTU
Representational artifacts classified according to the sort of entities they are about
A realist view of the world
• The world consists of entities that can be divided according to three dichotomies
• entities that are
  • either particulars or universals;
  • either occurrents or continuants;
  • either dependent or independent;
• together with relations between these entities
  • <particular, universal>, e.g. is-instance-of;
  • <particular, particular>, e.g. is-member-of;
  • <universal, universal>, e.g. is_a (is-subtype-of).
A realist view of the world (1)
[Figure: instances/particulars (JFK, Enola Gay, Barry Smith, George Bush) linked by "instance of" to universals/types (airport, airplane, philosopher, president)]
A realist view of the world (2)
[Figure: along a time axis t, occurrents (meeting, flying) shown against continuants (JFK, Enola Gay, Barry Smith, George Bush)]
A realist view of the world (3)
[Figure: along a time axis t, particulars (Barry Smith, George Bush) linked by "instance-at t" to universals (child, adult, philosopher, president)]
Inadequate representational units • “JFK” “Enola Gay” • “Barry Smith” “George Bush”
Proposed Solution: Referent Tracking
[Cartoon caption: “Now! That should clear up a few things around here!”]
• Purpose:
  • explicit reference to the concrete individual entities relevant to the accurate description of a scene
Ceusters W, Smith B. Strategies for Referent Tracking in Electronic Health Records. J Biomed Inform. 2006 Jun;39(3):362-78.
Numbers instead of words
[Figure: entities labelled with numbers: 78, 235, 5678, 321, 322, 666, 427]
• Method:
  • Introduce an Instance Unique Identifier (IUI) for each relevant particular (individual) entity
Essentials of Referent Tracking
• generating universally unique identifiers;
• deciding which particulars should receive an IUI;
• finding out whether or not a particular has already been assigned an IUI (each particular should receive at most one IUI);
• using IUIs to make statements;
• determining the truth values of statements in which IUIs are used;
• correcting errors in the assignment of IUIs.
IUI generation
• Universally Unique IDs:
  • recently standardized through ISO/IEC 9834-8:2004,
  • which specifies format and generation rules enabling users to produce 128-bit identifiers that are either guaranteed or have a high probability of being globally unique
• Meaningless strings
• Central management or certification not needed to guarantee uniqueness
  • (But use as IUI requires this)
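A minimal sketch of the generation step, assuming Python's standard `uuid` module as the generator. The function name `new_iui` is hypothetical and not part of the referent tracking proposal. It only produces a meaningless 128-bit string; the central registration that the slide says is required before such a string can serve as an IUI is not covered here.

```python
import uuid


def new_iui() -> str:
    """Return a fresh, meaningless 128-bit identifier (random UUID, version 4)."""
    return str(uuid.uuid4())


# Two calls give two different strings with very high probability.
first = new_iui()
second = new_iui()
print(first, second)
```

A random (version 4) UUID is used because it carries no information about the particular it will label, which matches the "meaningless strings" requirement.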
IUI assignment
• = an act carried out by the first ‘cognitive agent’ who recognizes the need to acknowledge the existence of a particular it has information about, by labeling it with an IUI.
• ‘cognitive agent’:
  • A person;
  • An organisation;
  • A device or software agent, e.g.
    • Bank note printer
    • Image analysis software
Criteria for IUI assignment (1)
• Different for continuants and for occurrents
• The continuant is in front of you, you can see it, photograph it
  • The photograph gets an IUI; your act (occurrent) of taking the photo gets an IUI
• The occurrent occurs in your presence, you can make a video
  • The video gets an IUI; your act (occurrent) of taking the video gets an IUI
• When assigning an IUI you may not know exactly what the particular is (which type it instantiates)
Criteria for IUI assignment (2)
• The particular’s existence ‘may not already have been determined as the existence of something else’:
  • Morning star and evening star
  • Himalaya
  • 2 observers not knowing they observed the same thing
• It may not already have been assigned an IUI.
• It must be relevant to do so:
  • Personal decision, (scientific) community guideline, ...
  • Possibilities offered by the EHR system
• If an IUI has been assigned by somebody, everybody else making statements about the particular should use it
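The "at most one IUI per particular" criterion can be sketched as a lookup before assignment. This is only an illustration under strong assumptions: the class `IUIRepository` and its methods are hypothetical, and matching particulars by a description string is a stand-in for the real (and much harder) task of deciding whether two observers are talking about the same thing.

```python
import uuid


class IUIRepository:
    """Toy repository: hands out at most one IUI per registered particular."""

    def __init__(self):
        self._iui_by_description = {}

    def assign(self, description: str) -> str:
        """Return the existing IUI for this description, or create a new one."""
        if description not in self._iui_by_description:
            self._iui_by_description[description] = str(uuid.uuid4())
        return self._iui_by_description[description]

    def declare_same(self, known: str, other: str) -> None:
        """Record that a second description denotes an already registered particular."""
        self._iui_by_description[other] = self._iui_by_description[known]


repo = IUIRepository()
morning = repo.assign("morning star")
repo.declare_same("morning star", "evening star")
evening = repo.assign("evening star")  # reuses the existing IUI
```

Once the two descriptions are known to denote the same particular, later statements made under either description end up using the same IUI.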
Assertion of assignments
• IUI assignment is an act whose execution has to be asserted in the IUI-repository:
• <da, Ai, td>
  • da: IUI of the registering agent
  • Ai: the assertion of the assignment <pa, pp, tap, c>
    • pa: IUI of the author of the assertion
    • pp: IUI of the particular
    • tap: time of the assignment
    • c: optional description for identification
  • td: time of registering Ai in the IUI-repository
• Neither td nor tap gives any information about when #pp started to exist. This might be asserted in statements providing information about #pp.
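The two nested tuples can be written down as plain records. The sketch below assumes Python dataclasses; the field names follow the slide, while the class names `Assignment` and `Registration` and all example values (the `#...` strings and the dates) are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Assignment:
    """A_i = <pa, pp, tap, c>: the assertion of an IUI assignment."""
    pa: str                   # IUI of the author of the assertion
    pp: str                   # IUI of the particular
    tap: datetime             # time of the assignment
    c: Optional[str] = None   # optional description for identification


@dataclass(frozen=True)
class Registration:
    """<da, A_i, td>: the record stored in the IUI-repository."""
    da: str          # IUI of the registering agent
    a: Assignment    # the asserted assignment A_i
    td: datetime     # time of registering A_i in the repository


# Hypothetical example values, for illustration only.
a_i = Assignment(pa="#author", pp="#particular",
                 tap=datetime(2006, 12, 1, 9, 0), c="patient in bed 4")
record = Registration(da="#agent", a=a_i, td=datetime(2006, 12, 1, 9, 5))
```

Neither timestamp says when `#particular` began to exist; that would need a separate statement about it.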
PTP statements - particular to particular
• ordered sextuples of the form <sa, ta, r, o, P, tr>
  • sa is the IUI of the author of the statement,
  • ta a reference to the time when the statement is made,
  • r a reference to a relationship (available in o) obtaining between the particulars referred to in P,
  • o a reference to the ontology from which r is taken,
  • P an ordered list of IUIs referring to the particulars between which r obtains, and
  • tr a reference to the time at which the relationship obtains.
• P contains as many IUIs as required by the arity of r. In most cases, P will be an ordered pair such that r obtains between the particular represented by the first IUI and the one referred to by the second IUI.
• As with A statements, these statements must also be accompanied by a meta-statement capturing when the sextuple became available to the referent tracking system.
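A sketch of the sextuple with the arity rule enforced at construction time. The class name `PtoP`, the `ONTOLOGIES` lookup table, the ontology name `"demo-ontology"`, and the use of plain strings for time references are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical ontology registry: ontology name -> relation name -> arity.
ONTOLOGIES = {"demo-ontology": {"is-member-of": 2}}


@dataclass(frozen=True)
class PtoP:
    """<sa, ta, r, o, P, tr>: a particular-to-particular statement."""
    sa: str              # IUI of the author of the statement
    ta: str              # time at which the statement is made
    r: str               # relationship, taken from ontology o
    o: str               # ontology from which r is taken
    P: Tuple[str, ...]   # ordered IUIs between which r obtains
    tr: str              # time at which the relationship obtains

    def __post_init__(self):
        arity = ONTOLOGIES[self.o][self.r]
        if len(self.P) != arity:
            raise ValueError(f"{self.r} needs {arity} IUIs, got {len(self.P)}")


ok = PtoP("#author", "t1", "is-member-of", "demo-ontology", ("#p1", "#p2"), "t0")

try:
    PtoP("#author", "t1", "is-member-of", "demo-ontology", ("#p1",), "t0")
    arity_error = False
except ValueError:
    arity_error = True
```

The meta-statement recording when the sextuple reached the referent tracking system is left out of this sketch.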
PTCL statements - particular to class
• ordered septuples of the form <sa, ta, inst, o, p, cl, tr>
  • sa is the IUI of the author of the statement,
  • ta a reference to the time when the statement is made,
  • inst a reference to an instance relationship available in o obtaining between p and cl,
  • o a reference to the ontology from which inst and cl are taken,
  • p the IUI referring to the particular whose inst relationship with cl is asserted,
  • cl the class in o to which p enjoys the inst relationship, and
  • tr a reference to the time at which the relationship obtains.
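Because tr is part of the statement, one can ask which classes a particular instantiates at a given time. The sketch below assumes that tr is a half-open interval of integer time points; that encoding, the class name `PtoCL`, the helper `classes_at`, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class PtoCL:
    """<sa, ta, inst, o, p, cl, tr>: a particular-to-class statement."""
    sa: str               # IUI of the author of the statement
    ta: int               # time at which the statement is made
    inst: str             # instance relationship taken from o
    o: str                # ontology from which inst and cl are taken
    p: str                # IUI of the particular
    cl: str               # class in o that p instantiates
    tr: Tuple[int, int]   # assumed encoding: half-open interval [start, end)


def classes_at(statements: Iterable[PtoCL], p: str, t: int) -> Set[str]:
    """Classes that particular p is asserted to instantiate at time t."""
    return {s.cl for s in statements
            if s.p == p and s.tr[0] <= t < s.tr[1]}


# Hypothetical time points; only the ordering matters.
stmts = [
    PtoCL("#author", 50, "instance-at", "demo-ontology", "#p1", "child", (0, 18)),
    PtoCL("#author", 50, "instance-at", "demo-ontology", "#p1", "adult", (18, 100)),
]
```

With these statements, `classes_at(stmts, "#p1", 10)` yields `{"child"}` and `classes_at(stmts, "#p1", 30)` yields `{"adult"}`, mirroring the child/adult picture of time-indexed instantiation.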
Other Advantages
• Mapping as a by-product of tracking
  • Descriptions of the same particular using different ontologies/concept-based systems
• Quality control of ontologies and concept-based systems
  • Systematically “inconsistent” descriptions within or across terminologies may indicate poor definition of the respective terms
Accept that everything may change:
• changes in the underlying reality:
  • Particulars and universals come and go
• changes in our (scientific) understanding:
  • The planet Vulcan does not exist
• reassessments of what is considered to be relevant for inclusion (notion of purpose).
• encoding mistakes introduced during data entry or ontology development.
Reality versus beliefs, both in evolution
[Figure: along a time axis t, Reality (universals U1, U2; particular p3) shown against Belief (IUI-#3; O-#0, O-#1, O-#2). Legend: arrow = “denotes” = what constitutes the meaning of representational units]
Therefore: O-#0 is meaningless
An “optimal” representational artifact (2) • Each representational unit in such a representational artifact would designate • (1) a single portion of reality (POR), which is • (2) relevant to its purposes and such that • (3) the authors intended to use this representational unit to designate this POR, and • (4) there would be no PORs objectively relevant to these purposes that are not referred to in the representational artifact.
Sources of error • assertion errors: sources may be in error as to what is the case in their target domain; • relevance errors: sources and analysts may be in error as to what is objectively relevant to a given purpose; • encoding errors: they may not successfully encode their underlying cognitive representations, so that particular representational units fail to point to the intended PORs.
Key requirement for updating
• Any change in an ontology or data repository should be associated with the reason for that change, so that one can assess later what kind of mistake has been made!
Example: a person’s gender
• In John Smith’s EHR:
  • At t1: “male”; at t2: “female”
• What are the possibilities?
  • Change in reality:
    • transgender surgery
    • change in legal self-identification
  • Change in understanding: it was female from the very beginning but interpreted wrongly
  • Correction of data entry mistake
    • (was understood as male, but wrongly transcribed)
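One way to meet the updating requirement is to store the reason next to every change. The sketch below is an assumption about how such a record might look; the names `Reason`, `Change`, and `earlier_entry_was_wrong` are hypothetical and the three reasons are the ones listed in the gender example.

```python
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    REALITY_CHANGED = "change in reality"
    UNDERSTANDING_CHANGED = "change in understanding"
    ENTRY_CORRECTED = "correction of data entry mistake"


@dataclass(frozen=True)
class Change:
    field: str
    old: str
    new: str
    t: str          # time of the change, here just a label
    reason: Reason  # mandatory: no change without a reason


def earlier_entry_was_wrong(change: Change) -> bool:
    """Only a change in reality leaves the earlier entry correct for its time."""
    return change.reason is not Reason.REALITY_CHANGED


surgery = Change("gender", "male", "female", "t2", Reason.REALITY_CHANGED)
typo = Change("gender", "male", "female", "t2", Reason.ENTRY_CORRECTED)
```

The two records carry identical old and new values; only the stored reason tells a later reader whether the entry at t1 was a mistake.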
A realism-based metric for data quality
• Must be able to deal with a variety of problems by which matching endeavors thus far have been affected:
  • different authors may have different though still veridical views on the same portion of reality,
  • authors may make mistakes,
    • when interpreting reality, or
    • when formulating their interpretations in their chosen representation language,
  • a matcher can never be sure what the expressions in a repository actually refer to (no God’s eye perspective),
  • if two ontologies are developed at different times, reality itself may have changed in the intervening period.
An example: merging data from two sources
• Reality exists before any observation
• And also most structures in reality are there in advance
[Figure: R]
• The author of O1 acknowledges the existence of some Portion Of Reality (POR)
• Some portions of reality escape his attention.
[Figure: R, B1]
• He considers only some of them relevant for O1, and thus represents only part, here with Int = R+.
• Both RU1B1 and RU1O1 are representational units referring to #1;
• RU1O1 is NOT a representation of RU1B1;
• RU1O1 is created through concretization of RU1B1 in some medium.
[Figure: R, B1, O1; RU1B1 and RU1O1 both refer to #1]
• Similarly concerning the author of O2
[Figure: R, B1, B2, O1, O2]
• Creation of the mapping
[Figure: R, B1, B2, O1, O2, Om]
Two (out of many other) possible configurations
• #1 was not considered to be relevant for O2, but is considered to be relevant for Om.
• The author of O1 made an encoding mistake, so that his ontology contains a reference to a non-intended referent, and this is copied into Om.
Typology of expressions included in and excluded from an ontology in light of relevance and relation to external reality
• Valid presence in the representation
• Valid absence in the representation
But sometimes you get lucky …
• Unjustified presence in the representation
• Unjustified absence in the representation
The original beliefs are usually not accessible
• But if the ontologies are well documented and representations intelligible, then many such beliefs can be inferred, and mistakes found.
[Figure: R, O1, O2, Om; B1 and B2 no longer shown]
For concept-based systems, there is also no reality
• But what must hold if both ontologies are believed to be right can be believed to mirror reality
[Figure: O1, O2, Om]
The principle of forced backward belief
• A lot of information loss
A decision support tool for dealing with inconsistencies?
• O1:
  • Holds that penguins are birds, birds fly
• O2:
  • Holds that penguins are birds, penguins don’t fly
• The problem for Om:
  • Which source ontology to believe?
• What might be the source of the inconsistency?
  • O1 is right and penguins do fly
  • O1 is wrong and either penguins are not birds or not all birds fly
  • Both are right but the representational units ‘penguin’, ‘bird’ and ‘fly’ do not refer to the same entities in reality.
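The penguin case can be detected mechanically before anyone decides which source to believe. The sketch below is a toy under stated assumptions: each ontology is a dictionary with a single-parent `is_a` map and a `flies` map, and a class inherits the nearest assertion found while walking up `is_a`. The names `flies` and `conflicts` are hypothetical.

```python
from typing import Dict, List, Optional

O1 = {"is_a": {"penguin": "bird"}, "flies": {"bird": True}}
O2 = {"is_a": {"penguin": "bird"}, "flies": {"penguin": False}}


def flies(ontology: Dict[str, dict], kind: Optional[str]) -> Optional[bool]:
    """What the ontology entails about flying for `kind`, or None if silent."""
    seen = set()
    while kind is not None and kind not in seen:
        seen.add(kind)
        if kind in ontology["flies"]:
            return ontology["flies"][kind]
        kind = ontology["is_a"].get(kind)  # walk up to the parent class
    return None


def conflicts(o1: Dict[str, dict], o2: Dict[str, dict],
              kinds: List[str]) -> List[str]:
    """Kinds on which both ontologies take a stand and disagree."""
    result = []
    for kind in kinds:
        v1, v2 = flies(o1, kind), flies(o2, kind)
        if v1 is not None and v2 is not None and v1 != v2:
            result.append(kind)
    return result


found = conflicts(O1, O2, ["penguin", "bird"])
print(found)
```

The check reports the disagreement on `penguin` only. Choosing among the three explanations listed on the slide still needs information about what each unit was intended to refer to.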
Possible evolutions through updates Example: a relevant entity ceases to exist, but the representation is not updated:
Updating is an active process
• Authors assume in good faith that
  • all included representational units are of the P+1 type, and
  • all those they are aware of, but did not include, are of the A+1 or A+2 type.
• If they become aware of a mistake, they make a change under the assumption that their changes are also towards the P+1, A+1, or A+2 cases.
• Thus at that time, they know of what type the previous entry must have been, given their belief about what the current one is, and they know the reason for the change.
This leads to a calculus … • NOT: • to demonstrate how good an individual version of an ontology is, • But rather • to measure how much it improved (hopefully) as compared to its predecessors. • Principle: recursive belief revision
Backward belief revision over time
Reality: a POR exists and is not relevant
• At time t, an analyst correctly perceives the existence of some particular, but considers it relevant while it isn’t, and he makes an encoding error such that the representational unit does not refer.
• There is thus a -2 error with respect to reality, but this remains, of course, unknown.
• At t+1, he corrects the encoding mistake, which forces him to believe that at t, the unit-reality configuration was of type P-4 rather than P+1.
• Although he believes that the current situation is P+1, it is in reality P-6, where it was P-7 before.
• The real error is now -1, while the perceived error with respect to t is also -1.
• At t+2, he believes that the posited POR in fact does not exist.
[Table residue from the final slide, values in the order shown: Beliefs • At t about t: -2 • At t+1 about t+1, at t+1 about t: -1, -1 • At t+2 about t+2, at t+2 about t+1, at t+2 about t: -1, -3, -5]