Parallel ABox Reasoning of EL Ontologies • Yuan Ren, Jeff Z. Pan and Kevin Lee • University of Aberdeen, UK; NICTA, Australia • JIST2011
Motivation • Computing infrastructure has improved significantly in recent decades: • computer networks → cloud computing • integrated circuits → multi-core processors • Computation can be, and has already been, parallelised in many applications. • But most off-the-shelf DL reasoners do not support parallelised reasoning.
Existing Works • Parallel reasoning with multiple computational nodes • Marvin system: RDF reasoning; • Weaver and Hendler’s work: RDFS inference; • SAOR: join-free pD* inference; • DRAGO: OWL DL reasoning; • Anne and Heiner’s work: ALCHIQ reasoning; • MapReduce-based approaches such as WebPIE: RDFS, pD* and EL TBox reasoning; pD* justification • Parallel reasoning with multiple computational cores in a single computer • Soma and Prasanna’s work: pD* reasoning; • Liebig and Müller’s work: SHN tableau reasoning; • Meissner’s work: ALC tableau reasoning; • Aslani and Haarslev’s work: TBox reasoning • ELK algorithm by Kazakov: ELHR+ TBox classification • [Figure: languages covered by existing parallel reasoners, from intractable (SROIQ, SHIN, ALC) to tractable (EL+, ELHR+ TBox only, pD*, RDFS, RDF)] • Parallel reasoning for large amounts of data in the EL profile is missing
Supporting ELH⊥,R+ ABox Reasoning • Why the EL family? • Some of the largest well-known terminologies are in the EL family, • e.g. SNOMED CT; • Why ABox reasoning? • Semantic applications will populate terminologies with data, • e.g. Chintan Patel et al. (SWJ2007) populated SNOMED CT with 59 million ABox assertions. • Why is the bottom (⊥) non-trivial? • It enables inconsistency checking, • e.g. in SNOMED CT, Groin is defined as Abdomen AND Leg, which is inaccurate and can be detected if Abdomen and Leg are disjoint. • The role hierarchy cannot be pre-computed
TBox Reasoning in ELHR+ • EL reasoning is realised by applying completion rules: • start from the original axioms, • check which axioms can be joined to trigger rules, • grow the entailment set until it is closed under the rules
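The saturation idea above can be sketched for plain EL. The sketch assumes normalised axioms and the standard EL completion rules, labelled CR1-CR4 here, rather than the full ELHR+ rule set, so there is no role hierarchy, ⊥ or transitivity. The tuple encoding, the `TOP` marker and the name `classify` are illustrative assumptions. A worklist keeps growing the entailment set until no rule yields a new fact.

```python
from collections import deque

TOP = "TOP"  # stands for the top concept ⊤

def classify(concepts, sub, conj, ex_rhs, ex_lhs):
    """Saturate normalised EL axioms under completion rules CR1-CR4.

    concepts: atomic concept names (must include every existential filler)
    sub:      {(A, B)}       told A ⊑ B
    conj:     {(A1, A2, B)}  told A1 ⊓ A2 ⊑ B
    ex_rhs:   {(A, r, B)}    told A ⊑ ∃r.B
    ex_lhs:   {(r, A, B)}    told ∃r.A ⊑ B
    """
    S = {c: set() for c in concepts}  # S[C]: derived atomic subsumers of C
    R = set()                         # (C, r, D): derived C ⊑ ∃r.D
    todo = deque()
    for c in concepts:                # start from C ⊑ C and C ⊑ ⊤
        todo.append(("sub", c, c))
        todo.append(("sub", c, TOP))
    while todo:                       # stop once nothing new can be derived
        fact = todo.popleft()
        if fact[0] == "sub":
            _, c, a = fact
            if a in S[c]:
                continue
            S[c].add(a)
            for (x, b) in sub:                                   # CR1
                if x == a:
                    todo.append(("sub", c, b))
            for (a1, a2, b) in conj:                             # CR2
                if (a1 == a and a2 in S[c]) or (a2 == a and a1 in S[c]):
                    todo.append(("sub", c, b))
            for (x, r, b) in ex_rhs:                             # CR3
                if x == a:
                    todo.append(("ex", c, r, b))
            for (r, x, b) in ex_lhs:                             # CR4, new D ⊑ A
                if x == a:
                    for (e, r2, d) in R:
                        if r2 == r and d == c:
                            todo.append(("sub", e, b))
        else:
            _, c, r, d = fact
            if (c, r, d) in R:
                continue
            R.add((c, r, d))
            for (r2, x, b) in ex_lhs:                            # CR4, new C ⊑ ∃r.D
                if r2 == r and x in S[d]:
                    todo.append(("sub", c, b))
    return S, R

if __name__ == "__main__":
    S, R = classify(
        concepts=["A", "B", "C", "D", "E", "F", "G"],
        sub={("A", "B"), ("A", "C"), ("E", "F")},
        conj={("B", "C", "D")},
        ex_rhs={("A", "r", "E")},
        ex_lhs={("r", "F", "G")},
    )
    print(sorted(S["A"]))  # ['A', 'B', 'C', 'D', 'G', 'TOP']
```

On this toy TBox, A gains D through the conjunction rule and G through the existential rule. Each step joins a newly derived fact with facts that were already derived.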
TBox Reasoning in ELHR+ • A naïve approach requires guarding shared data collections with locks … • Inserting into and retrieving from a set cannot be performed at the same time! • Solution 1: guarding with locks • Solution 2: separating the inferences and the data collections
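Solution 1 can be sketched with a single lock around a shared set. Every insertion and every read goes through the same `threading.Lock`. This is correct, but it serialises the workers. `LockedSet`, `worker` and `run_demo` are hypothetical names for illustration, and the "axioms" are just integers tagged with a string.

```python
import threading

class LockedSet:
    """Solution 1 (sketch): one lock serialises every insert and every read."""

    def __init__(self):
        self._items = set()
        self._lock = threading.Lock()

    def add(self, item):
        with self._lock:
            if item in self._items:
                return False
            self._items.add(item)
            return True

    def snapshot(self):
        # Copy under the lock: iterating a plain set while another thread
        # inserts can raise "RuntimeError: Set changed size during iteration".
        with self._lock:
            return frozenset(self._items)

def worker(shared, start, count):
    for i in range(start, start + count):
        shared.add(("axiom", i))        # insert a derived "axiom"
        if i % 50 == 0:
            _ = len(shared.snapshot())  # read the shared set concurrently

def run_demo(n_workers=4, count=500, stride=250):
    shared = LockedSet()
    threads = [threading.Thread(target=worker, args=(shared, k * stride, count))
               for k in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(shared.snapshot())

if __name__ == "__main__":
    print(run_demo())  # overlapping ranges cover 0..1249, so 1250 unique items
```

Every worker contends for this one lock. Solution 2 avoids that contention by giving each context its own collections.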
Parallel TBox Reasoning in ELHR+ • Key data structures: • Axiom: a GCI • RI closures are pre-computed; • Context: a concept • Context.scheduled: a queue of axioms to be processed; • Context.processed: a set of axioms already processed; • Context.isActive = true iff Context.scheduled is non-empty; • ActiveContexts: a queue of contexts • Every element must be unique
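Here is a minimal single-threaded sketch of these data structures. It assumes axioms are plain tuples and that a context is identified by its concept name. In the sketch, `is_active` is derived from `scheduled`, so the "iff" invariant holds by construction. A concurrent implementation would more likely keep an explicit flag that is updated atomically. `schedule` and `drain` are hypothetical helpers, and `drain` leaves out rule application.

```python
from collections import deque

class Context:
    """A context is a concept together with its own axiom collections."""

    def __init__(self, concept):
        self.concept = concept
        self.scheduled = deque()   # Context.scheduled: axioms to be processed
        self.processed = set()     # Context.processed: axioms already processed

    @property
    def is_active(self):
        # Context.isActive = true iff scheduled is non-empty
        return bool(self.scheduled)

class ActiveContexts:
    """Queue of contexts in which every element is unique."""

    def __init__(self):
        self._queue = deque()
        self._members = set()      # concepts currently in the queue

    def push(self, ctx):
        if ctx.concept not in self._members:
            self._members.add(ctx.concept)
            self._queue.append(ctx)

    def pop(self):
        ctx = self._queue.popleft()
        self._members.discard(ctx.concept)
        return ctx

    def __len__(self):
        return len(self._queue)

def schedule(ctx, axiom, active):
    """Add an axiom to a context and make sure the context is queued once."""
    ctx.scheduled.append(axiom)
    active.push(ctx)

def drain(ctx):
    """Move every scheduled axiom into processed (rule application omitted)."""
    while ctx.scheduled:
        axiom = ctx.scheduled.popleft()
        if axiom not in ctx.processed:
            ctx.processed.add(axiom)

if __name__ == "__main__":
    active = ActiveContexts()
    a = Context("A")
    schedule(a, ("A", "B"), active)
    schedule(a, ("A", "C"), active)
    print(len(active), a.is_active)   # 1 True
    drain(active.pop())
    print(len(active), a.is_active)   # 0 False
```

Scheduling two axioms into the same context queues that context only once. This is the uniqueness requirement on ActiveContexts.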
Parallel TBox Reasoning in ELHR+ • [Figure: Worker 1, Worker 2, … operating on contexts, each with a scheduled queue and a processed set]
Parallel TBox Reasoning in ELHR+ • Reasoning is completely and independently separated into different contexts
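The worker scheme can be sketched under simplifying assumptions. Only atomic subsumptions C ⊑ D (context C) are used, with told A ⊑ B and A1 ⊓ A2 ⊑ B as the only rules, so every derivation stays in its own context. Unlike the lock-free design described in the slides, this Python sketch uses one `threading.Condition` to guard context activation and termination detection. A context's axiom collections are touched only by the worker currently holding it. All class and method names are hypothetical.

```python
import threading
from collections import deque

TOP = "TOP"  # stands for ⊤

class Context:
    def __init__(self, concept):
        self.concept = concept
        self.scheduled = deque()  # axioms waiting in this context
        self.processed = set()    # touched only by the worker holding the context
        self.active = False       # True while queued or held by a worker

class ParallelClassifier:
    """Workers pull active contexts; axiom (C, D) means C ⊑ D and lives in context C."""

    def __init__(self, concepts, sub, conj, n_workers=4):
        self.sub, self.conj, self.n_workers = sub, conj, n_workers
        self.contexts = {c: Context(c) for c in concepts}
        self.active = deque()              # ActiveContexts
        self.cond = threading.Condition()  # guards activation and termination
        self.busy = 0                      # workers currently holding a context
        for c in concepts:
            self.schedule((c, c))
            self.schedule((c, TOP))

    def schedule(self, axiom):
        ctx = self.contexts[axiom[0]]
        with self.cond:
            ctx.scheduled.append(axiom)
            if not ctx.active:             # activate: enqueue at most once
                ctx.active = True
                self.active.append(ctx)
                self.cond.notify_all()

    def rules(self, ctx, axiom):
        c, a = axiom
        for (x, b) in self.sub:            # C ⊑ A, told A ⊑ B  =>  C ⊑ B
            if x == a:
                yield (c, b)
        for (a1, a2, b) in self.conj:      # C ⊑ A1, C ⊑ A2, told A1 ⊓ A2 ⊑ B  =>  C ⊑ B
            if (a1 == a and (c, a2) in ctx.processed) or \
               (a2 == a and (c, a1) in ctx.processed):
                yield (c, b)

    def process(self, ctx):
        while ctx.scheduled:
            axiom = ctx.scheduled.popleft()
            if axiom in ctx.processed:
                continue
            ctx.processed.add(axiom)
            for conclusion in self.rules(ctx, axiom):
                self.schedule(conclusion)

    def worker(self):
        while True:
            with self.cond:
                while not self.active and self.busy > 0:
                    self.cond.wait()       # others may still produce work
                if not self.active:
                    return                 # queue empty and nobody busy: done
                ctx = self.active.popleft()
                self.busy += 1
            self.process(ctx)
            with self.cond:
                self.busy -= 1
                if ctx.scheduled:          # new axioms arrived meanwhile
                    self.active.append(ctx)
                else:
                    ctx.active = False
                self.cond.notify_all()

    def run(self):
        threads = [threading.Thread(target=self.worker) for _ in range(self.n_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return {c: {d for (_, d) in ctx.processed} for c, ctx in self.contexts.items()}

if __name__ == "__main__":
    result = ParallelClassifier(["A", "B", "C", "D", "E"],
                                {("A", "B"), ("A", "C"), ("D", "E")},
                                {("B", "C", "D")}).run()
    print(sorted(result["A"]))
```

A context is enqueued only when its `active` flag flips from false to true, so no two workers ever hold the same context. That is what lets each rule read `ctx.processed` without a lock.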
Parallel TBox Reasoning in ELHR+ • Getting the contexts of axioms • Contexts are only needed for premise axioms in rules, • not for side-condition axioms! • Once a new axiom is derived, • it must be added to the scheduled queue of ALL of its contexts • and later saved into the processed set of ALL of its contexts • Optimisations: reducing the premises in rules; reducing the contexts of axioms
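Getting the contexts of an axiom can be sketched as follows. This is our reading, not the paper's definition. It assumes ELK-style rules in which C ⊑ ∃R.D meets D ⊑ A or D ⊑ ⊥ in context D. When transitive roles are present, it also acts as the second premise of a transitivity step whose common context is C. The encoding and the `has_transitive_roles` switch are hypothetical; the switch is only a crude stand-in for "reducing contexts in axioms".

```python
def contexts(axiom, has_transitive_roles=True):
    """Hypothetical context assignment (our reading, not the paper's definition).

    ("sub", C, D)    encodes C ⊑ D     -> context C
    ("ex", C, R, D)  encodes C ⊑ ∃R.D  -> context D, plus C when transitive
                                          roles make it a second premise
    """
    kind = axiom[0]
    if kind == "sub":
        return {axiom[1]}
    if kind == "ex":
        _, c, _, d = axiom
        # D: joined with D ⊑ A (existential rule) or D ⊑ ⊥ (bottom rule).
        # C: joined as the right-hand premise of a transitivity step
        #    X ⊑ ∃R.C, C ⊑ ∃S.D, whose common context is C.
        return {c, d} if has_transitive_roles else {d}
    raise ValueError(f"unknown axiom kind: {kind!r}")

def dispatch(axiom, scheduled, has_transitive_roles=True):
    """Add a newly derived axiom to the scheduled queue of ALL of its contexts."""
    for ctx in contexts(axiom, has_transitive_roles):
        scheduled.setdefault(ctx, []).append(axiom)

if __name__ == "__main__":
    queues = {}
    dispatch(("ex", "A", "r", "B"), queues)
    print(sorted(queues))  # ['A', 'B']
```

Dropping a context whenever no rule can use the axiom there shrinks the number of queues each derived axiom must visit.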
Extending TBox Reasoning with Bottom • The bottom rule: • It still has a common context for all premise axioms • Lock-free parallelisation is still guaranteed.
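The bottom rule itself is not reproduced in this text. Suppose it is the usual EL⊥ rule: C ⊑ ∃R.D and D ⊑ ⊥ entail C ⊑ ⊥. Then both premises share the context D, which is why no other context has to be read. Here is a sketch that applies the rule when either premise arrives in D; the function name and tuple encoding are hypothetical.

```python
BOTTOM = "BOTTOM"  # stands for ⊥

def bottom_rule(new_axiom, processed_in_d):
    """Apply C ⊑ ∃R.D, D ⊑ ⊥ => C ⊑ ⊥ inside context D.

    Axioms: ("sub", C, D) for C ⊑ D and ("ex", C, R, D) for C ⊑ ∃R.D.
    processed_in_d: the processed set of context D (the only set read).
    """
    derived = set()
    if new_axiom[0] == "sub" and new_axiom[2] == BOTTOM:
        d = new_axiom[1]                       # new premise: D ⊑ ⊥
        for ax in processed_in_d:
            if ax[0] == "ex" and ax[3] == d:   # earlier premise: C ⊑ ∃R.D
                derived.add(("sub", ax[1], BOTTOM))
    elif new_axiom[0] == "ex":
        _, c, _, d = new_axiom                 # new premise: C ⊑ ∃R.D
        if ("sub", d, BOTTOM) in processed_in_d:
            derived.add(("sub", c, BOTTOM))
    return derived

if __name__ == "__main__":
    print(bottom_rule(("sub", "D", BOTTOM), {("ex", "C", "R", "D")}))
```

Whichever premise arrives second triggers the rule, and both lookups stay inside D's processed set.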
Extending to ABox Reasoning • ABox reasoning: • computing the atomic types of all individuals • computing the atomic relations between all individuals • A simple approach reuses the TBox algorithm: • internalising the ABox with nominals • treating singleton nominals as atomic concepts
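Internalising the ABox can be sketched as the standard rewriting: A(a) becomes {a} ⊑ A, and R(a, b) becomes {a} ⊑ ∃R.{b}. After that, each singleton nominal `{a}` is handled as a fresh atomic concept. The tuple encoding and function names are assumptions for illustration.

```python
def nominal(individual):
    """Name of the fresh atomic concept standing for the singleton {a}."""
    return "{" + individual + "}"

def internalise(abox):
    """Rewrite ABox assertions as axioms over singleton nominals.

    ("type", a, A)    A(a)     ->  ("sub", "{a}", A)        i.e. {a} ⊑ A
    ("rel", R, a, b)  R(a, b)  ->  ("ex", "{a}", R, "{b}")  i.e. {a} ⊑ ∃R.{b}
    Returns the axioms and the set of nominal concepts introduced.
    """
    axioms, nominals = [], set()
    for assertion in abox:
        if assertion[0] == "type":
            _, a, concept = assertion
            axioms.append(("sub", nominal(a), concept))
            nominals.add(nominal(a))
        elif assertion[0] == "rel":
            _, role, a, b = assertion
            axioms.append(("ex", nominal(a), role, nominal(b)))
            nominals.update({nominal(a), nominal(b)})
        else:
            raise ValueError(f"unknown assertion: {assertion!r}")
    return axioms, nominals

if __name__ == "__main__":
    print(internalise([("type", "a", "A"), ("rel", "R", "a", "b")]))
```

The resulting axioms can then be fed to the TBox algorithm unchanged, which is exactly what makes this approach simple but redundant.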
Mixing TBox and ABox Reasoning • Introduces redundancies • has to be maintained in A.scheduled and A.processed, waiting for the derivation of . • [Figure: Worker 1 and Worker 2 with scheduled and processed collections]
Separating TBox and ABox Reasoning • C.scheduledC.processed • contains no nominal! • Can always be computed earlier than • Can be used as side conditions in rules. • C does not need to be a context in
Separating TBox and ABox Reasoning • Applicable nominals • NOT applicable nominals • Extending to ABox rules: • when the filler is NOT a nominal • when the filler is a nominal
ABox Rules • First perform TBox reasoning • Only non-nominals are used as contexts • Perform ABox reasoning with ABox rules • Only singleton nominals are used as contexts • Sound & complete (Theorem 1)
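The two stages can be sketched in simplified form, restricted to atomic types with told A ⊑ B and A1 ⊓ A2 ⊑ B. Relations and the nominal-filler rules are left out. Stage 1 classifies the concept names. Stage 2 gives each individual its own context and reads the stage-1 subsumers only as side conditions, so individuals can be processed independently; here that is done with a `ThreadPoolExecutor`. Function names are hypothetical, and every concept name used in the axioms must appear in `concepts`.

```python
from concurrent.futures import ThreadPoolExecutor

def tbox_stage(concepts, sub, conj):
    """Stage 1: classify atomic concepts; contexts are the non-nominal concepts."""
    S = {c: {c} for c in concepts}
    changed = True
    while changed:
        changed = False
        for c in concepts:
            new = {b for (a, b) in sub if a in S[c]}
            new |= {b for (a1, a2, b) in conj if a1 in S[c] and a2 in S[c]}
            if not new <= S[c]:
                S[c] |= new
                changed = True
    return S

def individual_types(asserted, S, conj):
    """Stage 2 for one individual {a}: only its own types are read or written."""
    types = set()
    for A in asserted:
        types |= S[A]                 # TBox results used as side conditions
    while True:
        new = {b for (a1, a2, b) in conj if a1 in types and a2 in types} - types
        if not new:
            return types
        for b in new:
            types |= S[b]             # S[b] already contains b

def abox_stage(asserted_types, S, conj, n_workers=4):
    names = list(asserted_types)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = pool.map(lambda a: individual_types(asserted_types[a], S, conj), names)
        return dict(zip(names, results))

if __name__ == "__main__":
    S = tbox_stage(["A", "B", "C", "D"], {("A", "B")}, {("B", "C", "D")})
    print(abox_stage({"a": {"A", "C"}, "b": {"A"}}, S, {("B", "C", "D")}))
```

Here individual a gets D only in stage 2, because B and C meet in its own context. No TBox context has to be revisited for that.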
Separating Relations and Types • {a}.scheduled{a}.processed • won’t affect relations! • Can always be computed later than • can be used as side conditions in rules. • {a} does not need to be a context in in type stage.
Separating Relations and Types • Relation computations are perfectly parallelised: • R(a,b) ⇒ S(a,b) with RIs as side conditions; • R(a,b) ⇒ R(c,b) with R(c,a) in the ABox and trans(R) as side conditions; • R(a,b) ⇒ R(c,b) with a=c in the ABox as side condition; • b=a ⇒ R(c,b) with R(c,a) in the ABox as side condition; • Relation computations can be performed in parallel with TBox classification
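The four relation rules can be sketched as a naive fixpoint. This sequential version makes several simplifying assumptions. It matches side conditions against everything derived so far, not only asserted ABox facts. It treats the given equalities as symmetric. It makes no attempt at parallel scheduling. Names and encodings are hypothetical.

```python
def saturate_relations(rels, role_incl, transitive, equal):
    """Naive fixpoint over the four relation rules.

    rels:       {(R, a, b)}  asserted R(a, b)
    role_incl:  {(R, S)}     role inclusion R ⊑ S
    transitive: {R}          roles with trans(R)
    equal:      {(a, c)}     a = c (treated as symmetric)
    """
    eq = set(equal) | {(c, a) for (a, c) in equal}
    closed = set(rels)
    while True:
        new = set()
        for (r, a, b) in closed:
            # R(a,b) => S(a,b) with R ⊑ S as side condition
            new |= {(s, a, b) for (r2, s) in role_incl if r2 == r}
            # R(a,b) => R(c,b) with R(c,a) and trans(R) as side conditions
            if r in transitive:
                new |= {(r, c, b) for (r2, c, a2) in closed if r2 == r and a2 == a}
            # R(a,b) => R(c,b) with a = c as side condition
            new |= {(r, c, b) for (a2, c) in eq if a2 == a}
            # the slide's "b = a => R(c,b) with R(c,a)", in this loop's
            # variables: R(a,b), b = c => R(a,c)
            new |= {(r, a, c) for (b2, c) in eq if b2 == b}
        if new <= closed:
            return closed
        closed |= new

if __name__ == "__main__":
    result = saturate_relations(
        rels={("partOf", "a", "b"), ("partOf", "b", "c"), ("directPartOf", "c", "d")},
        role_incl={("directPartOf", "partOf")},
        transitive={"partOf"},
        equal={("d", "e")},
    )
    print(len(result))  # 11
```

Each rule has a single relation premise and reads everything else as side conditions, which is the property the slide relies on for parallelisation.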
Evaluation • Benchmark • VICODI ontology • NotGalenTBox + synthetic ABox generated by SyGENiA • Environment • AWS EC2 cloud computing, 64-bit Linux, 7 GB RAM, each worker ≈ 2.5-3.0 GHz • [Figure: comparison of off-the-shelf reasoners and PEL]
Evaluation • Scalability Evaluation • NotGalenTBox + synthetic ABox generated by SyGENiA • AWS EC2 cloud computing, 64-bit Linux, 70 GB RAM, each worker ≈ 3.5-4.2 GHz
Summary • Parallel ABox reasoning can handle 1 million individuals and 9 million ABox assertions • Optimising the order of different inferences can reduce redundancies. • Parallel ABox reasoning can still be improved: • it does not scale linearly; • frequent RAM I/O against limited memory bandwidth may be a cause. • Distributed reasoning as future work • The language is still not expressive enough: • role chains and nominals in the TBox are hard to parallelise • Extension to other languages as future work • Full materialisation is too memory-consuming • Target-oriented QA algorithm and optimisation as future work
Thank You! • Q & A