 
                
Optimizing Similarity Computations for Ontology Matching - Experiences from GOMMA
Michael Hartung, Lars Kolb, Anika Groß, Erhard Rahm
Database Research Group, University of Leipzig
9th Intl. Conf. on Data Integration in the Life Sciences (DILS), Montreal, July 2013
Ontologies
• Multiple interrelated ontologies in a domain
  • Example: anatomy
• Identify overlapping information between ontologies
  • Information exchange, data integration purposes, reuse
  • …
→ Need to create mappings between ontologies
[Figure: example ontologies: Mouse Anatomy, SNOMED, NCI Thesaurus, UMLS, MeSH, GALEN, FMA]
Matching Example
• Two ‘small’ anatomy ontologies O and O’
  • Concepts with attributes (name, synonym)
• Possible match strategy in GOMMA*
  • Compare name/synonym values of concepts by a string similarity function, e.g., n-gram or edit distance
  • Two concepts match if one value pair is highly similar
[Figure: 5 x 4 = 20 similarity computations; resulting mapping M_{O,O’} = {(c0,c0’),(c1,c1’),(c2,c2’)}]
* Kirsten, Groß, Hartung, Rahm: GOMMA: A Component-based Infrastructure for managing and analyzing Life Science Ontologies and their Evolution. Journal of Biomedical Semantics, 2011
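Sketch of the match strategy above, in Python. This is a minimal illustration, not GOMMA's implementation: the concept names and attribute values are hypothetical (only the 5 and 4 value counts follow the slide), and `difflib.SequenceMatcher` is a stand-in for the n-gram or edit-distance function.

```python
from difflib import SequenceMatcher


def string_sim(a, b):
    # Stand-in similarity in [0, 1]; the slides use n-gram or edit distance.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match(onto_a, onto_b, threshold=0.7, sim=string_sim):
    """Compare every value pair; two concepts match if their best pair is similar enough."""
    mapping, computations = [], 0
    for id_a, values_a in onto_a.items():
        for id_b, values_b in onto_b.items():
            best = 0.0
            for va in values_a:
                for vb in values_b:
                    computations += 1
                    best = max(best, sim(va, vb))
            if best >= threshold:
                mapping.append((id_a, id_b, best))
    return mapping, computations


# Hypothetical ontologies with 5 and 4 name/synonym values.
O = {"c0": ["head"], "c1": ["trunk", "torso"], "c2": ["limb", "extremity"]}
O2 = {"c0'": ["head"], "c1'": ["truncus"], "c2'": ["limbs", "extremities"]}

mapping, computations = match(O, O2)
print(computations)  # 20 value-pair comparisons (5 x 4)
```

The count grows with the product of the numbers of attribute values, which is why the cost of a single similarity call matters.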
Problems
• Evaluation of Cartesian product O x O’
  • Especially for large life science ontologies
  • Different strategies: pruning, blocking, mapping reuse, …
• Excessive usage of similarity functions
  • Applied O(|O|·|O’|) times during matching
  • How efficiently (runtime, space) does a similarity function work?
• Experiences from GOMMA
  • Optimized implementation of n-gram similarity function
  • Application on massively parallel hardware
    • Graphical Processing Units (GPU)
    • Multi-core CPUs
[Figure label: getSim(str1,str2)]
Trigram (n=3) Similarity with Dice
• Trigram similarity
  • Input: two strings A and B to be compared
  • Output: similarity sim(A,B) ∈ [0,1] between A and B
• Approach
  • Split A and B into tokens of length 3
  • Compute intersect (overlap) between both token sets
  • Calculate Dice metric based on the size of the intersect and of the token sets
  • (Optional) Assign pre-/postfixes of length 2 (e.g., ##, $$) to A and B before tokenization
Trigram Similarity - Example
sim(‘TRUNK’, ‘TRUNCUS’)
• Token sets
  • {TRU, RUN, UNK}
  • {TRU, RUN, UNC, NCU, CUS}
• Intersect
  • {TRU, RUN}
• Dice metric
  • 2·2 / (3+5) = 4/8 = 0.5
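The trigram/Dice computation can be sketched in a few lines of Python. The function names and the `pad` flag are illustrative assumptions, not GOMMA's API; the example reproduces the TRUNK/TRUNCUS numbers from the slide.

```python
def trigrams(s, n=3, pad=False):
    """Return the set of n-grams of s; optionally add ## / $$ style padding first."""
    if pad:
        s = "#" * (n - 1) + s + "$" * (n - 1)
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def dice_trigram_sim(a, b, pad=False):
    """Dice metric: 2 * |intersect| / (|tokens(a)| + |tokens(b)|)."""
    ta, tb = trigrams(a, pad=pad), trigrams(b, pad=pad)
    if not ta and not tb:
        return 0.0  # both strings shorter than n: treated as dissimilar here
    return 2 * len(ta & tb) / (len(ta) + len(tb))


print(dice_trigram_sim("TRUNK", "TRUNCUS"))  # 0.5
```

Using Python sets hides the cost of the intersect step; the following slides are about how that step is implemented.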
Naïve Solution
• Method
  • Tokenization (trivial)
    • Result: two token arrays aTokens and bTokens
  • Intersect computation with Nested Loop (main part)
    • For each token in aTokens, look for the same token in bTokens; if found: increase overlap counter (and go on with next token in aTokens)
  • Final similarity (trivial)
    • 2·overlap / (|aTokens|+|bTokens|)
• Example
  • aTokens: {TRU, RUN, UNK}
  • bTokens: {TRU, RUN, UNC, NCU, CUS}
  • overlap: 0 → 1 → 2
  • #string-comparisons: 8
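A minimal Python sketch of the nested-loop intersect, with a counter added so the number of string comparisons can be checked against the example (the counter and function name are illustrative, not part of the original implementation):

```python
def naive_overlap(a_tokens, b_tokens):
    """Nested-loop intersect; returns (overlap, number of string comparisons)."""
    overlap = 0
    comparisons = 0
    for a in a_tokens:
        for b in b_tokens:
            comparisons += 1
            if a == b:
                overlap += 1
                break  # go on with the next token in a_tokens
    return overlap, comparisons


a_tokens = ["TRU", "RUN", "UNK"]
b_tokens = ["TRU", "RUN", "UNC", "NCU", "CUS"]
overlap, comparisons = naive_overlap(a_tokens, b_tokens)
sim = 2 * overlap / (len(a_tokens) + len(b_tokens))
print(overlap, comparisons, sim)  # 2 8 0.5
```

TRU is found after 1 comparison, RUN after 2, and UNK is compared against all 5 tokens without a hit, giving 8 in total.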
“Sort-Merge”-like Solution
• Optimization ideas
  • Avoid string comparisons
    • String comparisons are expensive, especially for equal-length strings (e.g., “equals” of String class in Java)
    • Dictionary-based transformation of tokens into unique integer values
    • Example: ‘UNC’ = ‘UNK’? → 3 = 8?
  • Avoid nested loop complexity
    • O(m·n) comparisons to determine intersect of token sets
    • A-priori sorting of token arrays → make use of ordered tokens during comparison (O(m+n), see Sort-Merge join)
    • Amortization of sorting → token sets are used multiple times for comparison
“Sort-Merge”-like Solution - Example
sim(TRUNK, TRUNCUS)
• Tokenization → integer conversion → sorting
  • TRUNK → {TRU, RUN, UNK} → {1, 2, 3}
  • TRUNCUS → {TRU, RUN, UNC, NCU, CUS} → {1, 2, 4, 5, 6}
• Intersect with interleaved linear scans
  • aTokens: {1, 2, 3}
  • bTokens: {1, 2, 4, 5, 6}
  • overlap: 0 → 1 → 2
  • #integer-comparisons: 3
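The preprocessing and merge-style intersect might look as follows in Python. This is a sketch under assumptions: the shared `dictionary` that hands out integer ids in order of first appearance is one simple way to do the integer conversion, and each three-way comparison of two ids is counted as one integer comparison.

```python
def encode(text, dictionary, n=3):
    """Tokenize, map each token to a unique integer id, and sort the ids."""
    tokens = [text[i:i + n] for i in range(len(text) - n + 1)]
    ids = {dictionary.setdefault(t, len(dictionary) + 1) for t in tokens}
    return sorted(ids)


def merge_overlap(a_ids, b_ids):
    """Interleaved linear scan over two sorted id arrays (as in a sort-merge join)."""
    i = j = overlap = comparisons = 0
    while i < len(a_ids) and j < len(b_ids):
        comparisons += 1
        if a_ids[i] == b_ids[j]:
            overlap += 1
            i += 1
            j += 1
        elif a_ids[i] < b_ids[j]:
            i += 1
        else:
            j += 1
    return overlap, comparisons


dictionary = {}
a_ids = encode("TRUNK", dictionary)    # [1, 2, 3]
b_ids = encode("TRUNCUS", dictionary)  # [1, 2, 4, 5, 6]
overlap, comparisons = merge_overlap(a_ids, b_ids)
print(overlap, comparisons)  # 2 3
```

Encoding and sorting happen once per attribute value, while `merge_overlap` runs for every value pair, which is where the amortization comes from.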
GPU as Execution Environment
• Design goals
  • Scalability with 100’s of cores and 1000’s of threads
  • Focus on parallel algorithms
  • Example: CUDA programming model
• CUDA Kernels and Threads
  • Kernel: function that runs on a device (GPU, CPU)
  • Many CUDA threads can execute each kernel
  • CUDA vs. CPU threads
    • CUDA threads are extremely lightweight (little creation overhead, instant switching, 1000’s of threads used to achieve efficiency)
    • Multi-core CPUs can only consume a few threads
• Drawbacks
  • A-priori memory allocation, basic data structures
Bringing n-gram to GPU
• Problems to be solved
  • Which data structures are possible for input / output?
  • How to cope with fixed / limited memory?
  • How can n-gram be parallelized on GPU?
Input/Output Data Structure
• Three-layered index structure for each ontology
  • ci: concept index representing all concepts
  • ai: attribute index representing all attributes
  • gi: gram (token) index containing all tokens
• Two arrays to represent top-k matches per concept
  • A-priori memory allocation possible (array size of k·|O|)
  • Short (2 bytes) instead of float (4 bytes) data type for similarities → reduced memory consumption
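One way to picture the three-layered index is as flat offset arrays, sketched below in Python. The offset-based layout, the function names, and the assignment of token lists to concepts are assumptions made for illustration; the token lists are chosen so that the attribute offsets come out as 0 2 5 8 10 17, the values visible in the example figure.

```python
from array import array


def build_index(concepts):
    """Flatten concepts -> attributes -> sorted token ids into arrays ci, ai, gi."""
    ci, ai, gi = [0], [0], []
    for attributes in concepts:
        for token_ids in attributes:
            gi.extend(sorted(token_ids))
            ai.append(len(gi))       # end offset of this attribute in gi
        ci.append(len(ai) - 1)       # end offset of this concept in ai
    return array("i", ci), array("i", ai), array("i", gi)


def tokens_of(index, concept, attribute):
    """Token ids of the given attribute of the given concept."""
    ci, ai, gi = index
    a = ci[concept] + attribute
    return list(gi[ai[a]:ai[a + 1]])


# Assumed grouping: c0 has one attribute value, c1 and c2 have two each.
O = [
    [[1, 2]],
    [[3, 4, 5], [6, 7, 8]],
    [[9, 10], list(range(11, 18))],
]
index = build_index(O)

# Output arrays allocated up front: k slots per concept.
k = 2
sims = array("h", [0]) * (k * len(O))    # similarities as 2-byte shorts
corrs = array("i", [-1]) * (k * len(O))  # matched concept ids, -1 = empty slot
```

Because every array has a size known before matching starts, the whole structure can be copied into pre-allocated device memory in one piece.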
Input/Output Data Structure - Example
[Figure: input index arrays ci, ai and gi for ontology O (concepts c0, c1, c2) and ontology O’ (concepts c0’, c1’, c2’). Attribute values are shown as integer token lists: [1,2], [1,2], [6,7,8], [3,4,5], [6,7,8], [3,4,18,19,20], [9,10], [9,10,21], [11,12,13,14,15,16,17]; one ai array holds the offsets 0 2 5 8 10 17. Output: arrays sims and corrs for top-k=2 ∧ sim>0.7, giving M_{O,O’}: c0-c0’, c1-c1’, c2-c2’]
Limited Memory / Parallelization
• Ontologies and mapping too large for GPU
  • Size-based ontology partitioning*
  • Ideal case: one ontology fits completely in GPU memory
• Each kernel computes n-gram similarities between one concept of Pi and all concepts in Qj
• Re-use of already stored partition in GPU
• Hybrid execution on GPU/CPU possible
[Figure: GPU usage and CPU usage for a match task. Partitions P0 and Q3 (each with ci, ai, gi) reside in GPU global memory; GPU threads run Kernel0 … Kernel|P0|-1 and fill sims and corrs for M_{P0,Q3}; labels on the CPU thread(s) side: "Replace Q3 with Q4", "Read M_{P0,Q4}"]
* Groß, Hartung, Kirsten, Rahm: On Matching Large Life Science Ontologies in Parallel. Proc. DILS, 2010
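A rough Python sketch of the partitioning idea follows. The greedy size-based split, the size unit (number of token ids per concept), and the task ordering are assumptions for illustration; the slides do not spell out GOMMA's actual partitioning or scheduling.

```python
def partition_by_size(concept_sizes, max_size):
    """Greedily group (concept_id, size) pairs into partitions of at most max_size."""
    partitions, current, used = [], [], 0
    for concept_id, size in concept_sizes:
        if current and used + size > max_size:
            partitions.append(current)
            current, used = [], 0
        current.append(concept_id)
        used += size
    if current:
        partitions.append(current)
    return partitions


def match_tasks(p_parts, q_parts):
    """Yield (i, j) tasks with P as the outer loop, so each Pi stays loaded while Qj changes."""
    for i in range(len(p_parts)):
        for j in range(len(q_parts)):
            yield i, j


def count_loads(tasks):
    """Count how often a partition would have to be copied to the device."""
    loads, loaded_p, loaded_q = 0, None, None
    for i, j in tasks:
        if i != loaded_p:
            loads, loaded_p = loads + 1, i
        if j != loaded_q:
            loads, loaded_q = loads + 1, j
    return loads


p_parts = partition_by_size([("a", 3), ("b", 4), ("c", 2), ("d", 5)], max_size=7)
q_parts = partition_by_size([("x", 6), ("y", 6), ("z", 6)], max_size=7)
tasks = list(match_tasks(p_parts, q_parts))
print(p_parts, len(tasks), count_loads(tasks))
```

Each task (i, j) corresponds to matching Pi against Qj; within a task, one kernel invocation per concept of Pi would compare that concept with all concepts of Qj.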
Evaluation
• FMA-NCI match problem from OAEI 2012 Large Bio Task
  • Three sub tasks (small, large, whole)
  • With blocking step to be comparable with OAEI 2012 results*
• Hardware
  • CPU: Intel i5-2500 (4 x 3.30 GHz, 8 GB)
  • GPU: Asus GTX660 (960 CUDA cores, 2 GB)
* Groß, Hartung, Kirsten, Rahm: GOMMA Results for OAEI 2012. Proc. 7th Intl. Ontology Matching Workshop, 2012
Results for one CPU or GPU
• Different implementations for Trigram
  • Naïve nested loop, hash set lookup, sort-merge
• Sort-merge solution performs significantly better
• GPU further reduces execution times (~20% of CPU time)
Results for hybrid CPU/GPU usage
• NoGPU vs. GPU
  • Increasing number of CPU threads
• Good scalability for multiple CPU threads (speedup of 3.6)
• Slightly better execution time with hybrid CPU/GPU
  • One thread required to communicate with GPU
Summary and Future Work
• Experiences from optimizing GOMMA’s ontology matching workflows
  • Tuning of n-gram similarity function
    • Preprocessing (integer conversion, sorting) for more efficient similarity computation
  • Execution on GPU
    • Overcoming GPU drawbacks (fixed memory, a-priori allocation)
  • Significant reduction of execution times
    • 104 min → 99 sec for FMA-NCI (whole) match task
• Future Work
  • Execution of other similarity functions on GPU
  • Flexible ontology matching / entity resolution workflows
    • Choice between CPU, GPU, cluster, cloud infrastructure, …
Thank You for Your Attention! Questions?