Using Sets of Feature Vectors for Similarity Search on Voxelized CAD Objects

Hans-Peter Kriegel, Stefan Brecheisen, Peer Kröger, Martin Pfeifle, Matthias Schubert
Database Group, Institute for Computer Science, University of Munich, Germany
ACM SIGMOD 2003, San Diego, California, June 9-12, 2003

Outline of the Talk
• Introduction
• Space Partitioning Models
• Data Partitioning Models (new: Vector Set Model)
• Evaluation
• Conclusion

Introduction
[Figure: similarity queries on a CAD database of complex spatial objects can end in a timeout or return unapt results; a similarity model based on sets of feature vectors returns meaningful results in comparatively short time.]
System Requirements:
• The system should help to reduce the cost of developing new parts
• Avoid "reinventing the wheel"
• Reuse existing parts
Solution:
• Effective similarity search
• Efficient similarity search

Space Partitioning Models: Feature Transformation
[Figure: CAD system → triangle meshes → normalized, voxelized object → feature vector (0.75, 0.34, ...)]
• A 3D CAD object is represented by a mesh of triangles
• The triangle meshes are voxelized and the objects are normalized
• The data space is partitioned into disjoint, enumerated cells
• k spatial features are extracted for each cell
• Similarity of objects = vicinity of the corresponding feature vectors

Space Partitioning Models: Notation
[2D example: a CAD object and the voxel set $V^o$ representing it, with r = 9 and p = 3]
• The data space is partitioned into p axis-parallel grid cells in each dimension
• r = the raster (voxel) resolution
• $V^o$ = set of voxels representing object $o \in O$
• $V_i^o$ = set of voxels covered by o in cell i
• $f_o(i)$ = i-th value of the feature vector of o

Space Partitioning Models: The Volume Model
[2D example: voxel counts per cell (6, 9, 3, 6, 6, 0, 4, 6, 6), each scaled by 1/9]
• Count the number of object voxels $|V_i^o|$ in each cell i
• Normalize by the voxel capacity K of each cell
• Feature value for cell i: $f_o(i) = \frac{|V_i^o|}{K}$, where $K = \left(\frac{r}{p}\right)^3$ in the 3D case
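The volume feature above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `volume_features`, the representation of an object as a set of integer voxel coordinates, the lexicographic cell enumeration, and the requirement that p divides r are not taken from the slides.

```python
from itertools import product

def volume_features(voxels, r, p):
    """Volume-model feature vector for a set of occupied voxels.

    voxels: iterable of (x, y, z) integer coordinates in [0, r)
    r: raster resolution per dimension
    p: number of grid cells per dimension (assumed to divide r)
    """
    if r % p != 0:
        raise ValueError("this sketch assumes that p divides r")
    side = r // p            # voxels per cell edge
    capacity = side ** 3     # K = (r / p)^3 in the 3D case
    counts = {cell: 0 for cell in product(range(p), repeat=3)}
    for x, y, z in set(voxels):
        counts[(x // side, y // side, z // side)] += 1
    # Enumerate the cells in a fixed (lexicographic) order.
    return [counts[cell] / capacity for cell in sorted(counts)]

# One completely filled cell plus a single voxel in the opposite corner cell.
block = [(x, y, z) for x in range(2) for y in range(2) for z in range(2)]
features = volume_features(block + [(3, 3, 3)], r=4, p=2)
```

With r = 4 and p = 2 each cell holds 8 voxels, so the filled cell contributes 1.0 and the single voxel contributes 1/8.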

Space Partitioning Models: The Solid Angle Model
[2D example: reference spheres $S_x$ and $S_y$ around surface voxels x and y; example cell values 0, 1, 0.34, 0.30, 0.31, 0.32]
• The solid angle model measures the concavity and convexity of surfaces
• Compute the SA-value SA(v) for each surface voxel v of object o: $SA(v) = \frac{|S_v \cap V^o|}{|S_v|}$, where $S_v$ is a voxelized reference sphere around v
• Each cell is represented by one dimension in the feature vector
• $f_o(i) = \frac{1}{m} \sum_{j=1}^{m} SA(v_j)$, the mean SA-value over the m surface voxels $v_1, \ldots, v_m$ of o in cell i
• $f_o(i) = 0$ if cell i contains no voxel of o
• $f_o(i) = 1$ if cell i contains only inside voxels of o
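A minimal sketch of the solid angle computation follows. Several details are assumptions made for illustration and are not given on the slide: the reference sphere is built from integer offsets within a given radius, a surface voxel is taken to be an object voxel with at least one empty 6-neighbor, and all function names are hypothetical.

```python
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def sphere_offsets(radius):
    """Integer offsets of a voxelized sphere (assumed shape of S_v)."""
    rng = range(-radius, radius + 1)
    return [(dx, dy, dz) for dx in rng for dy in rng for dz in rng
            if dx * dx + dy * dy + dz * dz <= radius * radius]

def solid_angle(v, voxels, offsets):
    """SA(v): fraction of the reference sphere around v that lies inside the object."""
    x, y, z = v
    inside = sum((x + dx, y + dy, z + dz) in voxels for dx, dy, dz in offsets)
    return inside / len(offsets)

def is_surface(v, voxels):
    """Assumed surface test: at least one of the six face neighbors is empty."""
    x, y, z = v
    return any((x + dx, y + dy, z + dz) not in voxels for dx, dy, dz in NEIGHBORS)

def cell_feature(cell_voxels, voxels, offsets):
    """f_o(i): 0 for an empty cell, 1 for inside voxels only, else the mean SA-value."""
    if not cell_voxels:
        return 0.0
    surface = [v for v in cell_voxels if is_surface(v, voxels)]
    if not surface:
        return 1.0
    return sum(solid_angle(v, voxels, offsets) for v in surface) / len(surface)

# A solid 7 x 7 x 7 block and the smallest possible reference sphere.
block = {(x, y, z) for x in range(7) for y in range(7) for z in range(7)}
offsets = sphere_offsets(1)
corner = solid_angle((0, 0, 0), block, offsets)
face = solid_angle((3, 3, 0), block, offsets)
interior = solid_angle((3, 3, 3), block, offsets)
```

A convex corner sees less of the object inside its sphere than a flat face, and an interior voxel sees only object voxels, so the values order as corner < face < interior.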

Data Partitioning Models: Cover Sequence Model
• Approximation of the object by means of a cover sequence (Jagadish 91)
• Cover sequence: $S_k = (((C_0 \,\sigma_1\, C_1) \,\sigma_2\, C_2) \ldots \sigma_k\, C_k)$, where $\sigma_i \in \{+, -\}$, k is the number of covers, and the $C_i$ are axis-parallel (hyper-)rectangles
• Approximation quality: symmetric volume difference $Err_k = |o \text{ XOR } S_k|$
• Computation of $S_k$ by means of a greedy algorithm
• The object is represented by a 6·k-dimensional feature vector (3D case)
[2D example]
Cover sequence and error:
• $S_1 = (C_0 + C_1)$, $Err_1 = 14$
• $S_2 = ((C_0 + C_1) + C_2)$, $Err_2 = 10$
• $S_3 = (((C_0 + C_1) + C_2) - C_3)$, $Err_3 = 7$
2D feature vector $f_o$ (values shown in the figure: 1 1 6 7, 7 1 2 3, 6 5 1 3):
• $f_o(4i+1)$ = x-position of $C_i$
• $f_o(4i+2)$ = y-position of $C_i$
• $f_o(4i+3)$ = x-extension of $C_i$
• $f_o(4i+4)$ = y-extension of $C_i$
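The error measure and the feature layout can be illustrated with a small 2D sketch. It evaluates a given cover sequence on a grid; it does not implement the greedy search for the covers. The function names, the tuple layout `(sign, x, y, w, h)`, and the L-shaped example object are assumptions for illustration only.

```python
def rect_cells(x, y, w, h):
    """Grid cells covered by an axis-parallel rectangle at (x, y) with extension (w, h)."""
    return {(i, j) for i in range(x, x + w) for j in range(y, y + h)}

def cover_errors(obj, covers):
    """Return Err_k = |o XOR S_k| after each cover of the sequence is applied.

    obj: set of occupied (x, y) cells
    covers: list of (sign, x, y, w, h) with sign in {'+', '-'}
    """
    approx = set()
    errors = []
    for sign, x, y, w, h in covers:
        cells = rect_cells(x, y, w, h)
        approx = approx | cells if sign == '+' else approx - cells
        errors.append(len(obj ^ approx))   # symmetric volume difference
    return errors

def feature_vector(covers):
    """Concatenate position and extension of each cover (4 values per cover in 2D)."""
    return [value for _, x, y, w, h in covers for value in (x, y, w, h)]

# Hypothetical L-shaped object: a 4 x 2 bar with a 2 x 2 block on top of its left half.
l_shape = rect_cells(0, 0, 4, 2) | rect_cells(0, 2, 2, 2)
covers = [('+', 0, 0, 4, 4), ('-', 2, 2, 2, 2)]
errors = cover_errors(l_shape, covers)
vector = feature_vector(covers)
```

Here the bounding square alone leaves an error of 4 cells, and subtracting the empty quadrant brings the error to 0.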

Data Partitioning Models: Vector Set Model
• $S_4^{query}$ (original) = ((((C_0 + C_1) - C_2) - C_3) - C_4)
• $S_4^{query}$ (optimal) = ((((C_0 + C_1) - C_3) - C_4) - C_2)
[Figure: query object and database object, each described by four covers with the components (px, py, ex, ey). The query covers are compared with the database covers $S_4^{database}$ once in the original order (q1, q2, q3, q4) and once in the optimal order (q1, q3, q4, q2).]
• $d_{euclid}$(query in original order, database) >> $d_{euclid}$(query in optimal order, database)
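The order dependence of a single concatenated feature vector can be reproduced with a toy example. The cover values below are hypothetical and the helper name `concat_distance` is not from the slides; the sketch only shows that permuting the same covers changes the Euclidean distance.

```python
import math

def concat_distance(covers_a, covers_b):
    """Euclidean distance between the concatenated cover feature vectors."""
    flat_a = [v for cover in covers_a for v in cover]
    flat_b = [v for cover in covers_b for v in cover]
    return math.dist(flat_a, flat_b)

# Hypothetical 2D covers as (x-position, y-position, x-extension, y-extension).
database = [(0, 0, 8, 8), (1, 1, 2, 2), (5, 5, 2, 2)]
# The query consists of the same covers, but the last two appear in swapped order.
query_original = [(0, 0, 8, 8), (5, 5, 2, 2), (1, 1, 2, 2)]
query_reordered = [query_original[0], query_original[2], query_original[1]]

d_original = concat_distance(query_original, database)
d_reordered = concat_distance(query_reordered, database)
```

Although both objects are built from identical covers, the distance is 8.0 in the original order and 0.0 after reordering, which is the effect the vector set model is meant to avoid.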

Data Partitioning Models: Vector Set Model
[2D example: the query covers q1..q4 and the database covers db1..db4, each with the components position X, position Y, extension X, extension Y, form the two node sets of a bipartite graph]
• The cover sequence $S_k = (((C_0 \,\sigma_1\, C_1) \,\sigma_2\, C_2) \ldots \sigma_k\, C_k)$ is represented by a set of vectors $X \subset \mathbb{R}^6$, $|X| \le k$ (in the 3D case)
• Distance measure between two vector sets X and Y: the minimum weight perfect matching
  • Create a complete bipartite graph $G = (X \cup Y, X \times Y)$
  • The weight of each edge $(x, y) \in X \times Y$ is $d_{euclid}(x, y)$
  • A weight function for unmatched nodes is needed if $|X| \ne |Y|$: weight for unmatched nodes = distance to a dummy cover
  • Computed by the Kuhn-Munkres algorithm in $O(k^3)$
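The matching distance can be sketched as follows. Note what is swapped in: instead of the Kuhn-Munkres algorithm, this sketch tries every assignment, which costs O(k!) and is only reasonable for a handful of covers. The choice of the zero vector as dummy cover and the function name `matching_distance` are assumptions for illustration.

```python
import math
from itertools import permutations

def matching_distance(X, Y, dummy=None):
    """Minimum weight perfect matching distance between two vector sets.

    Brute force over all assignments (illustration only, not Kuhn-Munkres).
    The smaller set is padded with `dummy` (assumed here: the zero vector),
    so unmatched vectors are charged their distance to the dummy cover.
    """
    X, Y = list(X), list(Y)
    if not X and not Y:
        return 0.0
    dim = len((X or Y)[0])
    dummy = tuple(dummy) if dummy is not None else (0.0,) * dim
    k = max(len(X), len(Y))
    X += [dummy] * (k - len(X))
    Y += [dummy] * (k - len(Y))
    return min(
        sum(math.dist(x, Y[j]) for x, j in zip(X, perm))
        for perm in permutations(range(k))
    )

# Same covers in a different order: the set distance ignores the order.
a = [(0, 0, 8, 8), (1, 1, 2, 2), (5, 5, 2, 2)]
b = [(0, 0, 8, 8), (5, 5, 2, 2), (1, 1, 2, 2)]
d_same = matching_distance(a, b)

# Sets of different size: the unmatched vector is matched to the dummy cover.
d_unequal = matching_distance([(3, 4)], [(3, 4), (0, 5)])
```

In the second call the best matching pairs (3, 4) with itself and charges (0, 5) its distance of 5 to the dummy cover.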

Data Partitioning Models: Vector Set Model
[Figure: the centroids of the query and database vector sets (position X, position Y, extension X, extension Y) give the feature distance of the centroids used in the filter step; the minimum weight perfect matching distance is used in the refinement step. Pipeline: filter step (index-based) → candidates → refinement step (exact evaluation) → results]
• Efficient similarity queries based on multi-step query processing
  • Range queries (Faloutsos et al. 94)
  • k-nearest neighbor queries (Korn et al. 96)
  • Optimal multi-step k-nearest neighbor search (Seidl, Kriegel 98)
• The lower-bounding property guarantees no false drops
  • $\forall o_1, o_2 \in O: d_o(o_1, o_2) \ge d_f(o_1, o_2)$
  • k (= cardinality of the two vector sets) times the distance between the centroids of the two vector sets lower-bounds the minimum weight perfect matching distance
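A multi-step range query with the centroid filter might look like the sketch below. It assumes that every vector set holds exactly k vectors (smaller sets would first be padded with the dummy cover), it scans the filter distances linearly instead of using an index, and it evaluates the exact distance by brute force. All names and the example data are hypothetical.

```python
import math
from itertools import permutations

def matching_distance(X, Y):
    """Exact distance (brute-force sketch); both sets are assumed to hold k vectors."""
    return min(sum(math.dist(x, Y[j]) for x, j in zip(X, perm))
               for perm in permutations(range(len(Y))))

def centroid(X):
    """Component-wise mean of a vector set."""
    return tuple(sum(column) / len(X) for column in zip(*X))

def filter_distance(X, Y):
    """k times the centroid distance: a lower bound of the matching distance."""
    return len(X) * math.dist(centroid(X), centroid(Y))

def range_query(query, database, eps):
    """Multi-step range query: cheap filter first, exact refinement on candidates only."""
    candidates = [name for name, X in database.items()
                  if filter_distance(query, X) <= eps]
    results = [name for name in candidates
               if matching_distance(query, database[name]) <= eps]
    return candidates, results

query = [(0, 0), (4, 0)]
database = {
    "a": [(4, 0), (0, 0)],      # same vectors in a different order
    "b": [(0, 1), (4, -1)],     # same centroid, but exact distance 2
    "c": [(10, 10), (14, 10)],  # far away, removed by the filter
}
candidates, results = range_query(query, database, eps=1.5)
```

Object "c" never reaches the refinement step, "b" passes the filter but is rejected by the exact distance, and no object within eps can be lost because the filter distance never exceeds the exact distance.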

Evaluation: k-nn Queries
[Figure: two example k-nn query results, captioned '"bad" similarity model?' and '"good" similarity model?', with the distances of the retrieved objects.
Volume model: 0.0, 0.0098, 0.307, 0.416, 0.46 and 0.0, 0.0, 0.0178, 0.0176, 0.022
Solid angle model: 0.0, 0.0, 0.368, 0.368, 0.666 and 0.0, 0.04, 0.04, 0.07, 0.12]
• Evaluation of similarity models by means of k-nn queries
  • Report the k objects having the smallest distance to a query object q
• Problem:
  • Evaluation using k-nn queries is subjective
  • The quality measure of a model depends on the choice of the query objects

Evaluation A1 A A2 A B 1 A1 A2 B 2 B B Hierarchical Clustering Data Space Reachability Plot • Hierarchical Clustering: • More objective since each object of the database is taken into account to measure the quality of a similarity model • OPTICS (Kriegel et al. 99) • Yields a density-based hierarchical clustering • Insensitive to input parameters • Result (so called reachability plot) can be easily visualized • and is suitable for interactive exploration

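Evaluation: Space Partitioning Similarity Models
Car dataset: approx. 200 parts, r = 30, p = 3
• Volume model: no classes found
• Solid angle model: Class A, Class B, Class C
[Figure: reachability plots of both models; the clusters A, B, and C are marked in the solid angle plot]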

Evaluation: Data Partitioning Similarity Models
Car dataset: approx. 200 parts, r = 15, 7 covers
[Figure: reachability plots for the cover sequence model and the vector set model; the labeled classes include A (A1, A2), B, C, D, E, F, G (G1, G2), and X]

Evaluation: Efficiency of the Vector Set Model
• Efficiency evaluation:
  • 100 10-nn queries on the plane database, cover sequence with 7 covers
• Vector set model without filter vs. vector set model with filter
  • The filter step leads to a speed-up factor of approximately 2
  • The filter step has a selectivity of approximately 20%
• Vector set model vs. cover sequence model
  • The vector set model outperforms the cover sequence model

Conclusion
[Figure: query covers q1..q4 and database covers db1..db4, each with the components position X, position Y, extension X, extension Y, treated as two sets of feature vectors]
• Contribution:
  • Sets of feature vectors: a new way of representing objects in similarity search, somewhere between feature vectors and graphs
  • Effective and efficient similarity model for CAD data based on sets of feature vectors
  • Evaluation of similarity models based on hierarchical clustering

Conclusion
• Future work:
  • BOSS (Browsing OPTICS-Plots for Similarity Search)
  • Interactive data browsing tool based on reachability plots
  • User-friendly method to support the time-consuming task of finding similar parts:
    • Revealing the hierarchical clustering structure of the dataset at a glance
    • Displaying suitable representatives for large clusters

Thank you for your attention. Any questions?
