Computational Geometry Piyush Kumar (Lecture 2: NN Search) Welcome to CIS5930
Our First Problem • Nearest neighbor searching • Applications? • Pattern Classification • Graphics • Data Compression • Document retrieval • Statistics • Machine Learning • …
Similarity Measure • In terms of Euclidean distance, e.g. between the points (4,5) and (2,3)
Similarity Measure Similar?
Similarity measure • Other similarity Measures
The dimension • Let's assume that our points are in one-dimensional space ( d = 1 ). We will generalize to higher dimensions ( when d = some constant ).
Fixed radius near neighbor problem Question • Given a set of points S on the real line, preprocess them to answer the following question : • Find all pairs of points (p,q) in S such that the distance between p and q is < r .
Nearest neighbor search Question • Given a set of points S on the real line, preprocess them to answer the following query : • Given a query point q, find the point nn(q) of S that is closest to q.
Answers? • Fixed NN Search • O(n^2) ? • O(n log n + k) ? • O(n + k) ? • NN Search • O(n) ? • O(log n) ?
Answers? • NN Search • O(n) ? Brute Force [ Trivial ] • O(log n) ? Binary Search Tree • Fixed NN Search • O(n^2) ? Brute Force • O(n log n + k) ? Sorting • O(n + k) ? Hashing?
NN Searching : Balanced binary tree • O( log n ) per query, returning nn(q) for a query point q
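A minimal sketch of the O(log n) query in one dimension. It swaps the balanced binary tree for a sorted array with binary search, which gives the same query bound when the point set does not change. The function name `nearest_neighbor` is illustrative and not part of the course code.

```cpp
#include <algorithm>
#include <vector>

// Returns the point of `sorted` closest to q.
// Precondition: `sorted` is non-empty and in ascending order.
double nearest_neighbor(const std::vector<double>& sorted, double q) {
    // First element that is >= q (the successor of q), found in O(log n).
    auto succ = std::lower_bound(sorted.begin(), sorted.end(), q);
    if (succ == sorted.begin()) return *succ;      // q lies left of every point
    if (succ == sorted.end()) return *(succ - 1);  // q lies right of every point
    auto pred = succ - 1;                          // predecessor of q
    // The nearest neighbor is the predecessor or the successor; ties go left.
    return (q - *pred <= *succ - q) ? *pred : *succ;
}
```

Sorting the points once costs O(n log n) of preprocessing; every query afterwards inspects only the two neighbors of q in sorted order.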
K-nearest neighbor search • Problem: Given a set of points P on the real line and a query point q, find the k nearest neighbors of q in P. • O(n log n) trivial brute force (sort P by distance to q and take the first k) • Do you see how? • Thought Problem: How do we do this in O(n) time? • (Hint: Median finding works in O(n) time).
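One possible answer to the thought problem, sketched with `std::nth_element`, which selects the k-th smallest element by distance in O(n) time on average. A worst-case O(n) bound would need a deterministic selection routine such as median of medians, which this sketch does not implement. The name `k_nearest` is illustrative.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the k points of P closest to q, in no particular order.
// Precondition: k <= P.size().
std::vector<double> k_nearest(std::vector<double> P, double q, std::size_t k) {
    auto closer = [q](double a, double b) {
        return std::fabs(a - q) < std::fabs(b - q);
    };
    // Selection step: afterwards the k closest points occupy P[0..k).
    std::nth_element(P.begin(), P.begin() + static_cast<std::ptrdiff_t>(k),
                     P.end(), closer);
    P.resize(k);
    return P;
}
```

The function takes P by value so the caller's point set is left untouched; the result is unordered because selection, unlike sorting, does not rank the k winners.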
Brute Force implementation • min = +∞ • for( i = 0; i < n; ++i ) for( j = 0; j < n; ++j ) • if ( dist( p[j], p[i] ) < r ) cout << ( p[j], p[i] ); • float dist ( point p, point q ) • sum = 0; • for ( i = 0; i < d; ++i ) sum += (p[i] - q[i])^2 • return sqrt(sum) • What can we speed up here? How do we speed this up?
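A runnable sketch of the brute-force search with two possible speedups applied: squared distances are compared against r^2 so no `sqrt` is needed, and the inner loop starts at i + 1 so each pair is tested once. These are offered as candidate answers to the question on the slide, not as the lecture's official ones. `Point`, `squared_dist` and `near_pairs_brute_force` are illustrative names.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using Point = std::vector<double>;  // a point in d dimensions

// Squared Euclidean distance. Skipping sqrt is safe because, for
// non-negative values, dist < r holds exactly when dist^2 < r^2.
double squared_dist(const Point& p, const Point& q) {
    double sum = 0.0;
    for (std::size_t i = 0; i < p.size(); ++i) {
        const double diff = p[i] - q[i];
        sum += diff * diff;
    }
    return sum;
}

// Returns index pairs (i, j), i < j, with dist(pts[i], pts[j]) < r.
// Precondition: r >= 0 and all points have the same dimension.
std::vector<std::pair<std::size_t, std::size_t>>
near_pairs_brute_force(const std::vector<Point>& pts, double r) {
    std::vector<std::pair<std::size_t, std::size_t>> out;
    const double r2 = r * r;
    for (std::size_t i = 0; i < pts.size(); ++i)
        for (std::size_t j = i + 1; j < pts.size(); ++j)  // each pair once
            if (squared_dist(pts[i], pts[j]) < r2) out.emplace_back(i, j);
    return out;
}
```

Both changes improve only the constant factor; the running time stays O(d n^2).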
Fixed NN Search: By Sorting • Once we sort the points on the real line, we can just go left and right to identify the pairs we need. • Each pair is visited at most twice, so the asymptotics do not change.
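Total work done after sorting • k_i denotes the number of pairs generated when visiting p_i • Since each pair is generated at most twice, Σ_i k_i ≤ 2k, so the scan after sorting costs O(n + k) • With this approach, we need at least Ω(n log n) (for sorting).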
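The sort-and-scan idea can be sketched as follows. As a simplification, this version walks only to the right of each point, so every pair is reported once instead of being visited from both ends. The name `near_pairs_sorted` is illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Returns all pairs (p, q) of input points with p <= q and q - p < r.
std::vector<std::pair<double, double>>
near_pairs_sorted(std::vector<double> pts, double r) {
    std::sort(pts.begin(), pts.end());  // O(n log n)
    std::vector<std::pair<double, double>> out;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        // Walk right from pts[i] until the gap reaches r. Every step except
        // the final failed test reports a pair, so the scan is O(n + k).
        for (std::size_t j = i + 1; j < pts.size() && pts[j] - pts[i] < r; ++j)
            out.emplace_back(pts[i], pts[j]);
    }
    return out;
}
```

The total is O(n log n + k): the sort dominates unless the output is large.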
Fixed radius near neighbor searching • How do we avoid sorting? • How do we get a running time of O(n+k) ? • How do we use hashing to simplify our problem?
Solution using bucketing • Put points in an infinite number of buckets of width r (assume an infinite array B) • Bucket b covers the interval [ br, (b+1)r ), for example b = -2 or b = 0 • x lies in bucket b = floor(x/r)
Solution using bucketing • At most n buckets of B can be occupied. • How do we convert this infinite array into a finite one? Use hashing. • In O(1) time, we can determine which bucket a point falls in. • In O(1) expected time, we can look the bucket up in the hash table. • Total time for bucketing is expected O(n). • The total running time can be made O(n) with high probability by keeping several hash functions and choosing one at run time, so that an adversary cannot prepare a bad input in advance.
The Algorithm • Store all the points in buckets of size r • In a hash table [ Total complexity = O(n) ] • For each point x • b = floor(x/r); retrieve buckets b and b+1 • Output all pairs (x,y) such that y is in bucket b or b+1, x < y, and |x - y| < r
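A minimal sketch of this algorithm, assuming `std::unordered_map` as the hash table and `long long` bucket indices (both are choices made here, not something the slides prescribe). Equal coordinates are handled by breaking ties on the input index, a detail the slide's "x < y" leaves open. The name `near_pairs_bucketed` is illustrative.

```cpp
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <unordered_map>
#include <utility>
#include <vector>

// Returns all pairs (x, y) of distinct input points with x <= y and y - x < r.
// Precondition: r > 0.
std::vector<std::pair<double, double>>
near_pairs_bucketed(const std::vector<double>& pts, double r) {
    // Bucket b = floor(x / r) holds the indices of points in [br, (b+1)r).
    auto bucket_of = [r](double x) {
        return static_cast<long long>(std::floor(x / r));
    };
    std::unordered_map<long long, std::vector<std::size_t>> buckets;
    for (std::size_t i = 0; i < pts.size(); ++i)
        buckets[bucket_of(pts[i])].push_back(i);

    std::vector<std::pair<double, double>> out;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const long long b = bucket_of(pts[i]);
        for (long long nb : {b, b + 1}) {  // only these can hold y >= x
            auto it = buckets.find(nb);
            if (it == buckets.end()) continue;
            for (std::size_t j : it->second) {
                const double x = pts[i], y = pts[j];
                // Report each pair once, from its smaller endpoint;
                // the input index breaks ties between equal coordinates.
                const bool x_first = x < y || (x == y && i < j);
                if (x_first && y - x < r) out.emplace_back(x, y);
            }
        }
    }
    return out;
}
```

No sorting happens anywhere: building the table takes expected O(n) time, and the remaining cost is the number of distance tests.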
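Running Time • Let n_b denote the number of points in bucket b of the input pointset P. • Define S = Σ_b n_b^2. Note that there are n_b^2 ordered pairs (counting each point paired with itself) in bucket b alone that are within distance r of each other.
Observation • Since each pair gets counted twice : Σ_b n_b^2 ≤ n + 2k, where k is the number of pairs reported.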
Running Time • Depends on the number of distance computations D. Total Running Time = O(n+k)
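The bound on D can be sketched as follows. The sketch assumes that each point in bucket b is compared with every point in buckets b and b+1, writes n_b for the number of points in bucket b, n for the number of points and k for the number of pairs reported. The use of the inequality ab ≤ (a^2 + b^2)/2 is one way to finish the argument and is not necessarily the route taken in the lecture.

```latex
\begin{align*}
D &= \sum_b n_b\,(n_b + n_{b+1})
   = \sum_b n_b^2 + \sum_b n_b\,n_{b+1} \\
  &\le \sum_b n_b^2 + \sum_b \frac{n_b^2 + n_{b+1}^2}{2}
   \le 2 \sum_b n_b^2 \\
  &\le 2\,(n + 2k) = O(n + k).
\end{align*}
```

The last step holds because any two distinct points in the same bucket are within distance r, so the n_b(n_b - 1)/2 unordered pairs inside each bucket are all among the k reported pairs, which gives Σ_b n_b^2 ≤ n + 2k.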
Higher Dimensions • Send (x,y) → (floor(x/r), floor(y/r)) • Apply hash with two arguments • Running time still O(n+k) for any fixed dimension • The constant in the running time increases exponentially with dimension
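A two-dimensional sketch of the same idea. It packs the two cell coordinates into a single 64-bit key as a stand-in for a "hash with two arguments", and it assumes each cell coordinate fits in 32 bits. `Point2`, `cell_key` and `near_pairs_grid` are illustrative names.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Point2 { double x, y; };

// Packs two cell coordinates into one hash key.
// Assumption: each coordinate fits in 32 bits.
std::uint64_t cell_key(std::int32_t cx, std::int32_t cy) {
    return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
           static_cast<std::uint32_t>(cy);
}

// Returns index pairs (i, j), i < j, with Euclidean distance < r.
// Precondition: r > 0.
std::vector<std::pair<std::size_t, std::size_t>>
near_pairs_grid(const std::vector<Point2>& pts, double r) {
    auto cell = [r](double v) {
        return static_cast<std::int32_t>(std::floor(v / r));
    };
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < pts.size(); ++i)
        grid[cell_key(cell(pts[i].x), cell(pts[i].y))].push_back(i);

    std::vector<std::pair<std::size_t, std::size_t>> out;
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const std::int32_t cx = cell(pts[i].x), cy = cell(pts[i].y);
        // Any point within distance r lies in one of the 3 x 3 nearby cells.
        for (std::int32_t dx = -1; dx <= 1; ++dx)
            for (std::int32_t dy = -1; dy <= 1; ++dy) {
                auto it = grid.find(cell_key(cx + dx, cy + dy));
                if (it == grid.end()) continue;
                for (std::size_t j : it->second) {
                    if (j <= i) continue;  // report each pair once
                    const double ex = pts[i].x - pts[j].x;
                    const double ey = pts[i].y - pts[j].y;
                    if (ex * ex + ey * ey < r * r) out.emplace_back(i, j);
                }
            }
    }
    return out;
}
```

The 3 x 3 neighborhood becomes 3^d cells in d dimensions, which is where the exponential dependence on the dimension comes from.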
Introduction: Geometry Basics • Geometric Systems • Vector Space • Affine Geometry • Euclidean Geometry • AG + Inner Products = Euclidean Geometry
Vector Space • Scalar ( + , * ) = Number Types • Usual example is the Real Numbers R. • Let V be a set with two operations • + : V × V → V • * : F × V → V • Here F is the set of Scalars
Vector Space • If (V , +, * ) satisfies the following properties, it is called a vector space : • (A1) u + (v + w) = (u + v) + w for all u,v,w in V. • (A2) u + v = v + u for all u,v in V. • (A3) there is a unique 0 in V such that 0 + u = u for all u in V. • (A4) for every u in V, there is a unique -u in V such that u + (-u) = 0. • (S1) r(su) = (rs)u for every r,s in R and every u in V. • (S2) (r + s)u = ru + su for every r,s in R and every u in V. • (S3) r(u + v) = ru + rv for every r in R and every u,v in V. • (S4) 1u = u for every u in V. • Note: Vectors are closed under linear combinations. • A basis is a set of linearly independent vectors that span V; its size n is the dimension of V.
Affine Geometry • Geometry of vectors • Not involving any notion of length or angle. • Consists of • A set of scalars • Say Real numbers • A set of points • Position specification • A set of free vectors. • Direction specification
Affine Geometry • Legal operations • Point - Point = Vector • Point +/- Vector = Point • Vector +/- Vector = Vector • Scalar * Vector = Vector • Affine Combination • ∑ Scalar * Point = Point • such that ∑ Scalar = 1 • Note that scalars can range from -Infinity to +Infinity
Affine Geometry • Affine Combination: the scalars sum to 1 • Convex Combination: the scalars sum to 1 and each lies in [0, 1]
Affine Combinations • [ Affine Span or Affine Closure ] The set of all affine combinations of three non-collinear points generates a plane. • [ Convex Closure ] The set of all convex combinations of three points generates all points of the triangle they span (interior and boundary).
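The legal operations can be enforced by the type system. The sketch below uses hypothetical types `Pt2` and `Vec2` (these are not the course's dpoint.hpp) and provides only the operators listed as legal, so an expression such as Point + Point does not compile. An affine combination of three points is then written using legal operations alone.

```cpp
struct Vec2 { double x, y; };  // free vector: direction specification
struct Pt2  { double x, y; };  // point: position specification

Vec2 operator-(Pt2 a, Pt2 b)     { return {a.x - b.x, a.y - b.y}; }  // Point - Point = Vector
Pt2  operator+(Pt2 p, Vec2 v)    { return {p.x + v.x, p.y + v.y}; }  // Point + Vector = Point
Vec2 operator+(Vec2 u, Vec2 v)   { return {u.x + v.x, u.y + v.y}; }  // Vector + Vector = Vector
Vec2 operator*(double s, Vec2 v) { return {s * v.x, s * v.y}; }      // Scalar * Vector = Vector

// Affine combination a0*p0 + a1*p1 + a2*p2 with a0 = 1 - a1 - a2,
// rewritten as p0 + a1*(p1 - p0) + a2*(p2 - p0).
Pt2 affine_combination(Pt2 p0, Pt2 p1, Pt2 p2, double a1, double a2) {
    return p0 + (a1 * (p1 - p0) + a2 * (p2 - p0));
}

// The combination is convex when all three weights are non-negative
// (since they sum to 1, each is then also at most 1).
bool is_convex(double a1, double a2) {
    const double a0 = 1.0 - a1 - a2;
    return a0 >= 0.0 && a1 >= 0.0 && a2 >= 0.0;
}
```

With p0 = (0,0), p1 = (4,0), p2 = (0,4), the weights a1 = a2 = 0.25 give a point inside the triangle, while a1 = 2, a2 = -1 give a point in the same plane but outside it.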
Euclidean Geometry • One more element added • Inner Products • Maps two vectors into a scalar • A way to 'multiply' two vectors
Example of Inner Products • Example : Dot products • (u.v) = u_0 v_0 + u_1 v_1 + … + u_{d-1} v_{d-1} • Where u,v are d-dimensional vectors. • u = (u_0, u_1, …, u_{d-1}); v = (v_0, v_1, …, v_{d-1}) • Length |u| = sqrt(u.u) (Distance from origin) • Normalization to unit length : u/|u| • Distance between points : |p - q| • Angle (u',v') = cos^{-1}(u'.v') • where u' = u/|u| and v' = v/|v|
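The formulas above translate directly into code. This is a small sketch using `std::vector<double>` as a stand-in vector type; `Vec`, `dot`, `length` and `angle` are illustrative names, not the interface of the course's dvector.hpp.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Dot product u.v = u_0 v_0 + ... + u_{d-1} v_{d-1}.
// Precondition: u and v have the same dimension.
double dot(const Vec& u, const Vec& v) {
    double sum = 0.0;
    for (std::size_t i = 0; i < u.size(); ++i) sum += u[i] * v[i];
    return sum;
}

// Length |u| = sqrt(u.u).
double length(const Vec& u) { return std::sqrt(dot(u, u)); }

// Angle in radians between two non-zero vectors: acos(u'.v')
// with u' = u/|u| and v' = v/|v|.
double angle(const Vec& u, const Vec& v) {
    double c = dot(u, v) / (length(u) * length(v));
    c = std::max(-1.0, std::min(1.0, c));  // rounding can push c outside [-1, 1]
    return std::acos(c);
}
```

The clamp before `acos` is a floating-point precaution: for nearly parallel vectors the computed cosine can land slightly outside [-1, 1], where `acos` returns NaN.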
Dot products • (u.v) = (+/-) |u| * (length of the projection of v on u). • u is perpendicular to v ⇔ u.v = 0 • u.(v+w) = u.v + u.w • If u is not the zero vector then u.u > 0 • positive definite
Some proofs using Dot Products • Cauchy Schwarz Inequality • |u.v| <= |u||v| • Homework. • Hint: For any real number x • (u+xv).(u+xv) >= 0 • Triangle Inequality • |u+v| <= |u|+|v| • Hint: expand |u+v|^2 and use Cauchy Schwarz.
Next Lecture • Orientation and Convex hulls • Homework (Programming), due Wed: Play with dpoint.hpp and example.hpp (implement orientation in 2D). Implement your own dvector.hpp, dsegment.hpp using metaprogramming. Make sure you understand how things work. • Homework (Theory), due Monday: Cauchy Schwarz, Triangle Inequality, dot and cross products. • Reading Assignment: Page 1-10, Notes of Dr. Mount; Page 1-2 and Section 1.3 of the textbook. • Sources for this lecture: Dr. David Mount’s Notes. WWW.
Crash course on C++: “dpoint.hpp” • Namespaces • Solve the problem of classes/variables with the same name • namespace _cg { code } • Outside the namespace use _cg::dpoint • Code within _cg should refer to code outside _cg explicitly, e.g. std::cout instead of cout.
Object Oriented Programming • Identify functional units in your design • Write classes to implement these functional units • Separate functionality for code-reuse.
Class membership • Public • Private • Always : Keep member variables private • This ensures that the class knows when the variable changes • Protected
Inheritance • ‘is a’ relationship is public inheritance • class SuperDuperBoss : public Boss • Polymorphism : Refer to an object through a reference or pointer of the type of a parent class of the object • SuperDuperBoss JB; • Boss *b = &JB; • Virtual functions
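A minimal sketch of the Boss example. The class names and the variables `JB` and `b` come from the slide; the `title()` member and the `describe` helper are made up here to show what a virtual function buys you.

```cpp
#include <string>

class Boss {
public:
    virtual ~Boss() = default;  // virtual destructor for safe deletion via Boss*
    virtual std::string title() const { return "Boss"; }
};

// Public inheritance: a SuperDuperBoss "is a" Boss.
class SuperDuperBoss : public Boss {
public:
    std::string title() const override { return "SuperDuperBoss"; }
};

// Accepts any Boss by reference; the call dispatches on the dynamic type.
std::string describe(const Boss& b) { return b.title(); }
```

With `SuperDuperBoss JB; Boss *b = &JB;`, the call `b->title()` runs the derived override because `title` is virtual; without `virtual` it would run `Boss::title`.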
Templates • Are C macros on steroids • Give you the power to parametrize • Compile time computation • Performance • “The art of programming programs that read, transform, or write other programs.” - François-René Rideau
Generic Programming • How do we implement a linked list with a general type inside? • void pointers? • Using macros? • Using Inheritance?
Templates • Function Templates • Class Templates • Template templates * • Full Template specialization • Partial template specialization
Metaprogramming • Programs that manipulate other programs or themselves • Can be executed at compile time or runtime. • Template metaprograms are executed at compile time.
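A classic illustration of a template metaprogram that the compiler executes: a factorial computed by recursive template instantiation. `Factorial` is an example chosen here and is not taken from the course code.

```cpp
// Recursive case: instantiating Factorial<N> forces Factorial<N - 1>.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case: a full specialization stops the recursion.
template <>
struct Factorial<0> {
    static constexpr unsigned long long value = 1;
};

// Checked during compilation; no code runs at run time for this.
static_assert(Factorial<5>::value == 120, "computed by the compiler");
```

The program's "loop" is the chain of instantiations and its "exit condition" is the specialization for 0, so the result is already a constant in the compiled binary.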
Good old C • C code • double square(double x) { return x*x; } • square(3.14) can be computed at compile time • #define square(x) ((x)*(x)) • static double sqrarg; #define SQR(a) (sqrarg=(a), sqrarg*sqrarg)
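The three variants differ in how often they evaluate their argument. The sketch below makes that visible with a hypothetical `Counter` helper. The macro is renamed `SQUARE_MACRO` so it can coexist with the function (named `square_fn` here); `SQR` and `sqrarg` are as on the slide.

```cpp
#define SQUARE_MACRO(x) ((x) * (x))

static double sqrarg;
#define SQR(a) (sqrarg = (a), sqrarg * sqrarg)

inline double square_fn(double x) { return x * x; }

// Counts how often next() is called, to expose repeated evaluation.
struct Counter {
    int calls = 0;
    double next() { ++calls; return 3.0; }
};

// Each helper returns how many times the argument expression was evaluated.
int calls_via_macro() {
    Counter c;
    double r = SQUARE_MACRO(c.next());  // expands to ((c.next()) * (c.next()))
    (void)r;
    return c.calls;
}

int calls_via_sqr() {
    Counter c;
    double r = SQR(c.next());  // argument stored once in sqrarg, then squared
    (void)r;
    return c.calls;
}

int calls_via_function() {
    Counter c;
    double r = square_fn(c.next());  // argument evaluated once, then passed
    (void)r;
    return c.calls;
}
```

The plain macro evaluates its argument twice, which matters for arguments with side effects. `SQR` avoids that through the shared variable `sqrarg`, at the cost of being tied to `double` and unsafe to use from several threads at once.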
Templates • Help us to write code without being tied to a particular type. • Question: How do you swap two elements of any type? How do you return the square of any type?
Function Templates • C++ • template< typename T > inline T square ( T x ) { return x*x; } • A specialization is instantiated if needed : • square<double>(3.14) • Template arguments may be deduced from the function arguments : square(3.14) • MyType m; … ; square(m); expands to square<MyType>(m) • Operator * must be overloaded for MyType
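A small sketch of the `square` template applied to a user-defined type. `MyType` is modeled here as a tiny rational number purely for illustration; the slide does not say what `MyType` is.

```cpp
template <typename T>
inline T square(T x) { return x * x; }

// Hypothetical user type: a rational number num/den.
struct MyType {
    int num;
    int den;
};

// square<MyType> compiles only because operator* exists for MyType.
MyType operator*(const MyType& a, const MyType& b) {
    return {a.num * b.num, a.den * b.den};
}
```

Calling `square(3)` deduces T = int, `square<double>(1.5)` names the type explicitly, and `square(m)` for a `MyType m` instantiates `square<MyType>`. Removing the `operator*` overload turns that last call into a compile-time error rather than a run-time one.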