G eneral i zed S earch T rees

Generalized Search Trees J.M Hellerstein, J.F. Naughton and A. Pfeffer, “Generalized Search Trees for Database Systems,” Proc. 21st Int’l Conf. On VLDB, Sep. 1995 Presented ByIhab Ilyas

Topics • Motivation. • Database Search Trees. • Generalized Search Tree. • Properties. • Methods. • Applications.

New Data types Extending search trees to maximum flexibility Motivation • New applications (Multimedia, CAD tools, document libraries…etc.)

Before GiST • Specialized Search Trees • Example: Spatial Search Trees ( R-Trees) • Problem: New Applications implies new tree structure from scratch • Search Trees For Extensible Data Types • Example: Extending B+ to index any ordinal data • Problem: Extending data but not the set of queries supported.

GiST • A third direction for extending search trees • Extensible both in data types supported and in the queries applied on this data. • Allows new data types to be indexed in a manner that supports the queries natural to the data type.

GiST (Cont.) • Unifies previously disparate structures for currently common data types. • Examples: B+ and R trees can be implemented as extensions to GiST. Single code base for indexing multiple dissimilar applications.

Key1 Key2 …. Database Search Trees • Canonical rough picture of database search tree Internal Nodes Leaf nodes (Linked List)

Search Trees (cont.) • Search Key: A search key may be arbitrary predicate that holds for each datum below the key. • Search Tree: A hierarchy of categorizations, in which each categorization holds for all data stored under it in the hierarchy.

Generalized Search Tree Definition: A GiST is a balancedmulti-way tree of variable fan-out between kM and M Where k is the fill factor. With the exception of the root node that can have fan-out from 2 to M.

GiST (Cont.) • Leaf nodes: (p,ptr) • p: Predicate used as a search key. • ptr: the identifier of some tuple of the database. • Non-leaf nodes: (p,ptr) • p: Predicate used as a search key. • ptr: Pointer to another tree node.

Properties • Every node contain between kM and M unless it is the root. • For each index entry (p,ptr) in a leaf node, p holds for the tuple • For each index entry (p,ptr) in a non-leaf node, p is true when instantiated with the values of any tuple reachable from ptr. • All leaves appear on the same level.

The ability of orthogonal classification.. Recall R-Tree Note on Properties p holds for p1,p2 …. (p,ptr) ….. p’ holds for p1,p2 p’  p Not Required …. (p’,ptr’) ….. …. (p1,ptr1) ….. …. (p2,ptr2)

GiST Methods • Key Methods: the methods the user can specify to configure the GiST. The methods encapsulate the structure and behavior of the object class used for keys in the tree. • Tree Methods: Provided by the GiST, and may invoke the required key methods.

Key Methods E is an entry of the form (p,ptr) , q is a query, P a set of entries • Consistent(E,q): False if p^q guaranteed unsatisfiable, true otherwise. • Union(P): returns predicate r that holds for all predicates in P • Compress(E): returns (p’,ptr). • Decompress(E): returns (r,ptr) where pr. This a lossy compression as we do not require p r

Key Methods (Cont.) • Penalty(E1,E2): returns domain specific penalty for inserting E2 into the subtree rooted at E1. Typically the penalty metric is representation of the increase of size from E1.p to Union(E1,E2). • PickSplit(P): M+1 entries, splits P into two sets of entries P1,P2, each of the size kM. The choice of the minimum fill factor is controlled here.

Tree Methods • Search: Controlled by the Consistent Method. • Insert: Controlled by the Penalty and PickSplit. • Delete: Controlled by the Consistent

R (p,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) New (q,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) (p,ptr) (q,ptr) (p,ptr) Example New (q,ptr) Penalty = m Penalty = n m < n Penalty =i Penalty = j j < i Full.. Then split according to PickSplit

Applications • GiST Over Z (B+ Trees) • GiST Over Polygons in R2 (R Trees)

B+ Trees Using GiST p here is on the form Contains([xp,yp),v) • Consistent(E,q) returns true if • If q= Contains([xq,yq),v): (xp<yq)^(yp>xq) • If q= Equal (xq,v): xp xq <yp • Union(P) returns [Min(x1,x2,…,xn),MAX(y1,y2,….,yn)).

B+ Trees Using GiST (Cont.) • Penalty(E,F) • If E is the leftmost pointer on its node, returns MAX(y2-y1,0) • If E is the rightmost pointer on its node, returns MAX(x1-x2,0) • Otherwise, returns MAX(y2-y1,0)+MAX(x1-x2,0) • PickSplit(P) let the first entries in order to go to the left node and the remaining in the right node.

B+ Trees Using GiST (Cont.) • Compress(E) if E is the leftmost key on a non-leaf node return 0 bytes otherwise, returns E.p.x • Decompress(E) • if E is the leftmost key on a non-leaf node let x= - otherwise let x=E.p.x • If E is the rightmost key on a non-leaf node let y= . If E is other entry in a non-leaf node, let y = the value stored in the next key. Otherwise, let y = x+1

R - Trees Using GiST The key here is in the form (xul,yul,xlr,ylr) • Query predicates are: • Contains ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)) Returns true if (xul1xul2) ^(yul1yul2) ^ (xlr1xlr2) ^ (ylr1ylr2) • Overlaps ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)) Returns true if (xul1xlr2) ^(yul1ylr2) ^ (xul2xlr1) ^ (ylr1yul2) • Equal ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)) Returns true if (xul1=xul2) ^(yul1=yul2) ^ (xlr1=xlr2) ^ (ylr1=ylr2)

R – Trees Using GiST(Cont.) • Consistent(E,q) • p contains (xul1,yul1,xlr1,ylr1), and q is either Contains, Overlap or Equal (xul2,yul2,xlr2,ylr2) • Returns true if Overlaps ((xul1,yul1,xlr1,ylr1), (xul2,yul2,xlr2,ylr2)) • Union(P) returns coordinates of the maximum bounding rectangles of all rectangles in P.

R – Trees Using GiST (Cont.) • Penalty(E,F) • Compute q= Union(E,F) and return area(q) – area(E.p) • PickSplit(P) • Variety of algorithms are provided to best split the entries in a over-full node.

R – Trees Using GiST (Cont.) • Compress(E) • Form the bounding rectangle of E.p • Decompress(E) • The identity function

G eneral i zed S earch T rees

G eneral i zed S earch T rees

Presentation Transcript

HAPSA G eneral Meeting

S T G

T he N EW G ENERAL E DUCATION PROGRAM

S earch

S CHEDULE AND G ENERAL I NFORMATION

s w i n g i t.

T A S T I N G N O T E S

G eneral i zed S earch T rees

G eneral P roblem S olver

T h e T i g e r R i s i n g

S EARCH T OOLS

S T A I N I N G

G I S T

T S T I N G

G H Rees, RAL

V e s t i g i o

I ntroduction to G eneral A nthropolgy

S T R I N G

T rees

G eneral K nowledge

I ntroduction to G eneral A nthropolgy

I nternational G eneral C ertificate of S econdary E ducation