1 / 19

Native Multidimensional Indexing in Relational Databases

Native Multidimensional Indexing in Relational Databases. David Hoksza , Tom áš Skopal Charles University in Prague Department of Software Engineering Czech Republic. Presentation Outline. Multidimensional querying attribute number growth indexing methods contemporary DB systems

rhys
Télécharger la présentation

Native Multidimensional Indexing in Relational Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Native Multidimensional Indexingin Relational Databases David Hoksza, Tomáš Skopal Charles University in PragueDepartment of Software Engineering Czech Republic

  2. Presentation Outline • Multidimensional querying • attribute number growth • indexing methods • contemporary DB systems • Indexing in PostgreSQL • user-defined access methods • external indexing • framework • Experiments COMAD 2008

  3. Indexing • Single-attribute based indexing • to avoid sequential scan • B-tree • Multi-attribute based indexing • window query • straightforward solution – multiple B-trees • SELECT * FROM Products • WHERE • BrandID BETWEEN (3 AND 11) • SELECT * FROM Products • WHERE • BrandId BETWEEN (13 AND 14) • AND • ProductTypeID BETWEEN (13 AND 24) • AND • PeriodID BETWEEN (3 AND 11) COMAD 2008

  4. Multi-attribute Based Indexing • Multiple B-trees • attribute number growth • → exponential growth of partial result-sets • → sequential scan • dimensionality curse COMAD 2008

  5. B+-tree with Compound Keys most often employed solution multiple keys – single chained value key components compared in lexicograhpical order assymetry in the order of the keys UB-tree transformation of n-dim points into 1-dim Z-address→ Z-curve Z-curves divided into Z-regions indexed by a B+-tree Multi-dimensional access methods R-tree COMAD 2008

  6. Native Multi-dimensional Indexing • SELECT * FROM Products • WHERE • BrandId BETWEEN (13 AND 14) • AND • ProductTypeID BETWEEN (13 AND 24) • AND • PeriodID BETWEEN (3 AND 11) • Table rows • points in n-dimensional space • R-tree • Queries • cubes in n-dimensional space (n-dimensionalwindows) COMAD 2008

  7. Contemporary DB systems COMAD 2008

  8. PostgreSQL • Object-relational DBMS • Open-source • Since 2005 (v. 8.0) runs on Windows • Emphasis on extensibility • data types • operators • procedural languages • access methods • … COMAD 2008

  9. Relation Types in PostgreSQL • Heap relations (HRs) • user relations • system catalog • undefined order • Index relations (IRs) • <key,value> pairs • external • fast access to heap relations • internal • access methods’ structures COMAD 2008

  10. User-defined Access Methods (AM) in PostgreSQL • Implement a set of functions communicating with PostgreSQL’s core (AM implementation). • Register the functions (located in libraries). • Define an AM (index type) by connecting the functions with a newly created AM. • Establish a class of operators and types for the AM. • Use the index. COMAD 2008

  11. required functions index_build creating a structure index_insert inserts a record index_beginscan starts a new scan index_gettuple gets a record fulfilling search conditions index_getmulti gets a set of records index_endscan finishes a search index_markpos marks actual position in a scan index_restrpos returns to a marked position index_rescan repeats scan with the same structure of search keys index_bulkdelete removes a set of records index_costestimate estimates cost of a search User-defined Access Methods in PostgreSQL – cont. COMAD 2008

  12. External Indexing • AM in PostgreSQL store data in IRs returning IRs’ TIDs to the core • External indexing framework • Storing IRs’ TIDs in external index storage • PostgreSQL’s interface to access methods still requires high level of mastering of PostgreSQL’s inner mechanisms • Framework for external indexing COMAD 2008

  13. Framework interface • void FW_CreateStructure(Relation index_relation); • void*FW_PrepareInsert(Relation index_relation); • void FW_InsertTuple(void *fw_data, Relation index_relation, IndexTuple index_tuple, BlockNumberblock_number,OffsetNumber offset); • void FW_FinishInsert(void *fw_data); • void FW_InitSearch(IndexScanDesc scan, ScanDirectiondir); • bool FW_GetNextTID(IndexScanDesc scan, ScanDirectiondir, BlockNumber *block_number, OffsetNumber *offset); • void FW_DeleteTuple(BlockNumber block number,OffsetNumber offset); COMAD 2008

  14. Experimental evaluation • The testbed • Uniform • clusters of uniformly distributed (up to 15-dimensional) objects • Gauss • (up to 3-dimensional) objects following Gaussian distribution • DBLP • 435,373 DBLP database records • author, type of publication, year of publication, number of pages • Studied costs • index access count • real-time COMAD 2008

  15. Experiments – Database Growth COMAD 2008

  16. Experiments – Query Selectivity COMAD 2008

  17. Experiments – Dimension Growth COMAD 2008

  18. Experiments – Real-time COMAD 2008

  19. Conclusion • We have proposed and implemented • Native multidimensional indexing by R-tree • Indexing framework for PostgreSQL • Implementation of native external R-tree index • Results show • big speed-up on the real-world data according to index-access metric • poor physical implementation of access methods in PostgreSQL in comparison to Oracle and MSSQL Server COMAD 2008

More Related