1 / 15

An evaluation of Data Storage and Analysis With Oracle DBMS

An evaluation of Data Storage and Analysis With Oracle DBMS. Eric Grancher, CERN IT/DB Database Workshop, CERN July 11 th -13 th 2001. TAG data analysis. First step of the evaluation of the possibility to use advanced index features to perform simple cuts

wenda
Télécharger la présentation

An evaluation of Data Storage and Analysis With Oracle DBMS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An evaluation of Data Storage and Analysis With Oracle DBMS Eric Grancher, CERN IT/DB Database Workshop, CERN July 11th-13th 2001

  2. TAG data analysis • First step of the evaluation of the possibility to use advanced index features to perform simple cuts • Most of the credit goes to Maciej Marczukajtis (technical student) • TAG data imported from a ALEPH PAW file

  3. Model • Data model has 16 double attributes • Cut: “retrieve entries for which field1 > 12 and field2 < 23 and field3 > 56” • The model can be implemented as a relational table, mapping every attribute to a column

  4. Performing the Cuts • Normal B*tree index can’t be used in an efficient way as they can’t be merged • Bitmap indexes can be merged • “how to use the bitmap indexes to perform cuts?”

  5. Size = f(cardinality) 1 million rows 5 million rows size size cardinality cardinality

  6. 0 10 20 30 40 Binning 1 2 3 4 0 • Double values, problem with cardinality • Binning idea (c.f. Kurt Stockinger) • Purpose - reducing the cardinality. Binning function maps the whole range of values in column into bins (sub ranges) • Function used in tests [lower, upper]-> [0…number of bins]

  7. Function based indexes • create bitmap index f_x1 on tag_data(binning(field1)) • Queries have to exactly match the index creation statements

  8. Lower limit Upper limit Max Min 0 Selected by the second query Selected by the first query Checked against table Several ways to use the bitmap indexes Even if SQL is said “non-procedural”, the way you write the statement produces different execution plans ! • Bitmap index 1 • SELECT COUNT(PT_LEP_MC) FROM tag_data WHERE F1(PT_LEP_MC) < F1(20) AND F2(ETA_LEP_MC) > F2(0) UNION ALL SELECT COUNT(PT_LEP_MC) FROM(SELECT PT_LEP_MC, ETA_LEP_MC FROM tag_data WHERE (F1(PT_LEP_MC) <= F1(20) AND F2(ETA_LEP_MC) >= F2(0)) AND NOT (F1(PT_LEP_MC) < F1(20) AND F2(ETA_LEP_MC) > F2(0)))WHERE PT_LEP_MC < 20 AND ETA_LEP_MC > 0; ETA_LEP_MC F1 and F2 are the binningfunctions

  9. Lower limit Upper limit Max Min 0 Selected by the inner query Checked against table Several Ways to Use the Bitmap Indexes • Bitmap index 2 • SELECT COUNT(PT_LEP_MC) FROM (SELECT PT_LEP_MC, ETA_LEP_MC FROM tag_data WHERE F1(PT_LEP_MC) <= F1(20) AND F2(ETA_LEP_MC) >= F2(0)) WHERE PT_LEP_MC < 20 AND ETA_LEP_MC > 0; ETA_LEP_MC F1 and F2 are the binningfunctions

  10. Performance • Parallel query: several processes to perform a single statement • Check the execution plan! • For some queries, it is almost as efficient in time to perform a full table scan (but with several users !)

  11. Size • Block size 8KB • Table size = 203.1 MB • Average bitmap index size = 24.3 MB-> more space used for the indexes than the data (not unusual for some heavily queried Oracle database)

  12. Results Bitmap index 1: only candidates are checked against the table Bitmap index 2 all hits and candidates are checked against the table Heavily depends on the query. I/O doesn’t mean time.

  13. Conclusion • Using bitmap index for tag data in the DBMS can provide a dramatic gain • Clear advantage in using database features when appropriate • Works efficiently with high selectivity (few records result from the query) • B*Tree indexes take a lot of space and are not efficient for multiple dimensional queries

  14. Perspectives • Proper binning referring to the distribution of the data can improve performance of bitmap index, depends on distribution + queries • Possibility to combine bitmap and B*tree indexes • How to this fit for ESD/AOD and references/OCCI?

  15. References & Questions RAW tests http://edmsoraweb.cern.ch:8001/cedar/doc.info?document_id=318241&version=1 TAG tests http://edmsoraweb.cern.ch:8001/cedar/project.info?proj_id=CERN-0000005796

More Related