130 likes | 259 Vues
This report presents research on dynamic aggregate queries conducted by Kenneth A. Ross and Junyan Ding from Columbia University. It highlights the evaluation of complex query processing and data warehousing techniques, focusing on efficient performance goals and technical details for implementing main memory databases. The project aims to develop user-friendly interfaces for interactive data exploration, analyze large datasets, and enable fast query responses. The findings suggest promising initial results and future plans for integration and self-tuning capabilities.
E N D
Interactive Dynamic Aggregate Queries Kenneth A. Ross, Junyan Ding Columbia University DGRC Evaluation Board Meeting
Research Experience • Complex query processing • Data Warehousing • Main memory databases Students: Kazi Zaman, Junyan Ding DGRC Evaluation Board Meeting
Outline • Past work on Datacubes • Scenario Overview • Performance Goals • Technical Details • Conclusion DGRC Evaluation Board Meeting
Datacubes State Year Grade Sales State Year Grade Sales CA 1997 Regular 90 CA 1997 Regular 90 CA 1997 ALL 90 NY 1997 Premium 70 ALL 1997 Regular 90 CA 1998 Premium 65 CA ALL Regular 90 CUBE BY (sum Sales) ALL 1997 Regular 90 NY 1998 Premium 95 ALL 1997 ALL 160 ALL ALL Regular 90 CA ALL ALL 155 ALL ALL ALL 320 Additional records Large increase in total Size, especially with many dimensions ……. DGRC Evaluation Board Meeting
Level-2 Store Level-1 Store Finest granularity cuboid Query q records in linked lists Selected coarse records in hash table Slot directory Two-Level Framework DGRC Evaluation Board Meeting
Datacube Implementation Issues • Fast query response (2-4 ms/query), but • Rigid: • Need to know dimensions and aggregate functions in advance for precomputation. • Not appropriate when hundreds (PUMS) or thousands (NHANES) of dimensions are available. DGRC Evaluation Board Meeting
Scenario Outline User Web Data Request Dynamic Query Mediator ... Graphical User Interface UnifiedResults TraditionalDBMS Dynamic Query Engine Data Files e.g., PUMS DGRC Evaluation Board Meeting
Engine Decoupled from Interface • Can use a variety of interfaces • Multiple connections to one server • Can “do one thing well” • Client/Server parallelism • Abstract interaction via API DGRC Evaluation Board Meeting
Engine Performance Goals • Interactivedata exploration • Millions of records • Thousands of columns (but look at ten or so at a time) • Aggregates and statistical measures • Fine adjustments at 30 answers/second. DGRC Evaluation Board Meeting
Technical Details • Main Memory Implementation • Multidimensional tree structures • Cache consciousness • Branch Misprediction • SIMD • Asynchronous work DGRC Evaluation Board Meeting
Initial Results • Preliminary implementation • tens of answers/second on PUMS 1% data (>2 million records) • “Vanilla” user interface under construction DGRC Evaluation Board Meeting
Conclusions and Plan • First step towards dynamic aggregate queries • More general dependence on parameters • More ambitious user interfaces DGRC Evaluation Board Meeting
Plan • Integrate with user interface to generate dynamic queries. • Self-tuning capability. • Multiple data sets. • Work with agencies to generate value • For intra-agency analysis • For enhanced data dissemination DGRC Evaluation Board Meeting