Querying Large Databases
E N D
Presentation Transcript
Querying Large Databases Rukmini Kaushik
Purpose • Research for efficient algorithms and software architectures of query engines.
Query Execution Engine Architecture • Query processing algorithms – physical algebra • Data Model – logical algebra
Sorting & Hashing • Both are memory intensive. • Memory Concerns - Merge Efficiency & memory management. - Hash table overflow
Aggregation and Duplicate Removal • Aggregation Concept Describes a set of objects with one value. • Algorithms Three Types - Nested Loops - Sorting - Hashing
Aggregation & Duplicate Removal • Nested Loops - Easiest of the three - Doesn’t work well for large inputs • Sorting - Sort for common elements which results in a simple duplicate removal. - Should remove duplicates as early as possible.
Aggregation & Duplicate Removal • Hashing - Hash on group attributes. - Can perform duplicate removal when creating hash table. • Algorithm Analysis Sorting and hashing functions are logarithmic with input size
Complex Query Execution Plan • Purpose - To schedule a query with several operations optimally • Ideas - Right-deep plans - Left-deep plans
Complex Query Execution Plan • Prediction - Use a decision tree of sub-plans - Done by using choose-plan operators • Major Concern - Optimal resource allocation
Parallel Query Execution Mechanism • Goal Obtain speed-up & scale-up • Speed-up - Uses extra hardware for constant size problem - Linear speed-up is optimal - Can be expressed as parallel efficiency
Parallel Query Execution Mechanism • Scale-up - Uses same resources with altered problem size - Can be expressed as parallel efficiency.
Parallel Query Execution Mechanism • Parallel Vs Distributed Systems • Distributed - Locally Autonomous - Also uses Parallelism
Parallel Query Execution Mechanism • Parallel - One center of control - Three types Shared memory Shared Disk Distributed Memory
Parallel Query Execution Mechanism • Three forms of parallelism - Inter Query: Servicing multiple requests at the same time - Inter Operator: Pipelining - Intra Operator: Execute a single operator in multiple processors
Parallel Query Execution Mechanism • Implementation Bracket Models Operator Models • Bracket Model Goal: Generic process template that receives and sends data and performs one operation at a time
Parallel Query Execution Mechanism Number of inputs is limited to two Can be run in parallel by having many templates in the system running simultaneously. • Operator Model Goal: Insert parallel operators in an ordered plan
Parallel Query Execution Mechanism • Uses the exchange operator • Exchange operator - Does not manipulate data - Provides capabilities for parallel query processing - Changes a complex query into a single process
Parallel Algorithms • Idea: More focus on algorithms and parallel execution • Parallel selections and updates - Disk input and output should be made parallel - Selection: Maintain indices near stored data - Updates: Use keys for partitioning attributes
Parallel Algorithms • Parallel Sorting: -classified by - number of parallel inputs - number of parallel outputs - Algorithms consists of local sort and a data exchange step
Parallel Algorithms - Major Concern - Deadlock can be avoided by using range partitioning - having a sufficient size data exchange buffer - using a modified sort algorithm
Query Optimization • Uses the differences between logical and physical aspects • Must keep track of the properties of the inputs • Cost models focus on throughput measures
Tuning query performance • Focus - Guidelines for improving query performance • Guidelines for three points of view - implementor and vendor - database administrator - application programmer
Tuning Query Performance • Implementor System should support indexing and clustering Query optimizer should be reliable and accurate • Administrator Ensure usage of system facilities
Tuning Query Performance carefully choose physical database design provide available and efficient processing resources • Application Programmer Provide high level queries