190 likes | 332 Vues
This research presents an innovative framework for elastic data partitioning within cloud-based SQL processing systems. It explores various partitioning techniques, including range and hash partitioning, and discusses the implications of these methods on query processing efficiency. The framework aims to minimize data movement and re-partitioning during scaling operations. By analyzing partitioning attributes and types, this study provides a detailed understanding of how to optimize resource allocation in a shared-nothing parallel DBMS environment.
E N D
Elastic Data Partitioning for Cloud-based SQL Processing Systems Lipyeow Lim Information & Computer Science Department University of Hawai`i at Mānoa Lipyeow Lim -- University of Hawai`i at Manoa
Outline Lipyeow Lim -- University of Hawai`i at Manoa
Shared Nothing Parallel DBMS • results • query Parallel DB layer • DBMS • DBMS • DBMS • DBMS Network Lipyeow Lim -- University of Hawaii at Manoa
Cloud-basedArchitecture Virtual Machines CPU CPU CPU CPU Memory Memory Memory Memory Disk Disk Disk Disk (Virtualized) Network Amazon EC2 Physical Resources Lipyeow Lim -- University of Hawaii at Manoa
“Scaling” Up and Down • results • query Parallel DB layer • DBMS • DBMS • DBMS • DBMS • DBMS • DBMS Network Lipyeow Lim -- University of Hawaii at Manoa
Problem Statement Given • A relation T • A partitioning function F on a fixed partitioning key • An initial number p of partitions/fragments • An initial mapping of p fragments to p database nodes • A target number q of partitions Find • a mapping of {T1, T2, .. Tp} to {T1, T2, ... Tq} and • an assignment of the q fragments to q database nodes Such that we minimize • The number of tuples re-partitioned • The number of tuples moved between database nodes Lipyeow Lim -- University of Hawai`i at Manoa
Partitioning a Relation 2 2 4 6 7 7 7 13 20 : 2 2 4 6 7 7 7 13 20 : • Partitioning attribute/key. • Partitioning type. Eg. Range or Hash • Partitioning constraint. Eg. Equi-width, equi-size • Number of partitions/fragments. hash function Lipyeow Lim -- University of Hawai`i at Manoa
Horizontal Fragmentation: Range Partition Partition 1 Partition 2 • Range Partition on rating column • Partition 1: 0 <= rating < 5 • Partition 2: 5 <= rating <= 10 Lipyeow Lim -- University of Hawaii at Manoa
Range Partition: Query Processing Partition 1 • Which partitions? • Better than non-parallel ? SELECT * FROM Sailors S SELECT * FROM Sailors S WHERE rating = 2 Partition 2 SELECT * FROM Sailors S WHERE age > 30 SELECT * FROM Sailors S WHERE rating < 2 and age < 30 Lipyeow Lim -- University of Hawaii at Manoa
Horizontal Fragmentation: Hash Partition Partition 1 • Hash partitioning using hash function • Partition = rating mod 2 Partition 2 Lipyeow Lim -- University of Hawaii at Manoa
Hash Partition: Query Processing Partition 1 • Which partitions? • Better than non-parallel ? SELECT * FROM Sailors S SELECT * FROM Sailors S WHERE rating = 2 Partition 2 SELECT * FROM Sailors S WHERE age > 30 SELECT * FROM Sailors S WHERE rating < 2 and age < 30 Lipyeow Lim -- University of Hawaii at Manoa
Method N: Naive Resize Lipyeow Lim -- University of Hawai`i at Manoa
Method C : Chunk-based Lipyeow Lim -- University of Hawai`i at Manoa
Method T : Tree-based Lipyeow Lim -- University of Hawai`i at Manoa
Method H : Hash-based Lipyeow Lim -- University of Hawai`i at Manoa