580 likes | 749 Vues
Dive into the world of data partitioning in Very Large Databases (VLDBs) with insights on RDBMS, SQL, architecture, performance, joins, and industry practices. Learn the impact on indexes, local indexes, partitioned tables, and practical concepts like Star Schema.
E N D
Data Partitioning in VLDB Tal Olier Tal.olier@hp.com
Why am I here? Tal Olier – tal.olier@hp.com ~15 years in various software development positions. All of them involved database practice. I work in HP Software, I love working there and I came to tell you this; the lecture is just an excuse getting me into the building :)
Agenda • RDBMS in short (basic terms) • SQL reminder • A bit about (RDBMS) architecture • Performance - access paths • What is table join • VLDB - the size factor • VLDB - industry practice • How joins are executed • Summary
A little history • Was invented in 1970 • By Edgar Frank "Ted" Codd • In IBM labs • Oracle emerged first to the market
Basics – a table • Rows • Columns • Primary key
Basics – a relation • A foreign key (constraint) • A reference • Source table • Source column/s • Target table • Target column/s
People example • People: name, height, smoking, father • Books read: title, author • Schedule details: from, to, activity • Resume details: from, to, salary
Query language • SQL – Structured Query Language • Declarative (vs. procedural) • Requires Internal optimization
SELECT query structure • SELECT • FROM… JOIN • WHERE • GROUP BY • HAVING • ORDER BY
SQL modules • DML (+Select) – Data manipulation language • DDL – Data definition language • TC – Transaction controls (commit/rollback) • DCL – Data control language (grant/revoke) • PE – Procedural extensions
Database server Process Memory Other cache Log cache Buffer cache Server Process Everything is blocks I/O System Data Files Log Files Client Process
IO bound vs. CPU bound • CPU – what is it consumed for? • IO – what is it consumed for?
FTS – full table scan • Scan the whole table – from top to bottom
B Tree index • B tree – allows great spanning that derives small tree height
B+ tree • The leaves are organized in a doubly linked list • B+ tree – allows searching through all values by searching the leaf level only
Database index • Data is sorted according to the index columns • The leaf contain pointers to rows in the table • Search of 1 value in a tree - o (log n) • Smaller index height in B+ trees • Index (database) operations: • Add/remove values • Index seek • Index scan
Inner join • Use join predicate to match rows from 2 table: A and B • Each row in table A is compared to each row in table B to find the pairs of rows that satisfy the join predicate • Than column values for each matched pairs are combined into a result row
department Cartesian product employee
Equi join • A inner join that uses equality comparison in the join predicate • Example:select * from employee emp join department dept on emp.dept_id = dept.dept_id
Equi join OK OK OK
RDBMS – summary in a nutshell • Tables • References • Joins • Indexes • Blocks • I/O
RDBMS – summary in a nutshell • Tables • References • Indexes • Blocks • I/O
Use case: Sales Information • Table: • Customer name • Order number • Order date and time • List of items, amount and prices
Union view Select * from t2007 Union all Select * from t2008 Union all Select * from t2009 Union all Select * from t2010
Local indexes • Index is bound to it’s partition • Drop partition derives drop index • Smaller index heights • Index is always usable • Harder to maintain uniqueness with it
Partitioned table - concepts • Partition column is the key for dividing the data • Performance – only relevant partitions used • Add/drop partition – DDL • Local index – index is bound to a partition
Data tables block block block block block block block T a b l e - A block block block block block block block T a b l e - C block block block block block block block block block block block block block block block block block block block block block T a b l e - B block block block block block block block block block block block block block block block block block block block block block