Physical Database Design

Physical Database Design By Dr. Gabriel

The Process in a Nutshell • setting up the database environment • setting up appropriate security • preliminary performance tuning strategies, from indexing to partitioning and aggregations. • If appropriate, OLAP databases are also designed during this process.

Physical database design • Develop naming and database standards • Develop physical database model • Physical storage • SAN vs direct storage • Fault tolerance vs performance • Create staging and star schema (dimension and fact) tables • Creating indexes

Physical database design • Creating indexes • B-tree indexes • Clustered index • Phone book • Nonclustered index • Book index • For high cardinality columns • Ex: order number, customer number, etc. • Each page in an index B-tree is called an index node • The top node of the B-tree is called the root node • The bottom level of nodes in the index is called the leaf nodes. • Any index levels between the root and the leaf nodes are collectively known as intermediate levels. • In a clustered index • the leaf nodes contain the data pages of the underlying table. • In a nonclustered index • the leaf layer is made up of index pages instead of data pages. • The root and leaf nodes contain index pages holding index rows. • Each index row contains a key value and a pointer to either an intermediate level page in the B-tree, or a row in the leaf level of the index. • The pages in each level of the index are linked in a doubly-linked list.

Physical database design-Creating Indexes (Cont)

Physical database design-Creating Indexes (Cont) • Bitmapped indexes • Good for low cardinality columns • Gender, Yes/No fields, etc. • Is a string of bits for each possible value • Each bit string has 1 bit for each row • Bit is set to 1 if the row has the value the bit string represents • For gender column, 3 bit strings represent “M”, “F”, and “?” • Much smaller than B-tree • Usually built on a single column

Physical database design-Creating Indexes (Cont) • For dimension tables • Unique on surrogate keys • Bitmapped indexes on the most frequently queried dimension attributes or B-tree indexes • For fact tables • Unique on PK • For date-constrained DW, use date field as first one in the key • If simultaneous use of multiple indexes on a table in a query is supported, create a single column index on each fact table key

Physical database design-Creating Indexes (Cont) • Drop indexes if load adds more than 10-20% to the size of the table • Recompute table statistics after loading, and, perhaps, rebuild the indexes • Auto recompute statistics option: Is it a good idea?

Physical database design • Creating partitions, if necessary • Increases performance • Eases maintenance

Aggregations • Not too much, not too little • Monitor queries and design aggregations to match query patterns • Generally, creating report-specific summary tables is inefficient and impractical • Need to be maintained • Usually by ETL processes • May be automatically maintained by using materialized/indexed views or OLAP aggregations • Simple “group by” queries will do the job

OLAP • Major vendors built-in OLAP capabilities into the DB engine • Support data analysis • Provides aggregation management • Supports multiple dimensions • May not be needed if DB provides aggregate management and navigation and BI tools support complex analysis directly against the db

OLAP • Market is not mature yet • Products functionality and scalability varies dramatically in contrast to relational database products that have more similarity than differences • OLAP databases use languages that are not universal • Ex, multi-dimensional expressions (MDX) • Not universal language like SQL • Only OLAP that implement XML/A (XML for analysis) standard support MDX

MS SQL Server-related Physical Database Considerations

Materialized/Indexed Views • An indexed view is a view that has been materialized. • it has been computed and stored. • You index a view by creating a unique clustered index on it. • dramatically improve the performance of some types of queries. • best for queries that aggregate many rows. • not well-suited for underlying data sets that are frequently updated.

Materialized/Indexed Views CREATE TABLE wide_tbl(a int PRIMARY KEY, b int, ..., z int) CREATE VIEW v_abc WITH SCHEMABINDING AS SELECT a, b, c FROM wide_tbl WHERE a BETWEEN 0 AND 1000 CREATE UNIQUE CLUSTERED INDEX i_abc ON v_abc(a)

Partitioning • The steps for creating a partitioned table or index include the following: • 1. Create a partition function to specify how a table or index that uses the function can be partitioned. • 2. Create a partition scheme to specify the placement of the partitions of a partition function on filegroups. • 3. Create a table or index using the partition scheme.

Creating a Partition Function • A partition function specifies how the table or index is partitioned. • The function maps the domain into a set of partitions. • To create a partition function, you specify the number of partitions, the partitioning column, and the range of partition column values for each partition. • Note that when you specify the partitioning column, you can only specify one.

Creating a Partition Function CREATE PARTITION FUNCTION pf_exppeople (bigint) AS RANGE FOR VALUES (111111317320474, 111111621356435, 111111778050047, 111111981946587, 151670034507413, 252440002902502, 301111111111111, 401111111111111, 501111111111111 );

Partition Scheme • A partition scheme maps the partitions produced by a partition function to a set of filegroups that you define. • When you create a partition scheme, you define the filegroups where the table partitions are mapped, based on the parameters of the partition function. • You must specify enough filegroups to hold the number of partitions.

Partition Scheme Create PARTITION SCHEME ps_exppeople AS PARTITION pf_exppeople TO (fg_exp1, fg_exp2, fg_exp3,fg_exp4,fg_exp5,fg_exp6,fg_exp7,fg_exp8,fg_exp9,fg_exp10);

Create a table using the partition scheme CREATE TABLE person_data( id bigint NOT NULL, lname varchar(32) NULL, fname varchar(32) NULL, …,) ON ps_exppeople(id)

Primary/Foreign Key Constraints • Create Table Statement – General Format • CREATE TABLE table_name • (column1_namedatatypedefault_clause null/not null column_constraints, • column2_namedatatypedefault_clause null/not null column_constraints, • ….., • table_constraints); • Primary Key Constraint • CONSTRAINT constraint_name PRIMARY KEY (Column names) • Foreign Key Constraint • CONSTRAINT constraint_name FOREIGN KEY (FK column names) REFERENCES parent_table_name (parent column names)

Primary/Foreign Key Constraints • Example: Create table statement for the Shipments table CREATE TABLE Shipments ( SNO CHAR(5) NOT NULL, PNO CHAR(5) NOT NULL, JNO CHAR(5) NOT NULL, Shipmentdate DATE NOT NULL, QTY INTEGER NOT NULL, CONSTRAINT ShipmentPK PRIMARY KEY (SNO,PNO,JNO, Shipmentdate), CONSTRAINT SupplierFK FOREIGN KEY (SNO) REFERENCES Suppliers (SNO) CONSTRAINT PartFK FOREIGN KEY (PNO) REFERENCES Parts (PNO) CONSTRAINT JobFK FOREIGN KEY (JNO) REFERENCES Jobs (JNO) )

Primary/Foreign Key Constraints • Alter Table Statement • Add Constraints • General Format: • ALTER TABLE table_name • ADD CONSTRAINT constraint_definition • Drop Constraints • General Format: • ALTER TABLE table_name • DROP CONSTRAINT constraint_name

Creating indexes • create index statement create [clustered /nonclustered] index index_name on table_name (column_name)

Questions ?

Physical Database Design