DBMS Software

DBMS Software • Week 7 • Conceptual Architecture - Ch 4 • Software Architecture - Ch 4 • Toolkit - Ch 11,12,13 • Kernel Ch 14, 15, 16, 17

ANSI/SPARC Architecture • Problem of Data independence • In ‘classical systems’ • Application programs ‘know’ about how and where data is stored • Problems when • additional data stored but not of relevance to prog • data reorganised for efficiency but no change in content

ANSI/SPARC (2) • Two distinctions • between the global ‘logical’ data model and the way the data is stored in physical data storage on disk • isolates issues of efficient storage such as indexes, replication of data for fast access and backup • between local application programs and the global data model • isolates functional areas from concern for the full conceptual model - remember ISIS has 250 tables

ANSI/SPARC (3) • External view - defines VIEWS which bring together a subset of the full model for a specific functional area - e.g. enrolment • Conceptual model - full ‘logical’ model of the organisation • Physical - storage of data in underlying files using the appropriate file structures, and allocated to storage units

ANSI/SPARC example • Lecturer view of ISIS • those students which she teaches (restriction) • relevant data items (projection) • denormalised (eg. name and award name by module) • Full model is 250 tables • Physical data storage • duplicates key files for fast retrieval • builds indexes based on common queries

DBMS Software Components • User programs • user developed programs (eg. as VB Macros, PLSQL procedures, PHP scripts,stored Queries) • DBMS Toolkit • tools to help user build systems and user programs • Interface • language and protocol for communication between User programs and Kernel • Kernel • core Database functions - data storage and retrieval • Operating System - file storage

Interface • User programs need to communicate with the Kernel to • update the Schema (Data Definition Language) • update the Factbase (Data Manipulation Language) • Interface must handle: • sending requests to the Kernel e.g. with SQL • receiving results from the Kernel • linking to Kernel, handling errors (e.g ODBC,JDBC) • Receiving results is tricky: • send whole relation in some serialized format • send pointer to temporary table and get each row

Interface (2) • Scripting languages such as PL/SQL (Oracle’s application language) and PHP (a Web server language) send SQL and provide routines for access to the results e.g. • the notion of a ‘CURSOR’ a pointer to the next row of a table to be returned • Access can automate this connection so that a table located on a remote server ‘appears’ to be a local table.

Kernel functions • CRUD - Create, Read, Update, Destroy • at Factbase and Schema level • Data Dictionary - Schema storage • Transaction Management - run-unit completion • Concurrency Control - multiple users • Recovery - backup and restore • Authorisation - users, passwords, areas • Data Communications ( or by OS) • Data integrity - foreign keys etc. • Import/Export, Monitoring

Schema table definitions integrity constraints views (stored queries) access control Stored in Data Dictionary DD is a Database itself, the System Catalog Manipulated with DDL and DCL Factbase tables Manipulated with DML Interface with operating system for file access Transaction control concurrency locking Kernel (2)

Toolkit - Application Development Tools • E.g. Access • Graphical User Interface (GUI) • QBE to generate SQL queries • Form and report definitions • VB Macros for user programs • Natural Language Interface • translation from natural language to SQL

User programs • SQL command line interface • SQLPlus for Oracle, similar for MySQL • Pre-compiler (COBOL..) • Program contains SQL statements which need to be compiled into calls to Kernel functions • Scripts (PHP,Perl) • Generation of SQL calls at runtime

CASE tools • Select SSADM, Rational Rose, Oracle Designer 2000 • allow developers to develop schema from ER model

Admin tools • Database software control • starting and stopping Kernel • backup and restore • Reconstruct indexes • Access control • create users, user groups, • assign user access rights to tables • Import and Export of data and schema • Schema migration - major schema change • monitoring and tuning

Kernel operations • Underlying file types • Indexing • Transactions • Locks • Rollback • Query optimisation

Physical organisation • Secondary storage (on disk) divided into blocks - say 16K in size • For a given file system, blocks have control info(index to records in block..) and free space as well as user records • structured blocks are called ‘pages’ • Data is transferred between main memory and secondary storage in pages • Pages can be held in memory (cached) for speed of access

Indexes in SQL • In Access, just mark a field as indexed • in SQL • CREATE INDEX enameInd on EMP(Ename); • drop index • DROP INDEX • Extra indexes speed retrieval but slow down updates (because Index structures must be updated too)

Transactions • ‘Transaction’ in a DBMS means an elemental unit of work which is either completed in full or fails totally • DB Transactions are not the same as a user transaction such as ‘place an order’ • Each transaction has • begin transaction • some work - reads and writes to the db • either COMMIT (make changes permanent) • or ROLLBACK (destroy all evidence of work)

ACID • A Atomicity • Transaction is indivisible unit of work - can’t partly succeed • C Consistancy • Transaction (failed or completed) must leave DB in a consistant state • I Isolation • Each transaction must appear to run in isolation to any other transactions • D Durability • Work done must be permanent

Implementing ACID • A Atomicity • system must be able to undo work if transaction fails - e.g. due to failed integrity constraint • system could record the before state of an record and restore these when rollback required • C Consistancy • fail if an integrity or transaction constraint violated • I Isolation • concurrency problem >> • D Durability • DB backup and recovery, transaction logging

Concurrency Control • If transactions could be serialised - executed one at a time - each would execute in Isolation • but this would slow the system down - most of the time is spent waiting for disk access • If not prevented, interaction between transactions can cause anomolies

T1 bot read(x) x=x+1 write(x) commit if x=2 at start, it should be 4 at end T2 bot read(x) x=x+1 write(x) commit but its only 3! Lost Update

T1 bot read(x) x=x+1 write(x) abort x should be 3 T2 bot read(x) x=x+1 write(x) commit but it is 4! Dirty Read

T1 bot read(x) read(x) commit T1 sees different values of x during its execution T2 bot read(x) x=x+1 write(x) commit Inconsistent Read

x+y must = 100 T1 bot read(y) read(x) ? x+y=90 commit T2 bot read(y) read(x) x=x -10 write(x) y=y+10 write(y) commit now x+y = 100 again Ghost update

Scheduling • A schedule is a sequence of operations (reads or writes) from multiple transactions • Reads and writes are assumed to be atomic themselves • The Scheduler tries to create a schedule which preserves Isolation • Serial schedule puts one transaction after another, but this will be ineffficient • Need to find equivalent, shorter schedules

Pessimistic/Optimistic • Pessimistic • assume transactions will interact and prevent it • Locking • Timestamping • Optimistic • assume transactions will NOT interact but take action if they do

Locking • Read lock (shared lock) • any number can read but no one can write • readers need counting • Write lock (exclusive lock) • no one else can read or write • Unlock - drop the lock • Transaction waits if resource already locked • Locks held in DB - lock table • Lock granularity • best if only a record is locked but lock table large

Two-phase locking • Discipline on transactions to ensure schedule is serialiable • Growing phase: • Transaction must acquire all its locks in one phase • Lock level can be escalated ( read > write) • Shrinking phase • Transaction must release all its locks in the second phase • Lock level can reduce (write > read) • Can’t acquire a lock after releasing a lock

T1 bot wlock(x) read(x) x=x+1 write(x) unlock(x) commit T2 bot wlock(x) wait wait wait read(x) x=x+1 write(x) commit Lost Update with Locking

T1 bot wlock(x) read(x) x=x+1 write(x) unlock(x) abort x should be 3 but it is still 4 ! T2 bot wlock(x) wait wait wait read(x) x=x+1 write(x) unlock(x) commit Dirty Read with locking

Strict Two-phase locking • Locks can only be released after commit • Deadlock • T1 wlocks(x) and waits to wlock(y) • T2 wlocks(y) and waits to wlock(x)

Deadlock • Deadlock • T1 wlocks(x) and waits to wlock(y) • T2 wlocks(y) and waits to wlock(x) • Approaches • timeout - lock expires so transaction aborts • deadlock detection - identifing deadlocks and killing one transaction

DBMS Software

DBMS Software

Presentation Transcript

Relational DBMS

Disappearing DBMS

DBMS

Parallel DBMS

DBMS

Temporal DBMS

Spatial DBMS

Distributed DBMS

DBMS

DBMS

DBMS Architectures

dbms

Parallel DBMS

Parallel DBMS