240 likes | 338 Vues
Explore the strategic mapping of logical models into physical designs, denormalization techniques, and the efficiency of distributed database systems. Learn about implementation methods, objectives, and key issues in distributed database architectures.
E N D
Information Retrieval and Use De-normalisation and Distributed database systems Geoff Leese September 2008, revised October 2009
Mapping the logical model onto physical design • Entities become tables • More often than not! • Attributes become fields (columns) • Unique identifiers become primary keys • Relationships implemented by foreign key columns • Resolve M:N relationships by inserting intersection table
Mapping considerations • Independence • Privacy • Efficiency of queries
Denormalisation • Joins take time! • Split or merge normalised entities based on frequent associated use • Remove redundant relationships • Merge entities with 1:1 relationships • Use summary fields • Use summary tables and views
Using summary field(1) • Consider running a query “give the total value of all orders for customer X” How many joins?
Using summary field (2) • Note summary field in Orders table How many joins now?
Distributed database systems • Special rules apply!
The traditional model • One centralised database • Terminals at remote locations • Disadvantages • Networks are slow (esp WANS!) • Central machine does all processing • If central machine fails, database is down (Integrity, redundancy and disaster recovery considered in later lectures!)
The Client/Server model • Client – application – “front end” • Server – DBMS – “back end” • Still dependent on central database
Client responsibilities • Manages user interface • Accepts user data • Has local processing capability within the application • Generates database requests and transmits them via network to server • Receives results from server and formats them as required by application
Server responsibilities • Accepts database requests from client • Processes database requests • Handles security issues • Deals with concurrency issues • Optimizes queries • Handles recovery/rollback issues • Returns results to client
Distributed database architecture • A collection of logically related “sites”, connected together so that the users view is that of a single database at a single location. • Each site is a database in it’s own right • Not necessarily physically or geographically separated, but often are – and are logically separated.
Advantages • Organisations are distributed, why shouldn’t their data be? • Improved efficiency • Store data close to where it’s used
Types of DDS • Homogenous – same type of RDBMS at each site (easy!) • Heterogeneous – different types of DBMS at each site (not so easy!)
Implementation methods (1) • Fragmentation – splitting data between sites • Horizontal – row based – e.g. store all employee records for a location at that location • Vertical – column based – e.g. store all payroll columns in payroll department, all other employee data in HR • Either way, fragments must be able to be put back together!
Implementation methods (2) • Replication • Controlled duplication of data at more than one site • Update propagation?
Objectives (1) • Local autonomy • Local data locally owned and managed – minimal data requirements from remote sites. • No reliance on central site • Continuous operation • Reliability • Availability
Objectives (2) • Location independence • From user’s view, all data is at their site. • Fragmentation independence • Needs joins and unions to put fragments back together • Replication independence
Objectives (3) • Distributed query processing • Distributed transaction management • Transactions carried out by “agents” at distributed sites • Two-phase commit • Locking issues (later lecture)
Objectives (4) • Hardware independence • Operating system independence • Network independence • DBMS independence
DDS issues • Query processing • Optimisation even more important • Catalogue (data dictionary) management • Centralised? • Fully replicated? • Partitioned? • Combination of first and third?
DDS issues • Update propagation • An issue where replication is used. • “Primary copy” system • Recovery • Two-phase commit • Recovery • Locking strategies
Summary • Mapping the logical model • Denormalisation • Traditional database architecture • Client/server model • Distributed Database systems • Advantages • Objectives • Implementation methods • Issues
Further reading • Rolland chapter 10 • Hoffer chapters 12 • Denormalisation - click to follow the link!