610 likes | 789 Vues
Advanced Database Systems Module 3 Global Information Sharing – Database Systems Integration. School of Computer Information Engineering S HANDONG F INANCE I NSTITUTE. The aim.
E N D
Advanced Database SystemsModule 3Global Information Sharing – Database Systems Integration School of Computer Information Engineering SHANDONG FINANCE INSTITUTE
The aim • This module offers a general introduction to data management in an integrated computing environment. The major characteristics of distributed systems constructed through an integration process will be discussed. The emphasis is on Multi-database Systems, Federated systems and Data Warehousing (module 7). • The aim is to provide the background to the basic principles and architectures of information sharing systems, to understand and realize limitations of the various approaches.
Outline of the lecture • Introduction • motivation • Taxonomy of global information sharing solutions, • multi-database systems architectures, • schema integration issues, • federated systems, • Issues in multi-databases, • Summary
Introduction • Information is a key resource • Sharing of information resources increases • Unreasonable: • User to learn all access methods, • Common data model with a single access method, • Integration of information from pre-existing, heterogeneous local databases in a distributed environment, • Global users use transparent methods to access total information in an integrated system, • Local DBs retain the autonomy and serve existing applications
Taxonomy • A wide range of terms is used to describe a DBs that include a global component to access globally shared information and multiple local components that manage information at that site: • distributed database systems, • federated database systems, • multi-databases, • interoperable systems,
Distinctions • Structure of the global component and its interaction with local components, • Level of integration - a tightly coupled systems means that global functions have access to low-level internal functions of the local DBMS
METHOD OF GLOBAL INTEGRATION LEVEL OF GLOBAL INTERFACE TO LOCAL DBMS FULL GLOBAL DATABASE FUNCTION LOCAL NODE TYPES CLASS Homogeneous Databases Internal DBMS Functions Distributed Database Global Schema Yes Heterogeneous Databases DBMS User Interface Global Schema Global Schema Multidatabase Yes Heterogeneous Databases Federated Database DBMS User Interface Partial Global Schemas Yes Access Language Functions Multidatabase Language System DBMS User Interface Heterogeneous Databases Yes Access Language Functions Homogeneous Multidatabase Language System DBMS User Interface with some Internal DBMS Functions Homogeneous Databases Yes Any Data Source that meets the Communication Protocol Application on Top of the DBMS No Global Integration Interoperable Systems No Taxonomy of Global Information Sharing
Global Schema Multidatabases • Global queries access local information through external users interface of the local DBMSs • Global system maintains a global schema, and local sited cooperate closely to maintain the global schema, • Typically, design bottom-up, integrating pre-existing local DBMS without modifying them, • Local DBMSs heterogeneous,
Global Schema Multidatabases • Possible different data models or different implementations of the same data models, • Frequent semantic problems requiring expert intervention and resolution, • Creation of the global schema is a difficult problem, • Scalability of existing solution is limited,
GES GES GES LES LES LES LES LES GCS LES LCS1 LCSn LISn LIS1 Typical Architecture of GSM
Discussion on Architectural Issues • Differences between DDB and GSM • local autonomy • updates at the local sites • maintenance of the data consistency in the presence of the local operations in GSM, • enforcement of the global constraints, • global queries based on the common reference points
GES GES GES LES LES LES LES LES GCS LES LCS1 LCSn LISn LIS1 Local update Problem!!!
Global query1 Global query2 Table on site2 Table on site1 Integrity constraint Local update A simple scenario ? Problem !
Federated Database Systems • There is no single global schema, • Each local systems maintains a local import and export schema, • Export schema - information shared with the global system, • Import schema - data representation and data origin from remote nodes that can be accessed locally,
Federated Database Systems (cont) • Each import scheme is a partial global schema, • Each node cooperates closely with the specific nodes it accesses, • User requests are restricted to local data and the local import schema, • Similar to GSM ,
ES2 ESn ES1 LCS1 LCS2 LCSn LIS2 LISn LIS1 Typical Architecture of FDB
Definition of Multidatabase System • A MDB is a DS that acts as a front end to multiple local DBs. The system provides full database functionality and interacts with local systems at their external user interface. Local systems maintain their local autonomy. • The global system provides some means of resolving the differences in data representations, • The global user can access information from multiple sources with a single relatively simple request ,
Issues in MDBs • Site Autonomy • each local DBMS retains complete control over local data and processing ( frequent claim), • each local site determines: • what information it will share with the global system, • what global requests it will service, • when it will join the MDB, • when it will stop participation, • Local DBMS sees the global system as another local user, • Global optimization in general is very complex,
Differences in Data Representation: • Local DBs developed independently, and there are likely to be many different models, different representations for similar objects, • There is a need for an integrated representation, • Naming conflicts, • Format differences (domain, scale, precision, etc), • local to global transformation can be simple but the inverse can be very complex,
Differences in Data Representation (cont) • Structural differences • data value versus attribute • Missing or conflicting data, • Identification of semantically valid common reference points, • Conflicts among constraints, • Data integration is the main issue – ability to decide how to resolve the differences is an important skill.
A Simple Example Consider two companies data models : Company 1 records are stored in one table Emp; Emp(Emp#, Fname, Lname, Bdate, Dept#, Rank, Salary) Company 2 records are stored in many tables – one for each company department; Dept XX(S-Id, Fname, Sname, Position, Phone#, e-mail, URL) Build an integrated schema – number of changes must be made to at least one of component schemata and consequently to the local Application Programs. Views offer a solution to the structural issues !
Views in Relational Systems Important to understand – 1) Each syntactically correct SQL statement is a view definition, 2) Each realisation of a SQL statement is a table, 3) Any data set (table) can be expressed as a SQL statement to given RDB, 4) View is a virtual table (only materialised when called by application) 5) Basic SQL syntax for view creation is: CREAT VIEW (view_name) AS SELECT (Attributes_name) FROM (table_name) WHERE (condition on attribute-names & constants)
Views offer a solution (example cont) Dept 01(S-Id, Fname, Sname, Position, Phone#, e-mail, URL) Dept 02(S-Id, Fname, Sname, Position, Phone#, e-mail, URL) Dept 03(S-Id, Fname, Sname, Position, Phone#, e-mail, URL) • Table name become an attribute in view definition • Union of individual views to create the final data representation of company 2 • Use null values when union of attributes considered for the data representation for company 1 and 2 • Several examples on Tutorials how to use views to ‘unify’ different data representations.
History Student Course Maths Grade French Student Grade Music ….. Geography Integration Issues – different CSs Different schemas for the same universe of discourse - resulting in different data structures .
Integration Issues Storing the same info in different relational structures. Note that the table below contains the same information but requires less space than the first one: If there are 8 courses then the first storage requires 8 records per student – 24 fields while the second one – 1 record with 9 fields. Also compare the size of fields in bytes !!!
Apply some operations Apply some operations Our goal Views deployment
Different views on the same data Let us consider a simple example: Case 1
View Integration Views . . . schema 1 schema n schema 2 GOAL: find all parts of the input schemas that refer to the same portion of reality and unify their representation conflict analysis and resolution Global Schema integratedschema
Employee Project Works-on Employee Department Project Works-for Has publisher name title title Book Publisher Book Has Problems in View Integration • Different perspectives • Equivalences among constructs in a model
1 N Employee Project Works-on N M Employee Project Works-on Problems in View Integration • Incompatible design specifications How to do view integration? N-way versus binary merging Where to start? skeleton schema managerial (most important) schema
Naming Conflict Scenario 2: interschema Schema 1 Schema 2 authors authors title title Article Article Report Article Scenario 1: renaming Scenario 3: inter-schema Publication Report d d U U U Report Article Article Name Conflict analysis and Resolution • Possible resolution • renaming • Inter-schema constraint • Source of conflict • synonyms • homonyms Several modification scenarios may be possible
Structural Conflicts and Resolution • Schema concepts are identical • They have exactly the same representation and properties • Schema concepts are compatible • They have different representations that are not contradictory • one is an attribute and the other is an entity • Schema concepts are incompatible • They have contradictory representations or properties • different multiplicities • different keys • reverse generalization relationship User needs to help resolve incompatibilities
1:N relationship (1,N) (0,1) Assigned-to Employee Department N 1 mandatory optional Employee Department John Sales Assigned-to Alan Marketing James Larry Bill Ed ER Model – Example Relationships
address title code address name name 1 N N M Department Report Contractor Publishes Written-for N N name name 1 M Project-manager Topic-area Writes Contains title dept-name title code research-area name N M N M Author Publication Keyword Author-of Contains Example – Name Conflicts Schema 1 — focused on reports Schema 2 — focused on publications
address title code address name name 1 N N M Department Report Contractor Publishes Written-for N N name name 1 M Author Topic-area Author-of Contains title dept-name title code research-area name N M N M Author Publication Topic-area Author-of Contains Example – Structural Conflicts Schema 1 — focused on reports Schema 2 — focused on publications
title dept-name Publication d Report code Example – Structural Conflicts Inter-schema constraint
address title code address name name Department Report Contractor Publishes Written-for name name Author Topic-area Author-of Contains title title code research-area name Author Publication Topic-area Author-of Contains name Department Publishes Example – Merge Views Schema 1 — focused on reports Schema 2 — focused on publications
address title name title code research-area Department Publication Topic-area Publishes Contains d name Author Author-of Report code Written-for name Contractor address Example – Integrate Schema
Entity Clustering What?Abstraction of an ER diagram so the entire schema can appear on one sheet of paper Why?To make it easier for end-users and designers to communicate Grouping operations • dominance grouping:look for an entity thatdominates (is centralto) other entities
U U Entity Clustering(cont) • abstraction grouping:generalization oraggregation • constraint grouping:general constraintexpressed onrelationships • relationship grouping:look for relationshipsthat dominate entities(ternary or higher)
Clustering Technique • Define points of grouping with functional areas • Locate dominant entities in a functional area through natural relationships, n-ary relationships, integrity constraints, abstractions, etc. • Form entity clusters • Use the grouping operations, preferring to keep entities within their functional area if conflicts arise (or leave unclustered) • Form higher level entity clusters • Apply grouping operations recursively • Validate the cluster diagram • Check for consistency of relationships between entities at each level of the diagram and verify meaning of groupings with user
Division Contains Company organizationfunctional area 1 N Department 1 1 Headed-by Skill Has Managed-by M N 1 1 1 Skill-used P N 1 Project Employee Married-to Assigned-to 1 N 1 N Manages d M U U Location U U Project managementfunctional area Manager Secretary Engineer Technician 1 1 1 N Belongs-to Has-allocated Is-allocated Is-allocated Employee datafunctional area 1 1 1 M Desktop Workstation Prof-assoc Example - Clustering 1
2.1 Division/Departmentcluster Department Division 1 1 Skill-used 1 Department Headed-by Has Managed-by Skill M N 1 1 1 Assigned-to P Project N 1 Projectmanagementcluster Employee Married-to N 1 2.2 N 1 Project Manages Location M d U U U U Managercluster Secretarycluster Engineercluster Technician 2.3 2.4 2.5 Example - Clustering
Division/Departmentcluster Department Division 1 1 2.1 1 Department Headed-by Has Managed-by Assigned-to Skill-used Skill M N 1 1 N P Project Projectmanagementcluster Employeecluster N 2.2 3.1 1 Project Location M Example - Clustering
Summary of the main concepts • Why view integration is important, • Methods for view integration, • Potential problems when integrating views, • Resolution of the problems,
Schema Integration Steps • Identifying correspondences and conflicts among the schema, • Modifying schema to conform to one another, • Merging the schemata, • Restructuring,
Heterogeneous local DBMSs • Network, hierarchical, relational, object-oriented models to be supported, • Often a trade-off between writing translation code and limiting participation in MDB, • Gateways between different technologies,