1 / 56

Advanced Database Systems Module 3 Global Information Sharing – Database Systems Integration

Advanced Database Systems Module 3 Global Information Sharing – Database Systems Integration. School of Computer Information Engineering S HANDONG F INANCE I NSTITUTE. The aim.

Télécharger la présentation

Advanced Database Systems Module 3 Global Information Sharing – Database Systems Integration

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced Database SystemsModule 3Global Information Sharing – Database Systems Integration School of Computer Information Engineering SHANDONG FINANCE INSTITUTE

  2. The aim • This module offers a general introduction to data management in an integrated computing environment. The major characteristics of distributed systems constructed through an integration process will be discussed. The emphasis is on Multi-database Systems, Federated systems and Data Warehousing (module 7). • The aim is to provide the background to the basic principles and architectures of information sharing systems, to understand and realize limitations of the various approaches.

  3. Outline of the lecture • Introduction • motivation • Taxonomy of global information sharing solutions, • multi-database systems architectures, • schema integration issues, • federated systems, • Issues in multi-databases, • Summary

  4. Introduction • Information is a key resource • Sharing of information resources increases • Unreasonable: • User to learn all access methods, • Common data model with a single access method, • Integration of information from pre-existing, heterogeneous local databases in a distributed environment, • Global users use transparent methods to access total information in an integrated system, • Local DBs retain the autonomy and serve existing applications

  5. Taxonomy • A wide range of terms is used to describe a DBs that include a global component to access globally shared information and multiple local components that manage information at that site: • distributed database systems, • federated database systems, • multi-databases, • interoperable systems,

  6. Distinctions • Structure of the global component and its interaction with local components, • Level of integration - a tightly coupled systems means that global functions have access to low-level internal functions of the local DBMS

  7. METHOD OF GLOBAL INTEGRATION LEVEL OF GLOBAL INTERFACE TO LOCAL DBMS FULL GLOBAL DATABASE FUNCTION LOCAL NODE TYPES CLASS Homogeneous Databases Internal DBMS Functions Distributed Database Global Schema Yes Heterogeneous Databases DBMS User Interface Global Schema Global Schema Multidatabase Yes Heterogeneous Databases Federated Database DBMS User Interface Partial Global Schemas Yes Access Language Functions Multidatabase Language System DBMS User Interface Heterogeneous Databases Yes Access Language Functions Homogeneous Multidatabase Language System DBMS User Interface with some Internal DBMS Functions Homogeneous Databases Yes Any Data Source that meets the Communication Protocol Application on Top of the DBMS No Global Integration Interoperable Systems No Taxonomy of Global Information Sharing

  8. Global Schema Multidatabases • Global queries access local information through external users interface of the local DBMSs • Global system maintains a global schema, and local sited cooperate closely to maintain the global schema, • Typically, design bottom-up, integrating pre-existing local DBMS without modifying them, • Local DBMSs heterogeneous,

  9. Global Schema Multidatabases • Possible different data models or different implementations of the same data models, • Frequent semantic problems requiring expert intervention and resolution, • Creation of the global schema is a difficult problem, • Scalability of existing solution is limited,

  10. GES GES GES LES LES LES LES LES GCS LES LCS1 LCSn LISn LIS1 Typical Architecture of GSM

  11. Discussion on Architectural Issues • Differences between DDB and GSM • local autonomy • updates at the local sites • maintenance of the data consistency in the presence of the local operations in GSM, • enforcement of the global constraints, • global queries based on the common reference points

  12. GES GES GES LES LES LES LES LES GCS LES LCS1 LCSn LISn LIS1 Local update Problem!!!

  13. Global query1 Global query2 Table on site2 Table on site1 Integrity constraint Local update A simple scenario ? Problem !

  14. Federated Database Systems • There is no single global schema, • Each local systems maintains a local import and export schema, • Export schema - information shared with the global system, • Import schema - data representation and data origin from remote nodes that can be accessed locally,

  15. Federated Database Systems (cont) • Each import scheme is a partial global schema, • Each node cooperates closely with the specific nodes it accesses, • User requests are restricted to local data and the local import schema, • Similar to GSM ,

  16. ES2 ESn ES1 LCS1 LCS2 LCSn LIS2 LISn LIS1 Typical Architecture of FDB

  17. Definition of Multidatabase System • A MDB is a DS that acts as a front end to multiple local DBs. The system provides full database functionality and interacts with local systems at their external user interface. Local systems maintain their local autonomy. • The global system provides some means of resolving the differences in data representations, • The global user can access information from multiple sources with a single relatively simple request ,

  18. Issues in MDBs • Site Autonomy • each local DBMS retains complete control over local data and processing ( frequent claim), • each local site determines: • what information it will share with the global system, • what global requests it will service, • when it will join the MDB, • when it will stop participation, • Local DBMS sees the global system as another local user, • Global optimization in general is very complex,

  19. Differences in Data Representation: • Local DBs developed independently, and there are likely to be many different models, different representations for similar objects, • There is a need for an integrated representation, • Naming conflicts, • Format differences (domain, scale, precision, etc), • local to global transformation can be simple but the inverse can be very complex,

  20. Differences in Data Representation (cont) • Structural differences • data value versus attribute • Missing or conflicting data, • Identification of semantically valid common reference points, • Conflicts among constraints, • Data integration is the main issue – ability to decide how to resolve the differences is an important skill.

  21. A Simple Example Consider two companies data models : Company 1 records are stored in one table Emp; Emp(Emp#, Fname, Lname, Bdate, Dept#, Rank, Salary) Company 2 records are stored in many tables – one for each company department; Dept XX(S-Id, Fname, Sname, Position, Phone#, e-mail, URL) Build an integrated schema – number of changes must be made to at least one of component schemata and consequently to the local Application Programs. Views offer a solution to the structural issues !

  22. Views in Relational Systems Important to understand – 1) Each syntactically correct SQL statement is a view definition, 2) Each realisation of a SQL statement is a table, 3) Any data set (table) can be expressed as a SQL statement to given RDB, 4) View is a virtual table (only materialised when called by application) 5) Basic SQL syntax for view creation is: CREAT VIEW (view_name) AS SELECT (Attributes_name) FROM (table_name) WHERE (condition on attribute-names & constants)

  23. Views offer a solution (example cont) Dept 01(S-Id, Fname, Sname, Position, Phone#, e-mail, URL) Dept 02(S-Id, Fname, Sname, Position, Phone#, e-mail, URL) Dept 03(S-Id, Fname, Sname, Position, Phone#, e-mail, URL) • Table name become an attribute in view definition • Union of individual views to create the final data representation of company 2 • Use null values when union of attributes considered for the data representation for company 1 and 2 • Several examples on Tutorials how to use views to ‘unify’ different data representations.

  24. History Student Course Maths Grade French Student Grade Music ….. Geography Integration Issues – different CSs Different schemas for the same universe of discourse - resulting in different data structures .

  25. Integration Issues Storing the same info in different relational structures. Note that the table below contains the same information but requires less space than the first one: If there are 8 courses then the first storage requires 8 records per student – 24 fields while the second one – 1 record with 9 fields. Also compare the size of fields in bytes !!!

  26. Apply some operations Apply some operations Our goal Views deployment

  27. Different views on the same data Let us consider a simple example: Case 1

  28. Different views on the same data (cont) Case 2

  29. Different views on the same data (cont) Case 3

  30. Different views on the same data (cont) Case 4

  31. View Integration Views . . . schema 1 schema n schema 2 GOAL: find all parts of the input schemas that refer to the same portion of reality and unify their representation conflict analysis and resolution Global Schema integratedschema

  32. Employee Project Works-on Employee Department Project Works-for Has publisher name title title Book Publisher Book Has Problems in View Integration • Different perspectives • Equivalences among constructs in a model

  33. 1 N Employee Project Works-on N M Employee Project Works-on Problems in View Integration • Incompatible design specifications How to do view integration? N-way versus binary merging Where to start? skeleton schema managerial (most important) schema

  34. Naming Conflict Scenario 2: interschema Schema 1 Schema 2 authors authors title title Article Article Report Article Scenario 1: renaming Scenario 3: inter-schema Publication Report d d U U U Report Article Article Name Conflict analysis and Resolution • Possible resolution • renaming • Inter-schema constraint • Source of conflict • synonyms • homonyms Several modification scenarios may be possible

  35. Structural Conflicts and Resolution • Schema concepts are identical • They have exactly the same representation and properties • Schema concepts are compatible • They have different representations that are not contradictory • one is an attribute and the other is an entity • Schema concepts are incompatible • They have contradictory representations or properties • different multiplicities • different keys • reverse generalization relationship User needs to help resolve incompatibilities

  36. 1:N relationship (1,N) (0,1) Assigned-to Employee Department N 1 mandatory optional Employee Department John Sales Assigned-to Alan Marketing James Larry Bill Ed ER Model – Example Relationships

  37. address title code address name name 1 N N M Department Report Contractor Publishes Written-for N N name name 1 M Project-manager Topic-area Writes Contains title dept-name title code research-area name N M N M Author Publication Keyword Author-of Contains Example – Name Conflicts Schema 1 — focused on reports Schema 2 — focused on publications

  38. address title code address name name 1 N N M Department Report Contractor Publishes Written-for N N name name 1 M Author Topic-area Author-of Contains title dept-name title code research-area name N M N M Author Publication Topic-area Author-of Contains Example – Structural Conflicts Schema 1 — focused on reports Schema 2 — focused on publications

  39. title dept-name Publication d Report code Example – Structural Conflicts Inter-schema constraint

  40. address title code address name name Department Report Contractor Publishes Written-for name name Author Topic-area Author-of Contains title title code research-area name Author Publication Topic-area Author-of Contains name Department Publishes Example – Merge Views Schema 1 — focused on reports Schema 2 — focused on publications

  41. address title name title code research-area Department Publication Topic-area Publishes Contains d name Author Author-of Report code Written-for name Contractor address Example – Integrate Schema

  42. Entity Clustering What?Abstraction of an ER diagram so the entire schema can appear on one sheet of paper Why?To make it easier for end-users and designers to communicate Grouping operations • dominance grouping:look for an entity thatdominates (is centralto) other entities

  43. U U Entity Clustering(cont) • abstraction grouping:generalization oraggregation • constraint grouping:general constraintexpressed onrelationships • relationship grouping:look for relationshipsthat dominate entities(ternary or higher)

  44. Clustering Technique • Define points of grouping with functional areas • Locate dominant entities in a functional area through natural relationships, n-ary relationships, integrity constraints, abstractions, etc. • Form entity clusters • Use the grouping operations, preferring to keep entities within their functional area if conflicts arise (or leave unclustered) • Form higher level entity clusters • Apply grouping operations recursively • Validate the cluster diagram • Check for consistency of relationships between entities at each level of the diagram and verify meaning of groupings with user

  45. Division Contains Company organizationfunctional area 1 N Department 1 1 Headed-by Skill Has Managed-by M N 1 1 1 Skill-used P N 1 Project Employee Married-to Assigned-to 1 N 1 N Manages d M U U Location U U Project managementfunctional area Manager Secretary Engineer Technician 1 1 1 N Belongs-to Has-allocated Is-allocated Is-allocated Employee datafunctional area 1 1 1 M Desktop Workstation Prof-assoc Example - Clustering 1

  46. 2.1 Division/Departmentcluster Department Division 1 1 Skill-used 1 Department Headed-by Has Managed-by Skill M N 1 1 1 Assigned-to P Project N 1 Projectmanagementcluster Employee Married-to N 1 2.2 N 1 Project Manages Location M d U U U U Managercluster Secretarycluster Engineercluster Technician 2.3 2.4 2.5 Example - Clustering

  47. Division/Departmentcluster Department Division 1 1 2.1 1 Department Headed-by Has Managed-by Assigned-to Skill-used Skill M N 1 1 N P Project Projectmanagementcluster Employeecluster N 2.2 3.1 1 Project Location M Example - Clustering

  48. Summary of the main concepts • Why view integration is important, • Methods for view integration, • Potential problems when integrating views, • Resolution of the problems,

  49. Schema Integration Steps • Identifying correspondences and conflicts among the schema, • Modifying schema to conform to one another, • Merging the schemata, • Restructuring,

  50. Heterogeneous local DBMSs • Network, hierarchical, relational, object-oriented models to be supported, • Often a trade-off between writing translation code and limiting participation in MDB, • Gateways between different technologies,

More Related