Design Issues in Distributed Database Management Systems
Explore design considerations in distributed database management systems, covering fragmentation, data integration, schema mapping, and security objectives.
Design Issues in Distributed Database Management Systems
E N D
Presentation Transcript
Reading • Textbook: Ch. 4 CSCE 824 - Spring 2011
Design Issues • Placing of data and programs (DBMS and application) • Network issues CSCE 824 - Spring 2011
Level of Sharing • No sharing • Data sharing • Data and program sharing Heterogeneous environment! CSCE 824 - Spring 2011
Top-Down Design • Global Conceptual schema distribution • Fragmentation • Replication • Allocation • Figure 3.2 CSCE 824 - Spring 2011
Correctness of Fragmentation • Completeness: FR={R1, …, Rn} • Reconstruction: R=Ri, RiR • Disjointness: • Horizontal: does not djRisuch that djRkwhere ki • Vertical: same as horizontal for non-primary key attributes 1&2: Lossless-join (normalization) CSCE 824 - Spring 2011
Data Directory • Global vs. local conceptual schemas • How to search? • Where to store? • Single vs. multiple copies? CSCE 824 - Spring 2011
Current Research • Allocation: new requirements, technology, etc. • Where to store the fragments? • Dynamic environment • Usage pattern • Application characteristics • Network changes • Security CSCE 824 - Spring 2011
Bottom-Up Approach • Multi-database systems • How to integrate them into 1 database? • Interoperability CSCE 824 - Spring 2011
Database Integration • Physical integration • Materialized database: data warehouses • Extract-transform-load (ETL) tools • Logical integration • Virtual (not materialized) integration • Enterprise Information Integration CSCE 824 - Spring 2011
Data Warehouses • On-line Analytical Processing (OLAP) applications: • Decision support systems • Trend analysis and forecasting • Complex queries, large databases • Materialized view maintanence CSCE 824 - Spring 2011
Logical Integration • No materialized global database • Virtual integration: data remains at the local (operational) databases • Global conceptual schema may not contain everything from local schemas • Autonomous and heterogeneous local systems CSCE 824 - Spring 2011
Bottom-Up Design • Global Conceptual Schema (GCS or mediated schema) • Defined first: local conceptual schemas (LCS) are mapped to GCS • Defined during the integration of the LCSs and develop the corresponding mappings from LCSs to the GCS CSCE 824 - Spring 2011
GCS Defined First • Local-as-view (LAV) systems • Each LCS is treated as a view over the GCS • Query results: constrained to the objects in the local DBs while the GCS definition may be richer • Potential incomplete answers • Global-as-view GCS is defined as a set of views over the LCSs • View definition defines how to derive elements of the GCS • Query results: constrained to the GCS while the local DBs might be richer CSCE 824 - Spring 2011
Design Tasks • Schema translation • Schema generation • Figure 4.3 CSCE 824 - Spring 2011
Intermediate Canonical Representation • Expressive to incorporate all concepts in the local databases • Simple, intuitive, practical, etc. • Example: E/R model, relational model, graph/tree models, etc. • Tools CSCE 824 - Spring 2011
Schema Generation • Schema matching: syntax and semantics • Integration of common schema elements • Schema mapping • See example 4.1, 4.2 CSCE 824 - Spring 2011
Schema Matching • Defined or discovered (e.g., web data) • Rules: • Correspondence between 2 elements • Predicate whether the correspondence holds or not • Similarity value between the 2 elements CSCE 824 - Spring 2011
Finding Correspondence • Difficult process due to schema heterogeneity • Can be automated? • Insufficient schema and instance information • Unavailability of schema documentation • Subjectivity of matching CSCE 824 - Spring 2011
Matching Algorithm Issues • Schema vs. instance matching • Concept match • Data instance: semantic inconsistencies • Element-level vs. structure-level mapping • Element name semantics • Multiple attribute mapping? • Matching cardinality • One-to-one, one-to-many, many-to-many CSCE 824 - Spring 2011
Semantic Schema Heterogeneity • Semantic: meaning, interpretation, and intended use of data • Synonyms, homonyms, hypernyms • Different ontologies • Imprecise wording CSCE 824 - Spring 2011
Structural Schema Heterogeneity • Type conflict: attribute vs. entity • Dependency conflict: mapping cardinality inconsistencies • Key conflict: different primary keys • Behavioral conflict: modeling assumptions, e.g., referential integrity, deletion, etc. Farkas CSCE 824 - Spring 2011 22
Schema Integration • Binary • N-ary CSCE 824 - Spring 2011
Schema Mapping • How the data from local databases can be mapped to GCS • Mapping creating • Mapping maintanence CSCE 824 - Spring 2011
Mapping Creation • Input: LCS, GCS, M (schema matches) • Output: Q={Q1, …, Qk} such that • DBGCS = Q(DBCLS) CSCE 824 - Spring 2011
Security Objectives • Confidentiality • Integrity • Availability CSCE 824 - Spring 2011
Question 1 • How distributed databases impact the security objectives? • Confidentiality in traditional vs. distributed DBs • Integrity in traditional vs. distributed DBs • Availability in traditional vs. distributed DBs CSCE 824 - Spring 2011
Integrity • Correctness criteria • Top-down design • Bottom-up design CSCE 824 - Spring 2011
Availability • What are the issues related to availability when dealing with • Top-down design • Bottom-up design CSCE 824 - Spring 2011
Confidentiality • (will be covered in 2nd part of semester but…) • Centralized vs. distributed security policy • Top-down design • Bottom-up design CSCE 824 - Spring 2011
Next Class Semantics-based Database Integration CSCE 824 - Spring 2011