
Information Management CSC824

Information Management CSC824. Part 2 Nick Rossiter b.n.rossiter@ncl.ac.uk. Interoperability in Information Systems. Interoperability 1. Interoperability: the ability to request and receive services between various systems and use their functionality. More than data exchange.


Presentation Transcript


  1. Information Management CSC824 Part 2 Nick Rossiter b.n.rossiter@ncl.ac.uk

  2. Interoperability in Information Systems

  3. Interoperability 1 • Interoperability: the ability to request and receive services between various systems and use their functionality. • More than data exchange. • Implies a close integration

  4. Interoperability 2 • Features: • exchange of messages and requests • use of each other’s functionality • client-server abilities • distribution • operate multiple systems as single unit • communication despite incompatibilities • extensibility and evolution

  5. Motivations 1 • Diversity of modelling techniques • Distributed businesses may exercise local autonomy in platforms • Data warehousing requires heterogeneous systems to be connected • Data mining enables new rules to be derived from heterogeneous collections

  6. Motivations 2 • Pervasive Computing: networks supporting many diverse nodes, to be driven by users specifying policy and function. • Policy: statements governing how a solution will be achieved. Statements are derived from requirements. • Function: mechanism for achieving objectives. • Mobile Computing: wireless networks

  7. Motivations 3 • E-Science • Distribution of functionality transparently across many different platforms • Grid • Layered components available in network: • Computational, information and knowledge layers and probably more.

  8. Basic Definitions 1 • Distribution: information bases are stored on multiple computer systems interconnected by a communication medium. • Homogeneous system: one that adheres to the same software at all sites. • Heterogeneous system: one that does not adhere to the same software at all sites.

  9. Basic Definitions 2 • Autonomy: the ability of a site to control its own activities with respect to one or more of: • design • communication • execution • association

  10. Basic Definitions 3 • Model: a representation of policies in a structured form according to some perceived view of reality e.g. • Relational model – world is tabular • Hierarchical model – world is tree-like • Security model – world is task-based • Object model – world is based on o-o paradigm

  11. Basic Definitions 4 • Mechanism – how a particular model or policy is to be realised (implemented). The design of a system. • Implementation – the coding and compilation of a system. • ‘Instantiation’: • populating a system with data. • executing a program.

  12. Semantic Problems in Interoperability 1 • People call different properties by different names • People classify properties differently: • Different contexts e.g. colour is a property when describing a car but a table in its own right to the paint shop. • Different normalization priorities e.g. many tables optimised for updates versus few tables optimised for searching.

  13. Semantic Problems in Interoperability 2 • People make use of facilities in different ways. For instance: • In SQL-92 can achieve uniqueness in tables by: • Defining keys • Modifying table storage method on various properties • Defining a unique index • So many legacy problems

  14. Further Legacy Problems • May ostensibly have systems with relational model, but may vary between: • SQL-89, SQL-92, SQL-1999. • Foreign key -- Primary Key for association: • 1st class definition only in SQL-92, SQL-1999 • Inheritance -- UDT: • 1st class definition only in SQL-1999

  15. Constraints and Types • May differ between systems: • e.g. student ids may be held as: • integers (leading zeros removed) 65275 • integers (padded out with so many leading zeros) 0065275 • strings (fixed length) ‘0065275’ • Ids may have checksum function or not
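
The three id representations above can be reconciled by mapping each to one canonical form. The following is a minimal sketch, not part of the original slides: the function name canonical_id and the canonical width of 7 (taken from the example '0065275') are assumptions, and checksum handling is left out.

```cpp
#include <cassert>
#include <cctype>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>

// Assumed canonical form: fixed-width, zero-padded digit string.
// The width of 7 is an assumption taken from the slide's example '0065275'.
constexpr int kIdWidth = 7;

// Canonicalise an id held as an integer (leading zeros were lost on storage).
std::string canonical_id(long id) {
    if (id < 0) throw std::invalid_argument("negative id");
    std::ostringstream out;
    out << std::setw(kIdWidth) << std::setfill('0') << id;
    const std::string result = out.str();
    // setw pads but never truncates, so the result is at least kIdWidth long.
    assert(result.size() >= static_cast<std::size_t>(kIdWidth));
    return result;
}

// Canonicalise an id held as a string, whether zero-padded or not.
std::string canonical_id(const std::string& id) {
    if (id.empty()) throw std::invalid_argument("empty id");
    for (unsigned char c : id)
        if (!std::isdigit(c)) throw std::invalid_argument("non-digit in id");
    return canonical_id(std::stol(id));  // base 10, so leading zeros are harmless
}
```

With this sketch, the integer 65275 and the strings '65275' and '0065275' all map to the same key, so rows from the different systems can be matched on it.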

  16. Semantic Problems in Interoperability 3 • Structural problems are bad enough. But also: • Functionality can be applied in many different ways: • Procedures or functions; • different module layout. • Rules can be in: • Model structures, model coding, procedures or application programs.

  17. Relational Model Definitions • Relational Table Definitions • Format of command (upper-case entered literally, lower-case to be substituted by user, [..] indicates optional) is: CREATE TABLE rt (a1 type [nn], a2 type [nn], ..., an type [nn], PRIMARY KEY (ak, al, ...), {[FOREIGN KEY (af, ag, ...) REFERENCES rx, ...] }) • where: • r is table (relation) name • a1 ... an are attribute names • type ∈ {INT, REAL, MONEY, DATE, CHAR(p)} (plus a few others in some systems) • n is degree of table • p is length (fixed) of character field • nn = 'NOT NULL' • KEYS give uniqueness and reference points.

  18. Relational Definition CREATE TABLE EXAM ( Module_no char(6), Student_id char(10), Date_Exam date, Mark int, PRIMARY KEY (module_no, student_id, date_exam), FOREIGN KEY (module_no) REFERENCES Modules, FOREIGN KEY (student_id) REFERENCES Students )

  19. Abstractions • Attribute is a property (classification abstraction) • Table name is aggregation (of properties) • Foreign Key is an association (relationship)

  20. Typing • Of attributes by simple means (integer, float, …) • Of primary key attributes by uniqueness (can only be picked once from domain) • Of foreign key attributes by occurrence in another table as cross-reference
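
The key-based typing rules can be illustrated outside SQL. The sketch below is an in-memory stand-in for the EXAM table of slide 18, written under assumptions: the struct ExamDb and the function insert_exam are hypothetical names, dates are held as plain strings, the Mark attribute is omitted, and an empty string stands in for NULL.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <tuple>

// Hypothetical in-memory stand-in for EXAM and the tables it references.
struct ExamDb {
    std::set<std::string> modules;   // primary key values of Modules
    std::set<std::string> students;  // primary key values of Students
    // EXAM primary key: (module_no, student_id, date_exam)
    std::set<std::tuple<std::string, std::string, std::string>> exam_keys;

    // Returns true if the row is accepted, false if a key rule is violated.
    bool insert_exam(const std::string& module_no,
                     const std::string& student_id,
                     const std::string& date_exam) {
        // Primary key attributes may not be null (empty string models NULL here).
        assert(!module_no.empty() && !student_id.empty() && !date_exam.empty());
        // Foreign-key typing: the value must occur in the referenced table.
        if (modules.count(module_no) == 0) return false;
        if (students.count(student_id) == 0) return false;
        // Primary-key typing: each key value can be picked only once.
        return exam_keys.emplace(module_no, student_id, date_exam).second;
    }
};
```

A second insert with the same (module_no, student_id, date_exam) is rejected by the set, and an insert naming an unknown module or student is rejected by the lookups, mirroring the PRIMARY KEY and FOREIGN KEY clauses.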

  21. Simple Problem in Interoperability 1 • Two schemas in SQL-1999: • Schema A: author char(50); title varchar(300); keyword set(char(30)) • Schema B: author_surname char(50); author_initials char(10); title varchar(200); keywd array(8)(char(30)) • Note: homogeneous model -- both SQL-1999 -- but difficulties.

  22. Different Standards • For example -- Names: • Person(surname, first_name, ..) • or Person(first_name, surname, …) • or Person(name, …) • First two may easily be made equivalent but convention in third needs to be understood. • Note also possibilities of A.N.Other, AN Other, A N Other.
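
The three spellings A.N.Other, AN Other and A N Other can be brought to one form once the convention is known. The sketch below assumes the convention suggested by those examples (initials first, surname as the last token); the function name canonical_name and the output format "Surname, I.I." are hypothetical choices, not part of the slides.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Assumption: initials come first and the surname is the last token.
// The output format "Surname, I.I." is a hypothetical canonical form.
std::string canonical_name(std::string raw) {
    // Treat full stops as separators so "A.N.Other" splits like "A N Other".
    for (char& c : raw)
        if (c == '.') c = ' ';
    std::istringstream in(raw);
    std::vector<std::string> tokens;
    for (std::string t; in >> t;) tokens.push_back(t);
    assert(!tokens.empty());  // precondition: the name is not blank

    const std::string surname = tokens.back();
    std::string initials;
    // Every character before the surname counts as one initial ("AN" -> A, N).
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i)
        for (unsigned char c : tokens[i]) {
            initials += static_cast<char>(std::toupper(c));
            initials += '.';
        }
    return initials.empty() ? surname : surname + ", " + initials;
}
```

The rule that every leading character is an initial is exactly the kind of convention that has to be understood first: a full first name such as Ann would be split into three initials, so data of the Person(name, …) kind still needs manual inspection.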

  23. Possible Solutions • In schema B define function which amalgamates the two parts of author into one value. • Will need to look manually at format of author in schema A. • If format inconsistent, need some pre-processing. • Other inconsistencies require decisions: • variable set versus array dimension 8. • Different name for keyword attribute • different size for title fields (presumably adopt higher). • In heterogeneous environment, need also to relate schema constructions. Is class same as table?
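
The mapping decisions listed above can be sketched as a conversion from a schema B record to a schema A record. This is only an illustration under assumptions: the second column of slide 21 is read as author_surname and author_initials, the struct and function names are hypothetical, the author format "Surname, Initials" is a guess that would have to be checked against the real data in schema A, and empty array entries are treated as unused slots.

```cpp
#include <array>
#include <cassert>
#include <set>
#include <string>

struct RecordA {                       // schema A
    std::string author;                // char(50)
    std::string title;                 // varchar(300)
    std::set<std::string> keyword;     // set(char(30))
};

struct RecordB {                       // schema B
    std::string author_surname;        // char(50)
    std::string author_initials;       // char(10)
    std::string title;                 // varchar(200)
    std::array<std::string, 8> keywd;  // array(8)(char(30))
};

RecordA to_schema_a(const RecordB& b) {
    assert(b.title.size() <= 200);  // B's declared limit, so it fits A's 300
    RecordA a;
    // Amalgamate the two parts of author (the format is an assumption).
    a.author = b.author_surname + ", " + b.author_initials;
    a.title = b.title;              // the larger size is adopted
    // Array of 8 -> set: unused (empty) slots and duplicates disappear.
    for (const std::string& k : b.keywd)
        if (!k.empty()) a.keyword.insert(k);
    return a;
}
```

Going from B to A loses nothing in the keywords, but the reverse direction needs a decision when a set holds more than eight entries, and the amalgamated author may exceed char(50).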

  24. Simple Problem in Interoperability 2 • Homogeneous Models • the same information may be held as attribute name, relation name or a value in different databases • e.g. fines in library; • could be held in a dedicated relation Fine(amount, borrowed_id) • or as an attribute Loan(id, isbn, date_out, fine) • or as a value Charge(1.25, ‘fine’)
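
The library example can be made concrete by reading the same fine out of each of the three representations. The sketch below is hypothetical throughout: the struct names mirror the relations on the slide, the overloaded function fines is an invented helper, and a fine of 0 in Loan is assumed to mean that no fine is due.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct FineRow   { double amount; std::string borrowed_id; };       // Fine(amount, borrowed_id)
struct LoanRow   { std::string id, isbn, date_out; double fine; };  // Loan(id, isbn, date_out, fine)
struct ChargeRow { double amount; std::string kind; };              // Charge(1.25, 'fine')

// Relation name carries the meaning: every row of Fine is a fine.
std::vector<double> fines(const std::vector<FineRow>& rows) {
    std::vector<double> out;
    for (const FineRow& r : rows) {
        assert(r.amount >= 0.0);  // a fine is never negative
        out.push_back(r.amount);
    }
    return out;
}

// Attribute name carries the meaning; assumption: 0 means no fine is due.
std::vector<double> fines(const std::vector<LoanRow>& rows) {
    std::vector<double> out;
    for (const LoanRow& r : rows)
        if (r.fine > 0.0) out.push_back(r.fine);
    return out;
}

// A data value carries the meaning: only rows tagged 'fine' count.
std::vector<double> fines(const std::vector<ChargeRow>& rows) {
    std::vector<double> out;
    for (const ChargeRow& r : rows)
        if (r.kind == "fine") out.push_back(r.amount);
    return out;
}
```

Each overload needs different knowledge of its schema (a relation name, an attribute name, or a data value), which is why this kind of mismatch cannot be resolved by type mapping alone.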

  25. Object-oriented Databases Modelling and Abstractions

  26. O-O DB Starting Point • Persistent Programming Languages with: • programming paradigm • complex abstractions • manipulations of general data structures • theoretical basis less obvious • complex user interface • functional completeness

  27. Relational Starting Point • Relational Data Model with: • data models • relational structuring • manipulations of relations • strong set-based theoretical basis • simple user interface • limited functionality

  28. Evolutionary Pressures • Same user pressures in data handling requirements apply to both so resulting enhancements/softening lead to a number of similarities in end-products: • Users want: • Complex abstractions • Complete functionality • Ease of use • Reliability (hence provability -- hence theory)

  29. Thrust • So thrust is to provide database systems with: • Underlying complex structures • Powerful manipulation mechanisms AND • Declarative manipulation languages

  30. Ideal OODBMS Properties • Main drive came in 1980s. • Ideal properties of OODBMS: • object-oriented (programming) system • persistence for (some) objects • fast retrieval of persistent objects • concurrency (transactions) • high-level (declarative) query language

  31. Alternative Approaches 1 • Adapt imperative: take an imperative programming language and add library extensions through embedded techniques. All structures and database functionality are defined in this way. So more extensive add-ons for database functionality than in embedded SQL. • (Example O2 -- extensions to C).

  32. Alternative Approaches 2 • Adapt o-o: take an o-o language and add library facilities for additional classes to provide persistence, aggregation, .. (Examples: Ontos, Versant, ObjectStore). Note -- not quite like embedded SQL as that defines only extra functionality not structures as well.

  33. Alternative Approaches 3 • Evolve: take an o-o language and add features 2-5 of the ideal OODBMS properties (persistence, fast retrieval, concurrency, declarative query language) directly into the language; that is, extend the language with database 'extras' as first-class facilities. • Example: GemStone which extends Smalltalk, Java, C++

  34. Alternative Approaches 4 • Revolutionise: start from scratch and develop an o-o database system with required facilities, independent of existing programming languages. Based on object and semantic models. • Example SIM -- Semantic Information Manager

  35. Alternative Approaches 5 • Adapt Relational • e.g. Object-relational model • SQL-1999 • Start with SQL92 and introduce: • User-defined types (UDT) • Inheritance (sub-types) • Complex objects, references

  36. An Example Object-oriented Database System Objectivity/C++

  37. Overview • From Objectivity, Inc. • Available for Unix, VMS, Windows • Supports C++, Java, Smalltalk • Classification: object-oriented database system derived from C++ by making objects persistent; SQL-like language provided for declarative interface. Pre-compiler. • Approach 2 (adapt o-o). • Newcastle University (UCS) had this system (Unix) on trial on Aidan. Used in CSC313 in 1999.

  38. Object Lifetimes by Class • Either: • Persistent-capable: • whose objects may have a lifetime greater than that of the programs which create them. • Non-persistent-capable: • whose objects cannot be made persistent directly but can be made persistent in the federated database as, for instance, data member-types. • Transient: • whose objects have a lifetime no greater than that of the program which creates them.

  39. Federated Database • The federated database is the basic unit, holding potentially many databases defined by many schemas. • Object identifiers are unique within the federated database.

  40. Objectivity File Structures for Persistent Data • Federated DB • 1:N Database (D) • Container (C) • Basic Objects, held in slot addresses (S) on pages (P)

  41. Addressing - Federated Database • Schema for federation: • Catalogue of databases in federation + their export schemas • Database: • held in 1+containers – complete schemas • Container: • physical layout in terms of pages allocated • Pages: • Unit of storage for disk fetches and stores • Objects: • persistent objects with object identifiers (OIDs)

  42. Object Identifiers • OIDs: addresses within a page (slot number) • OIDs: addresses D-C-P-S (database-container-page-slot) • Total 64 bits (16 bits per level) • e.g. • 03-05-26-32 • addresses object in slot 32 of page 26 in container 05 in database 03.
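
The D-C-P-S layout, with 16 bits per level in a 64-bit identifier, can be sketched with shifts and masks. This is an illustration only, not Objectivity's actual encoding: the struct Oid and the functions pack, unpack and to_string are hypothetical, placing the database number in the most significant bits is an assumption, and the example 03-05-26-32 is read as decimal numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// One 16-bit field per level: database, container, page, slot.
struct Oid {
    std::uint16_t database, container, page, slot;
};

// Assumption: the database number occupies the most significant 16 bits.
std::uint64_t pack(const Oid& o) {
    return (static_cast<std::uint64_t>(o.database)  << 48) |
           (static_cast<std::uint64_t>(o.container) << 32) |
           (static_cast<std::uint64_t>(o.page)      << 16) |
            static_cast<std::uint64_t>(o.slot);
}

// Inverse of pack: each cast keeps only the low 16 bits of the shifted value.
Oid unpack(std::uint64_t v) {
    return Oid{static_cast<std::uint16_t>(v >> 48),
               static_cast<std::uint16_t>(v >> 32),
               static_cast<std::uint16_t>(v >> 16),
               static_cast<std::uint16_t>(v)};
}

// Render as D-C-P-S, each level in decimal with at least two digits.
std::string to_string(const Oid& o) {
    char buf[32];
    const int n = std::snprintf(buf, sizeof buf, "%02u-%02u-%02u-%02u",
                                static_cast<unsigned>(o.database),
                                static_cast<unsigned>(o.container),
                                static_cast<unsigned>(o.page),
                                static_cast<unsigned>(o.slot));
    assert(n > 0 && static_cast<std::size_t>(n) < sizeof buf);
    return std::string(buf);
}
```

With 16 bits per level, each level can address at most 65536 entries, which bounds the number of databases, containers per database, pages per container and slots per page.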

  43. Object-oriented DBMS Objectivity (continued)

  44. Federated Database -- Addressing • Schema for federation • Catalogue of databases in federation + their schemas • Database: held in 1+containers (D) • Container: determines physical layout (C) • Objects -- persistent held on pages (P) • OIDs: addresses D-C-P-S (S is slot number) • Total 64 bits (16 bits per level)

  45. Persistent Capable Classes • Define a class and inherit persistent properties from a predefined Objectivity/C++ class ooObj • Example: • class employee : public ooObj • // inherits persistence from ooObj • class manager : public employee • //inherits properties, functions and persistence from employee
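
The two class headers above can be completed into a compilable sketch. Everything beyond the inheritance chain is an assumption: the empty ooObj here is only a placeholder for the class supplied by the Objectivity/C++ headers, the data members are hypothetical, and is_persistent_capable is an invented helper that merely checks derivation from ooObj.

```cpp
#include <string>
#include <type_traits>

// Placeholder only: in a real build ooObj comes from the Objectivity/C++
// headers. This empty stand-in just lets the sketch compile on its own.
class ooObj {};

class employee : public ooObj {    // inherits persistence from ooObj
public:
    std::string name;              // hypothetical data member
    double salary = 0.0;           // hypothetical data member
};

class manager : public employee {  // inherits properties, functions and persistence
public:
    int reports = 0;               // hypothetical data member
};

// Hypothetical helper: a class counts as persistent-capable in this sketch
// if it derives, directly or indirectly, from ooObj.
template <typename T>
constexpr bool is_persistent_capable() {
    return std::is_base_of<ooObj, T>::value;
}

static_assert(is_persistent_capable<manager>(),
              "manager derives from ooObj via employee");
static_assert(!is_persistent_capable<std::string>(),
              "std::string does not derive from ooObj");
```

The point of the sketch is that persistence is acquired through the class hierarchy: manager never mentions ooObj, yet it is persistent-capable because employee is.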

  46. Persistent-capable classes • Create a data definition file (.ddl) for each such class. • If the classes already exist in non-database environment as .h files, then simply change extension to .ddl for use in Objectivity/C++.

  47. DDL Processor • Takes as input a .ddl file and outputs: • A header file (.h) -- the original file with added Objectivity member functions for storing, retrieving and modifying objects. • A secondary header file (_ref.h) -- ooRef for object reference declarations -- included also as part of (.h) -- may be needed explicitly for forward declarations (boot-strap problem). • A C++ implementation file -- (_ddl.c) for Unix -- implements in C++ the Objectivity member functions declared in the header file; the result is later compiled and linked with the C++ application.

  48. Setting up a Database: Unix DDL Processor • Commands follow $; the output of each ls follows ->: • $ ls -> employee.ddl • $ oonewfd -fdfilepath company.FDB -lockserverhost machine95 company • $ ls -> company employee.ddl company.FDB • $ ooddlx employee.ddl company • $ ls -> company company.FDB employee.ddl employee.h employee_ref.h employee_ddl.c

  49. Notes on DDL processor • oonewfd -- tool -- sets up boot file and registers database • ooddlx -- DDL processor • machine95 -- lock server machine • company is boot (start-up) file • company.FDB is federated database file • employee.ddl is original (input) class definition • employee.h is output (enhanced) class definition • employee_ref.h is output reference class • employee_ddl.c is C++ implementation file
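
The file-naming pattern in these notes (employee.ddl producing employee.h, employee_ref.h and employee_ddl.c on Unix) can be captured in a few lines. The sketch below does not run the DDL processor; the struct DdlOutputs and the function ddl_outputs are hypothetical helpers that only derive the expected file names, for instance for use in a build script.

```cpp
#include <cassert>
#include <string>

// Names of the files the notes above list as outputs for one .ddl input.
struct DdlOutputs {
    std::string header;          // enhanced class definition
    std::string ref_header;      // reference class
    std::string implementation;  // C++ implementation file (Unix naming)
};

// Derive the expected output names from the input name; nothing is executed.
DdlOutputs ddl_outputs(const std::string& ddl_file) {
    const std::string ext = ".ddl";
    // Precondition: the input name has a non-empty stem and ends in .ddl.
    assert(ddl_file.size() > ext.size() &&
           ddl_file.compare(ddl_file.size() - ext.size(), ext.size(), ext) == 0);
    const std::string stem = ddl_file.substr(0, ddl_file.size() - ext.size());
    return DdlOutputs{stem + ".h", stem + "_ref.h", stem + "_ddl.c"};
}
```

For the input employee.ddl this yields the same three names that appear in the final ls listing of slide 48.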

  50. Similarity to Embedded SQL • Some similarities to Ingres/ESQL but database is held in your disk space with Objectivity.
