170 likes | 255 Vues
Learn about tools such as LOAD, UNLOAD, EXPORT, and IMPORT for bulk data operations. Understand ETL software, replication, messaging software, and working with distributed databases. Discover network considerations, distribution standards, access methods, and transaction protocols.
E N D
Chapter 18 Data movement & distribution
LOAD • Utility used to perform bulk inserts of data into database tables • Adds rows to a table • Replaces all existing rows with new data • Better to create all indexes before loading data into table • Plan for concurrent processing
UNLOAD • Utility that moves data or copies data • Reads data from a database and writes it to an output data vile • Can use image copy backups • Abilities • Use with LOAD
Unload continued • Data encoding scheme (EBCDIC, ASCII, UNICODE) • Floating –point data • Limiting UNLOAD data • Unloading from VIEWS
Test data • Both LOAD and UNLOAD can be used to create test data sets
EXPORT and IMPORT • EXPORT utility reads data from a table and places it into an external file • IMPORT reads an external file created by EXPORT and inserts the data into a table
ETL Software • Extract, transform, load • Used to populate data warehouses and data marts from other databases and other sources
Replication • Can use replication and propagation • Repeating stored data at other locations (replication) • Migration of changed data (propogation)
Messaging Software • Messaging software or message queueing software or application integration • Provides APIs to read and write formatted message to and from a queue
Distributed Databases • Distributed vs centralized • Autonomy • Isolation • transparency
Working with Networks • Minimize network traffic by retrieving only data needed • Use methods that encourage sending data over network in blocks (not rows) • Access local data instead of remote data
Distribution Standards • Two standards • DRDA – distributed relational database architecture (IBMs protocol) • RDA – remote database access (competing protocol developed by ISO and ANSI) • Gateways can also be used
Accessing Distributed Data • Need to help application developers match the DBMS capabilities • Remote request (single request to a single location) • Remote unit of work (single application access data from multiple location) – programmer must build units of work for each location
Access continued • Distributed unit of work (more than one DBMS can be accessed per unit of work, multiple SQL statements can read/write data at multiple servers within a single unit of work) • Distributed request (single SQL statement can access multiple data from more than one location – robust)
Two-phase Commit • When distributing units of work need to have protocol for committing transactions
Distributed Chain • Computer hardware, local OS, networking software and local databases for the requester • Network hardware, wiring, gateways, routers and hubs • Any middleware or transaction processing system used by the requester or the server
Chain continued • The computer hardware, local OS, networking software and databases on the server • Disk storage and storage management software