Designing a Data Exchange - Best Practices

Designing a Data Exchange - Best Practices • Data Exchange Scenarios • Sender vs. Receiver-initiated exchanges • Node Design • Best Practices: • Handling Large Transactions • State Management • Data Services • Data Validation • Schema Design

Data Exchange Scenarios

Requesting Data (1 of 3) Simple Query • Synchronous process • Ideal for small data sets • Ideal for both ad hoc and planned exchanges • Onus is on requestor to initiate exchange

Requesting Data (2 of 3) Solicit with Download • Asynchronous process • Good for larger datasets • Data Provider can schedule processing of request • Requestor can use “GetStatus” to see if data is ready yet
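The Solicit with Download pattern can be sketched as a simple polling loop. This is a minimal illustration, not a real node client: `FakeNode`, its method names (modeled loosely on Solicit, GetStatus and Download), the transaction id format and the status strings "Pending" and "Completed" are all assumptions made for the example.

```python
import time


class FakeNode:
    """In-memory stand-in for a data provider's node (hypothetical, not a real client)."""

    def __init__(self, polls_until_ready=2):
        self._polls_left = polls_until_ready
        self._results = {}

    def solicit(self, service, params):
        # The provider queues the request and immediately returns a transaction id.
        txn_id = f"txn-{len(self._results) + 1}"
        self._results[txn_id] = f"<Result service='{service}' params='{params}'/>"
        return txn_id

    def get_status(self, txn_id):
        # Pretend the provider finishes processing after a few status checks.
        if self._polls_left > 0:
            self._polls_left -= 1
            return "Pending"
        return "Completed"

    def download(self, txn_id):
        return self._results[txn_id]


def solicit_with_download(node, service, params, poll_seconds=0.01, max_polls=10):
    """Solicit, poll the status until the data is ready, then download it."""
    txn_id = node.solicit(service, params)
    for _ in range(max_polls):
        if node.get_status(txn_id) == "Completed":
            return node.download(txn_id)
        time.sleep(poll_seconds)
    raise TimeoutError(f"{txn_id} not ready after {max_polls} polls")


result = solicit_with_download(FakeNode(), "RCRA.GetFacilityList", "WA")
```

The bounded `max_polls` keeps the requestor from polling forever if the provider never finishes; a real client would likely use a much longer interval between checks.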

Requesting Data (3 of 3) Solicit with Submit • Asynchronous process • Good for larger datasets • Does not require the requestor to continuously poll the data provider to see if data is ready

Sending Data (1 of 2) Simple Submit • Very simple and very common process • Typical for traditional regulatory flows • “Hides” data since it is not exposed as a service

Sending Data (2 of 2) Notify with Download • Asynchronous approach to Simple Submit • Receiver can perform the download at a time of its own choosing

Data Exchange Scenarios • Nodes wait for requests • Nodes may initiate actions (e.g. Submit) • How can a node do both?

Node Components Example Node Architecture

Node Components A node can be divided into components, each playing a different role: • The Web Services Interface • Acts as a listener for inbound requests and submissions • Hosted on a Web Server (e.g. IIS, WebSphere) • Should not do any heavy lifting (e.g. data processing)

Node Components (continued) • Request Processor • Performs all data processing • Composes XML files for outbound delivery • Decomposes and processes inbound XML files • Coupled with a scheduler component • Enables node to process Solicit requests at a time of the node administrator’s choosing • Automatically kicks off outbound processes (e.g. daily Submit) • Flow agnostic • Decoupled from specific flow implementations • Ideally installed on an Application Server

Node Components (continued) • Node Administration Utility • Create and manage local accounts • Install new data exchange components • Set processing schedules • Audit Node activity • Extract documents (inbound and outbound should be stored)

Node Components (continued) • Flow-specific components • Discrete components tailored for a specific data exchange • Hot-swappable • Services (interface) is generic • Node configuration determines which services are internal or public • Node configuration determines whether a given service is for Query or Solicit
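One way to picture hot-swappable, flow-specific components behind a generic service interface is a small registry. The sketch below is an assumption-laden illustration: the `Node` class, its `register`/`unregister`/`invoke` methods and the `get_facility_list` handler are hypothetical names, and the "configuration" is reduced to the set of methods passed at registration.

```python
class Node:
    """Generic node core: knows nothing about any particular flow."""

    def __init__(self):
        self._services = {}  # service name -> (handler, allowed methods)

    def register(self, name, handler, methods=("Query", "Solicit")):
        # Configuration, not flow code, decides which methods a service supports.
        self._services[name] = (handler, frozenset(methods))

    def unregister(self, name):
        # Removing a flow component does not require changing the node itself.
        self._services.pop(name, None)

    def invoke(self, method, name, *params):
        if name not in self._services:
            raise LookupError(f"unknown service: {name}")
        handler, methods = self._services[name]
        if method not in methods:
            raise PermissionError(f"{name} is not available via {method}")
        return handler(*params)


def get_facility_list(state):
    # Hypothetical flow-specific component; a real one would read staged data.
    return [f"{state}-FAC-001", f"{state}-FAC-002"]


node = Node()
node.register("RCRA.GetFacilityList", get_facility_list, methods=("Query",))
facilities = node.invoke("Query", "RCRA.GetFacilityList", "WA")
```

Because the node only holds name-to-handler mappings, a flow component can be added or removed at run time without touching the listener or the request processor.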

Node Components (continued) Flow-to-Node Interface

Large Transactions • Can cause problems in several areas: • Data retrieval (SQL) • XML serialization (sender side) • Transmission over Internet • XML deserialization (receiver side) • Schema validation (both sender and receiver)

Large Transactions • Stage data in a model similar to that which is used by the schema • XML is hierarchical whereas RDBMS is relational • More secure • Source system unaffected by node operations • Index query parameter fields (SQL)

Large Transactions (continued) • Use an asynchronous exchange • Use Solicit, not Query • Schema design considerations • Schema KEY/KEYREF discouraged • Element naming may significantly affect file size, e.g. <MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode> • Query “costing” • Calculate the size of a given result set (e.g. COUNT(*)) before running the full query • Not very much experience in this area
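Query "costing" can be sketched as a cheap COUNT(*) check that runs before the full query. The example below uses an in-memory SQLite database; the `facility` table, its columns and the `MAX_ROWS` threshold are made up for illustration and are not part of any flow.

```python
import sqlite3

MAX_ROWS = 1000  # assumed threshold; a real node would tune this per flow


def run_costed_query(conn, state, max_rows=MAX_ROWS):
    # Cheap pre-check: count matching rows before building the full result.
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM facility WHERE state = ?", (state,)
    ).fetchone()
    if count > max_rows:
        raise ValueError(f"{count} rows exceed limit of {max_rows}; use Solicit instead")
    return conn.execute(
        "SELECT id, name FROM facility WHERE state = ? ORDER BY id", (state,)
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facility (id TEXT PRIMARY KEY, state TEXT, name TEXT)")
# Index the field used as a query parameter.
conn.execute("CREATE INDEX idx_facility_state ON facility (state)")
conn.executemany(
    "INSERT INTO facility VALUES (?, ?, ?)",
    [("F1", "WA", "Alpha"), ("F2", "WA", "Beta"), ("F3", "OR", "Gamma")],
)
rows = run_costed_query(conn, "WA")
```

Rejecting an oversized synchronous request up front lets the node steer the requestor toward an asynchronous exchange instead of timing out mid-transfer.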

Large Transactions (continued) • A well-designed flow can help avoid large transactions • “List” services can return only high-level data Scenario 1: • RCRA.GetFacilities("WA") Scenario 2: • RCRA.GetFacilityList("WA") • RCRA.GetFacilityDetail("WA","FACID1234") • Data service parameters can be used to limit transaction size Scenario 3: • RCRA.GetFacilitiesByType("WA","LQG") • All options affect schema design

Large Transactions (continued) • File compression • Zipping files can reduce file size by over 90% • Compact storage (archiving) • Significant reduction in time to transmit • Disk I/O versus memory I/O • If possible, avoid techniques that require the system to read the entire document into memory in order to process it. Toughie…
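Both points, compression and avoiding whole-document loading, can be illustrated with the Python standard library. This is a rough sketch: the element names are invented, gzip is used for brevity where the slide speaks of zipping (`zipfile` would be the ZIP equivalent), and the highly repetitive sample compresses far better than real data would.

```python
import gzip
import io
import xml.etree.ElementTree as ET

# Build a repetitive sample document (made-up element names).
record = "<Facility><FacilityName>Example</FacilityName><StateCode>WA</StateCode></Facility>"
xml_bytes = ("<Facilities>" + record * 5000 + "</Facilities>").encode("utf-8")

compressed = gzip.compress(xml_bytes)
ratio = len(compressed) / len(xml_bytes)  # compressed size as a fraction of the original


def count_facilities(stream):
    """Stream-parse the document, discarding each record once it is handled."""
    count = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "Facility":
            count += 1
            elem.clear()  # release the record's children and text
    return count


# Decompress and parse incrementally instead of materializing the whole tree.
with gzip.open(io.BytesIO(compressed), "rb") as stream:
    total = count_facilities(stream)
```

Note that `iterparse` still keeps the emptied `Facility` elements attached to the root, so memory use grows slowly with record count; it is nonetheless far smaller than holding every record's full subtree.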

State Management • State Management is required any time two systems must be synchronized • Contrast to Data Publishing exchange • Typically the sender’s burden, but does not have to be • Partial rejects compound the difficulty

State Management (continued) • Flagging source data • Set “submission status” indicator on source data • Complexity is directly related to transaction granularity • Compounded if record-level rejects are performed
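Flagging source data might look like the following sketch, which keeps a per-record status column and updates it after a submission. The table layout, the `submission_status` column and the status values PENDING, SUBMITTED and REJECTED are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facility (id TEXT PRIMARY KEY, name TEXT, "
    "submission_status TEXT NOT NULL DEFAULT 'PENDING')"
)
conn.executemany(
    "INSERT INTO facility (id, name) VALUES (?, ?)",
    [("F1", "Alpha"), ("F2", "Beta"), ("F3", "Gamma")],
)


def pending_ids(conn):
    """Return the ids of records that still need to be submitted."""
    rows = conn.execute(
        "SELECT id FROM facility WHERE submission_status = 'PENDING' ORDER BY id"
    )
    return [row[0] for row in rows]


def record_outcome(conn, submitted_ids, rejected_ids):
    # Record-level rejects: only accepted records are marked as submitted.
    rejected = set(rejected_ids)
    for fac_id in submitted_ids:
        status = "REJECTED" if fac_id in rejected else "SUBMITTED"
        conn.execute(
            "UPDATE facility SET submission_status = ? WHERE id = ?", (status, fac_id)
        )
    conn.commit()


batch = pending_ids(conn)
record_outcome(conn, batch, rejected_ids=["F2"])
statuses = dict(conn.execute("SELECT id, submission_status FROM facility"))
```

The finer the transaction granularity, the more status rows the sender has to track, which is where record-level rejects add to the bookkeeping.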

State Management (continued) • Exchange Network Header • Same schema can be used to perform different transactions • Can remove the need for TransactionCode (e.g. INSERT, UPDATE, DELETE) in schema • “Delta” to derive data changes since last submit • Many systems do not store deleted data • Compare last submission snapshot with current snapshot, derive what has changed • Incremental and full refresh services • e.g. Facility Flow
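The snapshot comparison can be sketched with plain dictionaries keyed by record id. The record shapes and the `derive_delta` helper are hypothetical; the point is only that deletes can be derived even when the source system does not keep deleted rows.

```python
def derive_delta(previous, current):
    """Compare two snapshots keyed by record id and classify the changes."""
    inserts = {k: v for k, v in current.items() if k not in previous}
    deletes = sorted(k for k in previous if k not in current)
    updates = {
        k: v for k, v in current.items() if k in previous and previous[k] != v
    }
    return {"insert": inserts, "update": updates, "delete": deletes}


last_submission = {
    "F1": {"name": "Alpha"},
    "F2": {"name": "Beta"},
    "F3": {"name": "Gamma"},
}
current_state = {
    "F1": {"name": "Alpha"},
    "F2": {"name": "Beta Corp"},
    "F4": {"name": "Delta"},
}
delta = derive_delta(last_submission, current_state)
```

Here F4 is new, F2 has changed, F3 has disappeared and F1 is untouched, so only three records would need to travel in an incremental submission.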

Data Service Best Practices • Data service naming conventions {Prefix}.{Action}{Object}[By{Parameter(s)}] e.g. FacID.GetFacilityByName • Work in Progress • What about versioning?
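A naming convention like this is easy to check mechanically. The sketch below parses a service name with a regular expression; since the slide does not list the permitted actions, `ASSUMED_ACTIONS` contains only "Get" as a placeholder, and the character rules for each part are likewise assumptions.

```python
import re

# The slide shows only "Get"; the full set of actions is not defined there.
ASSUMED_ACTIONS = ("Get",)

SERVICE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z][A-Za-z0-9]*)\."
    r"(?P<action>" + "|".join(ASSUMED_ACTIONS) + r")"
    r"(?P<object>[A-Z][A-Za-z0-9]*?)"
    r"(?:By(?P<params>[A-Z][A-Za-z0-9]*))?$"
)


def parse_service_name(name):
    """Split a service name into prefix, action, object and optional parameters."""
    match = SERVICE_NAME.match(name)
    if match is None:
        raise ValueError(f"does not follow the naming convention: {name}")
    return match.groupdict()


parsed = parse_service_name("FacID.GetFacilityByName")
```

An object name that itself contains "By" followed by a capital letter would be split in the wrong place, so a production check would need a stricter rule than this pattern.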

Data Services Best Practices Documenting data services: • Data Service name • Whether the service is supported by Query, Solicit, or both • Parameters • Parameter Name • Index (order) • Required/Optional • Minimum/Maximum allowed values • Data type (string, integer, Boolean, Date…) • Whether multiple values can be supplied to the parameter • Whether wildcard searches are supported and default wildcard behavior • Special formatting considerations • Access/Security settings • Return schema • Special fault conditions • Wildcards: % • Parameter delimiter: | (pipe character) • Parameter operation: AND
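The wildcard, delimiter and AND conventions listed above could be applied along the following lines. This sketch rests on several assumptions not stated on the slide: that the pipe separates multiple values supplied to a single parameter, that those values are alternatives (OR), and that matching is case-insensitive. Only the AND between parameters comes from the slide.

```python
import re


def split_values(raw):
    """Split a parameter string on the pipe delimiter."""
    return [value for value in raw.split("|") if value]


def wildcard_to_regex(pattern):
    """Translate '%' wildcards into an anchored, case-insensitive regex."""
    parts = (re.escape(part) for part in pattern.split("%"))
    return re.compile("^" + ".*".join(parts) + "$", re.IGNORECASE)


def matches(record, criteria):
    # Parameters combine with AND; values within one parameter with OR (assumed).
    for field, raw in criteria.items():
        patterns = [wildcard_to_regex(v) for v in split_values(raw)]
        if not any(p.match(record.get(field, "")) for p in patterns):
            return False
    return True


facilities = [
    {"name": "Acme Refining", "state": "WA"},
    {"name": "Acme Plating", "state": "OR"},
    {"name": "Birch Mill", "state": "WA"},
]
hits = [f["name"] for f in facilities if matches(f, {"name": "Acme%", "state": "WA|ID"})]
```

Escaping each literal fragment before joining keeps characters such as "." or "(" in a search value from being interpreted as regex syntax.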

Data Validation Best Practices • XML instance files should be validated against the schema by the sender before submittal • CDX is offering pre-submittal validation services for some flows • Schematron (Doug Timms)

Schema Design Best Practices • DRC 1.0 and DRC 1.1 • Schema Namespace • Schema Versioning • Exchange Network Schema Types • Use the Shared Schema Components
