
Designing a Data Exchange - Best Practices

Presentation Transcript


  1. Designing a Data Exchange - Best Practices • Data Exchange Scenarios • Sender- vs. Receiver-initiated exchanges • Node Design • Best Practices: Handling Large Transactions, State Management, Data Services, Data Validation, Schema Design

  2. Data Exchange Scenarios

  3. Requesting Data (1 of 3) Simple Query • Synchronous process • Ideal for small data sets • Ideal for both ad hoc and planned exchanges • The onus is on the requester to initiate the exchange

  4. Requesting Data (2 of 3) Solicit with Download • Asynchronous process • Good for larger data sets • The data provider can schedule processing of the request • The requester can use “GetStatus” to check whether the data is ready
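
The Solicit-with-Download pattern on the requester side can be sketched in Python: send the Solicit, poll GetStatus until the data is ready, then download it. `FakeProvider`, its method names, and the "Pending"/"Completed"/"Failed" status strings are hypothetical stand-ins for a real node client, not the Exchange Network API; the polling limit and interval are illustrative.

```python
import time
from typing import Callable, List


class FakeProvider:
    """Hypothetical stand-in for a data provider's node (not a real node API)."""

    def __init__(self, polls_until_ready: int) -> None:
        self._remaining = polls_until_ready
        self._next_id = 0

    def solicit(self, service: str, params: List[str]) -> str:
        # The provider accepts the request and returns a transaction ID at once.
        self._next_id += 1
        return f"TX{self._next_id}"

    def get_status(self, transaction_id: str) -> str:
        # "Pending" until the provider's scheduled job has run, then "Completed".
        if self._remaining > 0:
            self._remaining -= 1
            return "Pending"
        return "Completed"

    def download(self, transaction_id: str) -> bytes:
        return b"<FacilityList/>"


def solicit_and_download(provider: FakeProvider, service: str, params: List[str],
                         interval_s: float = 0.0, max_polls: int = 10,
                         sleep: Callable[[float], None] = time.sleep) -> bytes:
    """Send a Solicit, poll GetStatus until the data is ready, then Download it."""
    tx = provider.solicit(service, params)
    for _ in range(max_polls):
        status = provider.get_status(tx)
        if status == "Completed":
            return provider.download(tx)
        if status == "Failed":
            raise RuntimeError(f"transaction {tx} failed")
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"transaction {tx} not ready after {max_polls} polls")
```

With `FakeProvider(3)` the loop sees three "Pending" responses before "Completed" and returns the downloaded bytes; a real requester would use a non-zero `interval_s`, since every poll is a round trip to the provider.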

  5. Requesting Data (3 of 3) Solicit with Submit • Asynchronous process • Good for larger data sets • Does not require the requester to poll the data provider continuously to see whether the data is ready

  6. Sending Data (1 of 2) Simple Submit • Very simple and very common process • Typical of traditional regulatory flows • “Hides” data, since it is not exposed as a service

  7. Sending Data (2 of 2) Notify with Download • Asynchronous alternative to Simple Submit • The receiver can perform the download at a time of its own choosing

  8. Data Exchange Scenarios • Nodes wait for requests • Nodes may also initiate actions (e.g., Submit) • How can a node do both?

  9. Node Components Example Node Architecture

  10. Node Components The node can be divided into components, each playing a different role: • The Web Services Interface • Acts as a listener for inbound requests and submissions • Hosted on a Web server (e.g., IIS, WebSphere) • Should not do any heavy lifting (e.g., data processing)

  11. Node Components (continued) • Request Processor • Performs all data processing • Composes XML files for outbound delivery • Decomposes and processes inbound XML files • Coupled with a scheduler component • Enables the node to process Solicit requests at a time of the node administrator’s choosing • Automatically kicks off outbound processes (e.g., a daily Submit) • Flow agnostic • Decoupled from specific flow implementations • Ideally installed on an application server
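
A minimal Python sketch of the division of labor between the Web Services Interface and the Request Processor: the listener only records the request and returns a transaction ID, and the processor does the work when the scheduler calls it. The class names, the in-memory `deque`, and the `handlers` mapping are assumptions made for illustration; a production node would persist the queue and run the processor on an application server.

```python
import itertools
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple

Request = Tuple[str, str, List[str]]  # (transaction ID, service name, parameters)
Handler = Callable[[List[str]], str]


class WebServicesInterface:
    """Listener: accepts inbound requests but does no data processing itself."""

    def __init__(self, queue: Deque[Request]) -> None:
        self._queue = queue
        self._ids = itertools.count(1)

    def solicit(self, service: str, params: List[str]) -> str:
        tx = f"TX{next(self._ids)}"
        self._queue.append((tx, service, params))  # hand off and return immediately
        return tx


class RequestProcessor:
    """Flow-agnostic worker: dispatches queued requests to flow-specific handlers."""

    def __init__(self, queue: Deque[Request], handlers: Dict[str, Handler]) -> None:
        self._queue = queue
        self._handlers = handlers
        self.results: Dict[str, str] = {}

    def run_scheduled_batch(self) -> int:
        """Called by the scheduler at a time of the administrator's choosing."""
        processed = 0
        while self._queue:
            tx, service, params = self._queue.popleft()
            self.results[tx] = self._handlers[service](params)  # heavy lifting here
            processed += 1
        return processed
```

The listener's response time stays constant no matter how expensive the request is, because the only work it does is an append to the queue.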

  12. Node Components (continued) • Node Administration Utility • Create and manage local accounts • Install new data exchange components • Set processing schedules • Audit node activity • Extract documents (both inbound and outbound documents should be stored)

  13. Node Components (continued) • Flow-specific components • Discrete components tailored for a specific data exchange • Hot-swappable • The service interface is generic • Node configuration determines which services are internal or public • Node configuration determines whether a given service is for Query or Solicit
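
One way to picture hot-swappable, flow-specific components behind a generic service interface is a registry keyed by service name, where node configuration, rather than the component, decides whether a service is public or internal and whether it answers Query or Solicit. `FlowRegistry` and `ServiceConfig` are hypothetical names used only in this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Handler = Callable[[List[str]], str]


@dataclass
class ServiceConfig:
    public: bool  # exposed to external partners, or internal only
    mode: str     # "Query" or "Solicit"


class FlowRegistry:
    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}
        self._config: Dict[str, ServiceConfig] = {}

    def install(self, service: str, handler: Handler, config: ServiceConfig) -> None:
        # Installing under an existing name swaps the component in place.
        self._handlers[service] = handler
        self._config[service] = config

    def dispatch(self, service: str, mode: str, params: List[str], external: bool) -> str:
        cfg = self._config.get(service)
        if cfg is None:
            raise LookupError(f"unknown service {service}")
        if external and not cfg.public:
            raise PermissionError(f"{service} is internal")
        if mode != cfg.mode:
            raise ValueError(f"{service} is configured for {cfg.mode}, not {mode}")
        return self._handlers[service](params)
```

Calling `install` again with the same service name replaces the component without any change on the caller's side, and switching a service between public and internal is a configuration change rather than a code change.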

  14. Node Components (continued) Flow-to-Node Interface

  15. Large Transactions • Can cause problems in several areas: • Data retrieval (SQL) • XML serialization (sender side) • Transmission over Internet • XML deserialization (receiver side) • Schema validation (both sender and receiver)

  16. Large Transactions • Stage data in a model similar to the one used by the schema • XML is hierarchical, whereas an RDBMS is relational • More secure: the source system is unaffected by node operations • Index query parameter fields (SQL)

  17. Large Transactions (continued) • Use an asynchronous exchange • Use Solicit, not Query • Schema design considerations • Schema KEY/KEYREF discouraged • Element naming may significantly affect file size, e.g. <MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode> • Query “costing” • Calculate the size of a given result set (e.g., COUNT(*)) before running the full query • Limited experience in this area so far
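
Query costing and the element-naming point can both be sketched with Python's built-in `sqlite3` module. The staging table is indexed on its query parameter field, as the previous slide recommends, and `choose_exchange` counts matching rows before deciding whether a synchronous Query is acceptable. The table, columns, and the 1,000-row threshold are assumptions, not recommended values.

```python
import sqlite3
from typing import Iterable, Tuple

QUERY_ROW_LIMIT = 1000  # hypothetical cut-over point between Query and Solicit


def build_stage(rows: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an illustrative staging table, indexed on the query parameter field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facility_stage (facility_id TEXT PRIMARY KEY, state_code TEXT)")
    conn.execute("CREATE INDEX ix_stage_state ON facility_stage (state_code)")
    conn.executemany("INSERT INTO facility_stage VALUES (?, ?)", rows)
    return conn


def choose_exchange(conn: sqlite3.Connection, state: str) -> str:
    """Cost the request with COUNT(*) before running the full query."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM facility_stage WHERE state_code = ?", (state,)
    ).fetchone()
    return "Query" if n <= QUERY_ROW_LIMIT else "Solicit"


def markup_overhead(tag: str, value: str) -> int:
    """Bytes spent on the start and end tags of one element."""
    element = f"<{tag}>{value}</{tag}>"
    return len(element.encode("utf-8")) - len(value.encode("utf-8"))
```

`markup_overhead("MailingAddressStateUSPSCode", "OR")` is 59 bytes of tags around a 2-byte value, compared with 15 for a hypothetical `<State>` element; repeated over thousands of records, that difference shows up directly in file size.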

  18. Large Transactions (continued) • A well-designed flow can help avoid large transactions • “List” services can return only high-level data Scenario 1: • RCRA.GetFacilities(“WA”) Scenario 2: • RCRA.GetFacilityList(“WA”) • RCRA.GetFacilityDetail(“WA”, “FACID1234”) • Data service parameters can be used to limit transaction size Scenario 3: • RCRA.GetFacilitiesByType(“WA”, “LQG”) • All options affect schema design
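
The three scenarios can be mirrored as plain Python functions to show how splitting list and detail services, or adding a parameter, shrinks each response. The `Facility` fields and the in-memory list are hypothetical; the function names are snake_case versions of the service names on the slide.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Facility:
    facility_id: str
    name: str
    state: str
    handler_type: str                                     # e.g. "LQG"
    inspections: List[str] = field(default_factory=list)  # bulky detail data


def get_facilities(data: List[Facility], state: str) -> List[Facility]:
    """Scenario 1: everything for the state in one, potentially large, response."""
    return [f for f in data if f.state == state]


def get_facility_list(data: List[Facility], state: str) -> List[Tuple[str, str]]:
    """Scenario 2, step 1: only high-level (ID, name) pairs."""
    return [(f.facility_id, f.name) for f in data if f.state == state]


def get_facility_detail(data: List[Facility], state: str, facility_id: str) -> Facility:
    """Scenario 2, step 2: full detail for the one facility the caller picked."""
    for f in data:
        if f.state == state and f.facility_id == facility_id:
            return f
    raise KeyError(facility_id)


def get_facilities_by_type(data: List[Facility], state: str,
                           handler_type: str) -> List[Facility]:
    """Scenario 3: an extra parameter narrows the result set."""
    return [f for f in data if f.state == state and f.handler_type == handler_type]
```

Each choice implies a different return schema (a summary list versus a full facility record), which is why the slide notes that all options affect schema design.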

  19. Large Transactions (continued) • File compression • Zipping files can reduce file size by over 90% • Compact storage (archiving) • Significant reduction in time to transmit • Disk I/O versus memory I/O • If possible, avoid techniques that require the system to read the entire document into memory in order to process it. This is a tough problem.
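
A sketch of both ideas using only Python's standard library: compress the payload before transmission, then process it as a stream with `xml.etree.ElementTree.iterparse` instead of loading the whole document. It uses gzip rather than zip, and the sample document in the usage is made up; real compression ratios depend on the data, although repetitive XML tends to compress very well.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def compress_xml(xml_bytes: bytes) -> bytes:
    """Compress an XML payload before transmission (gzip here; the slide mentions zip)."""
    return gzip.compress(xml_bytes)


def count_elements_streaming(compressed: bytes, tag: str) -> int:
    """Decompress and parse incrementally instead of building the full tree first."""
    count = 0
    with gzip.open(io.BytesIO(compressed), "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if elem.tag == tag:
                count += 1
            # Release this element's text and children; the emptied element
            # itself stays attached to its parent.
            elem.clear()
    return count
```

On a node, `io.BytesIO` would typically be replaced by passing a file path to `gzip.open`, so that neither the compressed nor the decompressed document has to be held in memory in full.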

  20. State Management • State management is required any time two systems must be kept synchronized • Contrast with a Data Publishing exchange • Typically the sender’s burden, but it does not have to be • Partial rejects compound the difficulty

  21. State Management (continued) • Flagging source data • Set “submission status” indicator on source data • Complexity is directly related to transaction granularity • Compounded if record-level rejects are performed
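
Flagging source data can be sketched with `sqlite3`: records are marked "Pending" when extracted for a submission, then "Accepted" or "Rejected" once the receiver responds, so a partial reject affects only the rejected records. The table, column, and status values are hypothetical. Here rejected records are simply picked up again by the next extract; in practice they would usually need to be corrected first.

```python
import sqlite3
from typing import Iterable, List


def create_source(conn: sqlite3.Connection) -> None:
    # Hypothetical source table with a "submission status" indicator column.
    conn.execute(
        "CREATE TABLE facility (facility_id TEXT PRIMARY KEY, name TEXT, "
        "submission_status TEXT NOT NULL DEFAULT 'New')"
    )


def extract_for_submission(conn: sqlite3.Connection) -> List[str]:
    """Select records that still need to go out and flag them as in flight."""
    ids = [row[0] for row in conn.execute(
        "SELECT facility_id FROM facility "
        "WHERE submission_status IN ('New', 'Rejected') ORDER BY facility_id"
    )]
    conn.executemany(
        "UPDATE facility SET submission_status = 'Pending' WHERE facility_id = ?",
        [(fid,) for fid in ids],
    )
    conn.commit()
    return ids


def apply_receipt(conn: sqlite3.Connection, submitted: Iterable[str],
                  rejected: Iterable[str]) -> None:
    """Record the receiver's response, including record-level (partial) rejects."""
    rejected_ids = set(rejected)
    for fid in submitted:
        status = "Rejected" if fid in rejected_ids else "Accepted"
        conn.execute(
            "UPDATE facility SET submission_status = ? WHERE facility_id = ?",
            (status, fid),
        )
    conn.commit()
```

The granularity point from the slide shows up directly: tracking status per facility is simple here, but if one facility spanned many child tables, each of them would need its own indicator or a shared transaction key.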

  22. State Management (continued) • Exchange Network Header • The same schema can be used to perform different transactions • Can remove the need for a TransactionCode (e.g., INSERT, UPDATE, DELETE) in the schema • “Delta” processing to derive data changes since the last submission • Many systems do not store deleted data • Compare the last submission snapshot with the current snapshot to derive what has changed • Incremental and full-refresh services • e.g., Facility Flow
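
The snapshot comparison can be sketched as a keyed diff: keys only in the current snapshot are inserts, keys only in the last submitted snapshot are deletes, and keys whose values changed are updates. Representing a snapshot as a dictionary from record key to a tuple of values is an assumption for illustration. Grouping the result by operation would also let each group be sent as its own transaction, with the operation conveyed outside the payload schema (for example, in the header) rather than as a TransactionCode element.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, Tuple[str, ...]]  # record key -> record values


def derive_delta(last: Snapshot, current: Snapshot) -> Dict[str, List[str]]:
    """Derive inserts, updates, and deletes by comparing two snapshots."""
    inserts = sorted(k for k in current if k not in last)
    updates = sorted(k for k in current if k in last and current[k] != last[k])
    # Keys that vanished since the last submission; detectable even when the
    # source system keeps no record of deleted rows.
    deletes = sorted(k for k in last if k not in current)
    return {"INSERT": inserts, "UPDATE": updates, "DELETE": deletes}
```

The price of this approach is storing the last submitted snapshot; a full-refresh service avoids that by always sending everything.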

  23. Data Services Best Practices • Data service naming convention: {Prefix}.{Action}{Object}[By{Parameter(s)}], e.g., FacID.GetFacilityByName • Work in progress • What about versioning?
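
The naming convention can be checked mechanically. The regular expression below is one possible reading of {Prefix}.{Action}{Object}[By{Parameter(s)}]; it assumes the action is a single capitalized word (such as Get) and that names are PascalCase, neither of which the slide specifies.

```python
import re
from typing import Dict, Optional

SERVICE_NAME = re.compile(
    r"^(?P<prefix>[A-Za-z][A-Za-z0-9]*)\."      # {Prefix}.
    r"(?P<action>[A-Z][a-z]+)"                   # {Action}, e.g. Get
    r"(?P<object>[A-Z][A-Za-z0-9]*?)"            # {Object}, shortest match
    r"(?:By(?P<params>[A-Z][A-Za-z0-9]*))?$"     # optional By{Parameter(s)}
)


def parse_service_name(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a service name into its parts, or return None if it does not conform."""
    match = SERVICE_NAME.match(name)
    return match.groupdict() if match else None
```

For `FacID.GetFacilityByName` this yields prefix `FacID`, action `Get`, object `Facility`, and parameters `Name`. Because the object is matched as short as possible, a name whose object itself contains "By" followed by a capital letter would be split at that point; that ambiguity is inherent in the convention.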

  24. Data Services Best Practices: documenting data services
    • Data service name
    • Whether the service is supported by Query, Solicit, or both
    • Parameters
      - Parameter name
      - Index (order)
      - Required/optional
      - Minimum/maximum allowed values
      - Data type (string, integer, Boolean, date…)
      - Whether multiple values can be supplied to the parameter
      - Whether wildcard searches are supported, and the default wildcard behavior
      - Special formatting considerations
    • Access/security settings
    • Return schema
    • Special fault conditions
    Conventions:
    • Wildcard: %
    • Parameter delimiter: | (pipe character)
    • Parameter operation: AND
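
The parameter conventions can be translated into a parameterized SQL WHERE clause. This sketch assumes that parameters are matched to columns by their index, that "|" separates multiple values for a single parameter (combined with OR), that a value containing "%" is a wildcard match, and that separate parameters are combined with AND; `build_where` is a hypothetical helper.

```python
from typing import List, Optional, Sequence, Tuple


def build_where(columns: Sequence[str],
                params: Sequence[Optional[str]]) -> Tuple[str, List[str]]:
    """Turn positional service parameters into (where_clause, bound_values)."""
    if len(params) > len(columns):
        raise ValueError("more parameters supplied than the service defines")
    clauses: List[str] = []
    values: List[str] = []
    for column, raw in zip(columns, params):  # match by index (order)
        if not raw:
            continue  # optional parameter not supplied
        alternatives = []
        for value in raw.split("|"):  # multiple values for one parameter
            operator = "LIKE" if "%" in value else "="  # '%' marks a wildcard search
            alternatives.append(f"{column} {operator} ?")
            values.append(value)
        clauses.append("(" + " OR ".join(alternatives) + ")")
    # Separate parameters are combined with AND; no parameters means no filter.
    return (" AND ".join(clauses) if clauses else "1=1"), values
```

Column names come from the node's own service definition, never from the caller, and values are passed as bound parameters, so the generated SQL is not open to injection through parameter values. Under these conventions there is no way to search for a literal "%", which is one of the special formatting considerations worth documenting.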

  25. Data Validation Best Practices • XML instance files should be validated against the schema by the sender before submittal • CDX offers pre-submittal validation services for some flows • Schematron (Doug Timms)

  26. Schema Design Best Practices • DRC 1.0 and DRC 1.1 • Schema Namespace • Schema Versioning • Exchange Network Schema Types • Use the Shared Schema Components
