Designing a Data Exchange - Best Practices • Data Exchange Scenarios • Sender vs. Receiver-initiated exchanges • Node Design • Best Practices: • Handling Large Transactions • State Management • Data Services • Data Validation • Schema Design
Requesting Data (1 of 3) Simple Query • Synchronous process • Ideal for small data sets • Ideal for both ad hoc and planned exchanges • Onus is on requestor to initiate exchange
Requesting Data (2 of 3) Solicit with Download • Asynchronous process • Good for larger datasets • Data Provider can schedule processing of request • Requestor can use “GetStatus” to see if data is ready yet
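The polling side of Solicit with Download can be sketched as below. This is a minimal illustration, not a node client: `get_status` stands in for whatever call wraps “GetStatus”, and the status strings ("Pending", "Completed", "Failed"), the interval, and the attempt limit are all assumptions made for the example.

```python
import time
from typing import Callable


def wait_until_ready(
    get_status: Callable[[str], str],
    transaction_id: str,
    interval_seconds: float = 30.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll a (hypothetical) GetStatus wrapper until the data is ready.

    Returns True once the status is "Completed", False if the attempt
    limit is reached first, and raises if the provider reports "Failed".
    """
    for attempt in range(max_attempts):
        status = get_status(transaction_id)
        if status == "Completed":
            return True
        if status == "Failed":
            raise RuntimeError(f"Transaction {transaction_id} failed")
        # Wait before asking again, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return False
```

Once this returns True, the requestor would go on to perform the download; the `sleep` parameter exists only so the loop can be exercised without real delays.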
Requesting Data (3 of 3) Solicit with Submit • Asynchronous process • Good for larger datasets • Does not require the requestor to continuously poll the data provider to see if data is ready
Sending Data (1 of 2) Simple Submit • Very simple and very common process • Typical for traditional regulatory flows • “Hides” data, since it is not exposed as a service
Sending Data (2 of 2) Notify with Download • Asynchronous approach to Simple Submit • Receiver can perform download at the time of their own choosing
Data Exchange Scenarios • Nodes wait for requests • Nodes may initiate actions (e.g., Submit) • How can a node do both?
Node Components Example Node Architecture
Node Components A node can be divided into components, each playing a different role: • The Web Services Interface • Acts as a listener for inbound requests and submissions • Hosted on a Web Server (e.g., IIS, WebSphere) • Should not do any heavy lifting (e.g., data processing)
Node Components (continued) • Request Processor • Performs all data processing • Composes XML files for outbound delivery • Decomposes and processes inbound XML files • Coupled with a scheduler component • Enables node to process Solicit requests at a time of the node administrator’s choosing • Automatically kicks off outbound processes (e.g., daily Submit) • Flow agnostic • Decoupled from specific flow implementations • Ideally installed on an Application Server
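One way to picture the split between the Web Services Interface and the Request Processor is a queue: the interface only records the Solicit and returns, and the processor drains the queue when the scheduler fires. The sketch below is illustrative only; the class, method, and service names are hypothetical and not taken from any node implementation.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SolicitRequest:
    transaction_id: str
    service_name: str
    parameters: List[str]


@dataclass
class Node:
    pending: deque = field(default_factory=deque)
    results: Dict[str, object] = field(default_factory=dict)

    def solicit(self, transaction_id: str, service_name: str, parameters: List[str]) -> str:
        """Interface side: queue the request and return at once (no heavy lifting)."""
        self.pending.append(SolicitRequest(transaction_id, service_name, parameters))
        return "Pending"

    def run_scheduled_batch(self, handlers: Dict[str, Callable[..., object]]) -> int:
        """Processor side: invoked by the scheduler to work through queued requests.

        `handlers` maps a service name to a flow-specific callable, which keeps
        this class itself flow agnostic.
        """
        processed = 0
        while self.pending:
            request = self.pending.popleft()
            handler = handlers[request.service_name]
            self.results[request.transaction_id] = handler(*request.parameters)
            processed += 1
        return processed
```

The same `run_scheduled_batch` hook is where an outbound process such as a daily Submit could be triggered, which is how one node can both wait for requests and initiate actions.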
Node Components (continued) • Node Administration Utility • Create and manage local accounts • Install new data exchange components • Set processing schedules • Audit Node activity • Extract documents (inbound and outbound should be stored)
Node Components (continued) • Flow-specific components • Discrete components tailored for a specific data exchange • Hot-swappable • Service interface is generic • Node configuration determines which services are internal or public • Node configuration determines whether a given service is for Query or Solicit
Node Components (continued) Flow-to-Node Interface
Large Transactions • Can cause problems in several areas: • Data retrieval (SQL) • XML serialization (sender side) • Transmission over Internet • XML deserialization (receiver side) • Schema validation (both sender and receiver)
Large Transactions • Stage data in a model similar to that which is used by the schema • XML is hierarchical whereas RDBMS is relational • More secure • Source system unaffected by node operations • Index query parameter fields (SQL)
Large Transactions (continued) • Use an asynchronous exchange • Use Solicit, not Query • Schema design considerations • Schema KEY/KEYREF discouraged • Element naming may significantly affect file size, e.g. <MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode> • Query “costing” • Calculate the size of a given result set (e.g., COUNT(*)) before running the full query • Not much experience in this area yet
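Query “costing” can be sketched with an in-memory SQLite staging table: count first, and only run the full query if the count is under a limit. The table, column names, and row limit below are invented for the example; a real node would apply its own threshold and might steer the caller toward Solicit instead.

```python
import sqlite3
from typing import List, Optional, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create a small, hypothetical staging table with an indexed query field."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE facility (facility_id TEXT, name TEXT, state_code TEXT)")
    # Index the field used as a query parameter.
    conn.execute("CREATE INDEX idx_facility_state ON facility (state_code)")
    conn.executemany(
        "INSERT INTO facility VALUES (?, ?, ?)",
        [("WA-1", "Alpha", "WA"), ("WA-2", "Bravo", "WA"),
         ("WA-3", "Charlie", "WA"), ("OR-1", "Delta", "OR")],
    )
    return conn


def run_if_affordable(
    conn: sqlite3.Connection, state_code: str, max_rows: int
) -> Tuple[int, Optional[List[tuple]]]:
    """Return (row_count, rows); rows is None when the result set is too large."""
    (row_count,) = conn.execute(
        "SELECT COUNT(*) FROM facility WHERE state_code = ?", (state_code,)
    ).fetchone()
    if row_count > max_rows:
        return row_count, None  # too costly: skip the full query
    rows = conn.execute(
        "SELECT facility_id, name FROM facility WHERE state_code = ? ORDER BY facility_id",
        (state_code,),
    ).fetchall()
    return row_count, rows
```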
Large Transactions (continued) • A well-designed flow can help avoid large transactions • “List” services can return only high-level data Scenario 1: • RCRA.GetFacilities("WA") Scenario 2: • RCRA.GetFacilityList("WA") • RCRA.GetFacilityDetail("WA","FACID1234") • Data service parameters can be used to limit transaction size Scenario 3: • RCRA.GetFacilitiesByType("WA","LQG") • All options affect schema design
Large Transactions (continued) • File compression • Zipping files can reduce file size by over 90% • Compact storage (archiving) • Significant reduction in time to transmit • Disk I/O versus memory I/O • If possible, avoid techniques that require the system to read the entire document into memory in order to process it. This is hard to do in practice.
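Both points can be illustrated with the Python standard library: `gzip` for compression and `xml.etree.ElementTree.iterparse` for processing a document element by element instead of loading it whole. The document shape and element names here are made up for the example, and the compression ratio achieved will depend on the actual data.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def build_document(record_count: int) -> bytes:
    """Build a repetitive, hypothetical XML payload for demonstration."""
    parts = ["<Facilities>"]
    for i in range(record_count):
        parts.append(
            "<Facility>"
            f"<FacilityIdentifier>FAC{i:05d}</FacilityIdentifier>"
            "<MailingAddressStateUSPSCode>OR</MailingAddressStateUSPSCode>"
            "</Facility>"
        )
    parts.append("</Facilities>")
    return "".join(parts).encode("utf-8")


def count_facilities_streaming(compressed: bytes) -> int:
    """Decompress and parse incrementally, handling one Facility at a time."""
    count = 0
    with gzip.open(io.BytesIO(compressed), "rb") as stream:
        for _event, element in ET.iterparse(stream, events=("end",)):
            if element.tag == "Facility":
                count += 1
                # Drop the element's children and text once it has been handled.
                element.clear()
    return count
```

Usage would look like `compressed = gzip.compress(build_document(1000))` followed by `count_facilities_streaming(compressed)`. Note that `clear()` empties each processed element but the root still keeps a reference to it, so very large files may need further pruning.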
State Management • State management is required any time two systems must be synchronized • Contrast with a Data Publishing exchange • Typically the sender’s burden, but does not have to be • Partial rejects compound the difficulty
State Management (continued) • Flagging source data • Set “submission status” indicator on source data • Complexity is directly related to transaction granularity • Compounded if record-level rejects are performed
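A “submission status” indicator on source (or staged) data might be handled as in the sketch below, which also shows why record-level rejects add work: each rejected record has to be tracked separately from the rest of its transaction. The table layout and the status values (PENDING, SUBMITTED, ACCEPTED, REJECTED) are assumptions for illustration only.

```python
import sqlite3
from typing import Dict, Iterable, List


def open_demo_source() -> sqlite3.Connection:
    """Create a hypothetical source table carrying a submission status flag."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE facility ("
        "facility_id TEXT PRIMARY KEY, name TEXT, "
        "submission_status TEXT NOT NULL DEFAULT 'PENDING', "
        "transaction_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO facility (facility_id, name) VALUES (?, ?)",
        [("F1", "Alpha"), ("F2", "Bravo"), ("F3", "Charlie")],
    )
    return conn


def select_pending(conn: sqlite3.Connection) -> List[str]:
    rows = conn.execute(
        "SELECT facility_id FROM facility "
        "WHERE submission_status = 'PENDING' ORDER BY facility_id"
    )
    return [row[0] for row in rows]


def mark_submitted(conn: sqlite3.Connection, facility_ids: Iterable[str], transaction_id: str) -> None:
    """Flag records as sent and remember which transaction carried them."""
    conn.executemany(
        "UPDATE facility SET submission_status = 'SUBMITTED', transaction_id = ? "
        "WHERE facility_id = ?",
        [(transaction_id, fid) for fid in facility_ids],
    )


def apply_receiver_result(conn: sqlite3.Connection, transaction_id: str, rejected_ids: Iterable[str]) -> None:
    """Record-level outcome: flag the rejects first, then accept the remainder."""
    conn.executemany(
        "UPDATE facility SET submission_status = 'REJECTED' "
        "WHERE transaction_id = ? AND facility_id = ?",
        [(transaction_id, fid) for fid in rejected_ids],
    )
    conn.execute(
        "UPDATE facility SET submission_status = 'ACCEPTED' "
        "WHERE transaction_id = ? AND submission_status = 'SUBMITTED'",
        (transaction_id,),
    )


def status_by_id(conn: sqlite3.Connection) -> Dict[str, str]:
    return dict(conn.execute("SELECT facility_id, submission_status FROM facility"))
```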
State Management (continued) • Exchange Network Header • Same schema can be used to perform different transactions • Can remove the need for TransactionCode (e.g., INSERT, UPDATE, DELETE) in schema • “Delta” to derive data changes since last submit • Many systems do not store deleted data • Compare last submission snapshot with current snapshot, derive what has changed • Incremental and full refresh services • e.g., Facility Flow
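The “Delta” approach can be sketched as a comparison of two snapshots keyed by record identifier. The snapshot structure (a dict of identifier to record) and the INSERT/UPDATE/DELETE labels on the result are assumptions chosen for the example; note that deletes are recoverable only because the previous snapshot was kept.

```python
from typing import Dict, List


def derive_delta(previous: Dict[str, dict], current: Dict[str, dict]) -> Dict[str, object]:
    """Compare the last submitted snapshot with the current one."""
    # Keys present now but not before are new records.
    inserts = {key: current[key] for key in current.keys() - previous.keys()}
    # Keys present before but missing now were deleted at the source.
    deletes: List[str] = sorted(previous.keys() - current.keys())
    # Keys in both snapshots count as updates only if the record changed.
    updates = {
        key: current[key]
        for key in current.keys() & previous.keys()
        if current[key] != previous[key]
    }
    return {"INSERT": inserts, "UPDATE": updates, "DELETE": deletes}
```

An incremental service would send only this delta, while a full refresh service would send the whole current snapshot.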
Data Service Best Practices • Data service naming conventions {Prefix}.{Action}{Object}[By{Parameter(s)}] e.g.: FacID.GetFacilityByName • Work in Progress • What about versioning?
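A name following the {Prefix}.{Action}{Object}[By{Parameter(s)}] pattern can be checked mechanically. The regular expression below is one possible reading of the convention, assuming the action is a single capitalized word (such as Get) and the object starts with a capital letter; it is a sketch, not an official grammar.

```python
import re
from typing import Dict, Optional

# Assumed reading: Prefix "." Action Object ["By" Parameters]
SERVICE_NAME = re.compile(
    r"(?P<prefix>[A-Za-z][A-Za-z0-9]*)\."
    r"(?P<action>[A-Z][a-z]+)"
    r"(?P<object>[A-Z][A-Za-z0-9]*?)"
    r"(?:By(?P<parameters>[A-Z][A-Za-z0-9]*))?"
)


def parse_service_name(name: str) -> Optional[Dict[str, Optional[str]]]:
    """Split a data service name into its parts, or return None if it does not fit."""
    match = SERVICE_NAME.fullmatch(name)
    return match.groupdict() if match else None
```

For example, `parse_service_name("FacID.GetFacilityByName")` yields the prefix `FacID`, action `Get`, object `Facility`, and parameters `Name`.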
Data Services Best Practices Documenting data services: • Data Service name • Whether the service is supported by Query, Solicit, or both • Parameters • Parameter Name • Index (order) • Required/Optional • Minimum/Maximum allowed values • Data type (string, integer, Boolean, Date…) • Whether multiple values can be supplied to the parameter • Whether wildcard searches are supported and default wildcard behavior • Special formatting considerations • Access/Security settings • Return schema • Special fault conditions • Wildcards: % • Parameter delimiter: | (pipe character) • Parameter operation: AND
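The wildcard, delimiter, and operation conventions can be sketched as a small matcher. The interpretation here is an assumption: "|" separates alternative values within one parameter, "%" matches any run of characters, and separate parameters are combined with AND. Matching is case-sensitive in this sketch, which is also an assumption.

```python
import re
from typing import Dict


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a value using % wildcards into a compiled regular expression."""
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(literal_parts))


def matches(record: Dict[str, str], parameters: Dict[str, str]) -> bool:
    """Return True if the record satisfies every parameter (AND)."""
    for field_name, raw_value in parameters.items():
        field_value = str(record.get(field_name, ""))
        # Pipe-delimited values are treated as alternatives for this field.
        alternatives = raw_value.split("|")
        if not any(wildcard_to_regex(alt).fullmatch(field_value) for alt in alternatives):
            return False
    return True
```

Whatever behavior a flow actually adopts, including the default wildcard behavior when no % is supplied, belongs in the data service documentation listed above.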
Data Validation Best Practices • XML instance files should be validated against the schema by the sender before submittal • CDX offers pre-submittal validation services for some flows • Schematron (Doug Timms)
Schema Design Best Practices • DRC 1.0 and DRC 1.1 • Schema Namespace • Schema Versioning • Exchange Network Schema Types • Use the Shared Schema Components