330 likes | 894 Vues
Fall VAMAP Conference November, 2004. SCHEV Biopsy of a Data Warehouse. Marina Moschos Wendell Pai Sumi Shahid. Outline. Phase 1: Building a data warehouse infrastructure. Phase 2: Designing and implementing a web-based collection system.
E N D
Fall VAMAP ConferenceNovember, 2004 SCHEV Biopsy of a Data Warehouse Marina Moschos Wendell Pai Sumi Shahid
Outline • Phase 1: Building a data warehouse infrastructure. • Phase 2: Designing and implementing a web-based collection system. • Phase 3: Providing data accessibility to end users.
Phase 1 • Goal 1: Set up databases and data warehouse structure • Goal 2: Port existing data from SAS system to SQL database • Focus on the needs of users and the flow of data
Phase 1 Building a Data Warehouse Infrastructure
User Needs • Greater convenience– dynamic reports, GUIs, online accessibility • Data suppliers need to retain access to all existing functions • Data end users need to retain access to all existing reports
Legacy System Data Migration • SAS data sets converted to SQL with data elements unchanged • SAS formats (data dictionary) converted to SQL Support Tables • Existing SAS programs converted to SQL views
Institutional Data • Web-based front-end interface • Each institution has a secure site on the webserver • Each institution has an SQL database in the SCHEV staging area • SCHEV has its own SQL administrative database to track file processing
Phase 2 Web-based Data Collection System
Phase 2 • Next phase in building SCHEV’s data warehouse • Allows the institutions via the web to - • submit unit record data • edit the data • create displays • lock data file
Phase 2 Upload
Upload • What occurs when an institution moves its data to the SCHEV system - • A Data Transformation Services (DTS) package is used to: • Create the target table for the particular data file the institution is uploading. • Load institution’s data file to SQL server. • Load institution’s data file into the created target table.
Upload DTS Package Design View • The collection of steps and tasks which define the order of upload execution and all of the details of the transformation.
Phase 2 Edits
Edits • Edit Process: • Recoded edits from SAS to SQL. • Stored edits in a separate table for each data file. • Built a DTS package for each data file to run edits against the institution’s file. • Created an Error Report table in every institution’s database in SQL to insert error/warning codes and values that were encountered by the data file edits. • Built an Edit Log table in every institution’s database in SQL to track whether the data file edits ran successfully or not. • Created an Error Summary web page for every institution that • summarizes the error information from the Error Report table.
Storing Edits • Each data file has its own edit table that stores the error code, edit, order value, field name, and report year.
EditsDTS Package Design View • Built DTS packages for each data file to run edits against the institution’s file.
EditsError Report Table • When the edits run against the institution’s file, information pertaining to errors or warnings in the data are inserted into an error report table.
EditsEdit Log Table • While each edit executes, information pertaining to the status of the edit run is inserted into an edit log table.
Phase 2 Displays
Displays • Display Process: • Transported display programs from SAS coding to SQL coding. • Created a table in the Institution Management database to store all the display programs for every data file. • Designed a DTS package for each data file to run the display programs that generate aggregated data. • Built individual stock display tables in every institution’s database to insert the aggregated data. • Created a Display Log table in every institution’s database to track whether a display program ran successfully or not. • Created web pages for each display so the institutions could view the aggregated data.
Storing Display Programs • This table was built in the institution management database to store all the display programs by data file and display name.
DisplayDTS Package Design View • Built DTS packages for each data file to run display programs.
Display Tables • A list of display tables is located in each institution’s database.
DisplaysDisplay Log Table • While the display programs run, information pertaining to each program’s success or failure is inserted into the display log table.
Phase 2 Locking
Locking File • Locking Process: • After an institution creates their data file displays, they can lock their file by pressing a ‘Lock File’ button. • An automated email is sent to SCHEV Research staff notifying us the institution has locked a data file for a given year. • Information pertaining to the institution’s locked data file (i.e. institution name, data file, report year) is inserted into a File Lock table. • SCHEV Research moves the institution’s data • file into a permanent unit record table, located • in the unit record database.
Fall VAMAP ConferenceNovember, 2004 SCHEV Biopsy of a Data Warehouse Marina Moschos Wendell Pai Sumi Shahid