1 / 31

Database Design Techniques for Clinical Research

Database Design Techniques for Clinical Research. Melissa K. Carroll, M.S. October 20, 2003. Overview. Relational database design Implementing a relational database in Microsoft Access Designing a database for a typical study by our group Resources Questions. Relational Database Design.

chakaluka
Télécharger la présentation

Database Design Techniques for Clinical Research

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Design Techniques for Clinical Research Melissa K. Carroll, M.S. October 20, 2003

  2. Overview • Relational database design • Implementing a relational database in Microsoft Access • Designing a database for a typical study by our group • Resources • Questions

  3. Relational Database Design

  4. What is a Database? • Collection of data organized for efficient operations • Everyone uses them multiple times a day, often without realizing it • Examples • Airline reservations • Online shopping • Underlying design principles are largely universal

  5. Problems with “Flat” Files • Data Redundancy • Leads to more work and inconsistencies • Wreaks havoc on performing basic manipulations such as searching and sorting • File Management • Multiple files • Concurrent users • Security • Intermediate results • Ad hoc programming (reinventing the wheel)

  6. Data Modeling: Entity-Relationship Model • Models data as entities, with attributes, and relationships between entities • Entity: person, place, or thing • Instance: example of an entity • Attribute: feature of an entity • Relationship: describes association between (usually two) entities

  7. E-R Notation

  8. Designing an E-R Diagram • Issues to Consider • What questions will the data be used to answer? • What are the entities and how do they relate to each other? • What attributes uniquely identify entities? • What attributes need to be sub-divided? • Goal: Eliminate Redundancy • Process is called “normalizing” data

  9. Sample E-R Diagram: MP3 Files

  10. Relational Model • Lower-level model used for actual database implementation • Translating from E-R model • Entities become tables • Attributes become fields • Many-to-many relationships become tables • Unique identifiers from involved tables as fields • Unique identifiers from “one” sides are added as fields to corresponding “many” sides

  11. Relational Database Management Systems (RDBMSs) • Database Management System (DBMS): software with purpose of helping user design and use a database • Relational Database Management System (RDBMS): DBMS for databases based on relational model • Most major commercial products (e.g. MS Access, Oracle, MySQL, SQL Server)

  12. SQL • Need language to tell the DBMS • The design of the database • Actual data to be entered • What data to retrieve and in what format • SQL = standardized language used by almost all major DBMSs • Standard language provides interoperability and portability

  13. SQL Examples • CREATE TABLE artist (artistID INT AUTO_INCREMENT, artistName VARCHAR(75)) • INSERT INTO artist (artistName) VALUES (“The Beatles”) • UPDATE album SET label = “EMI” WHERE albumTitle = “Abbey Road”

  14. SQL Examples Continued • SELECT songTitle, quality FROM song, recording WHERE song.songID = recording.songID • SELECT songTitle, quality FROM song INNER JOIN recording ON song.songID = recording.songID • SELECT albumTitle, albumAge AS releaseYear - Date() FROM album

  15. SQL Examples Continued • SELECT Count(artistID) from artist • SELECT MAX(recording.quality) FROM artist, recorded, recording WHERE artist.artistName = recorded.artistName and recorded.recordingID = recording.recordingID and artist.artistName = “The Beatles”

  16. Relational Database Implementation in Microsoft Access

  17. Clinical Research Database Design

  18. Typical Simple Study • Baseline and fixed number of follow-ups • Subject reaches each time point only once • Different time points have different scale protocols • Considerable overlap in scales between time points • Isolated from other studies

  19. Four Database Design Approaches • Approach One: entire assessment administration as entity, e.g. all of baseline or all of 12 week • One table per time point, items as attributes • Approach Two: scale administrations within each assessment as entity, e.g. 12 Week Hamilton • One table per scale per time point, items as attributes • Approach Three: scale administration as entity • One table per scale, items as attributes • Approach Four: item as entity • One table (theoretically)

  20. Evaluation of Approach One • May seem appropriate because common format for analysis is one record per subject • Problems • Limited number of fields allowed in some DBMSs • Will have many missing values • General redundancy issues (shares with Approach Two, to follow)

  21. Pros and Cons of Approach Two Versus Approach Three • Pros • “Horizontal” format • Flexibility for handling inter-time point scale disparities • Cons (for simple studies) • Data model complexity • Table creation and modification time multiplied • Space consumption • More data locations (entry and retrieval complexity) • Re-assigning to different time points

  22. Reassigning Scale Time Points Using Approaches Two and Three

  23. Reassigning Scales: Modified Approach Three

  24. Approach Two Cons for More Complex Studies • Poor at handling an indefinite number of follow-up time points • Modified Approach Three is better at handling studies in which subjects are assessed at the same time point multiple times • May happen due to progressing through the study multiple times • May also happen due to e.g. being screened multiple times

  25. Evaluation of Approach Four • Pros • Could potentially handle changes more elegantly • Perhaps more “normalized” theoretically • Cons • Considerably harder to design entry interface • Harder to obtain data in formats usually required • Doesn’t fix non-database problems with data collection changes

  26. Databases and Datasets • Database: Collection of data organized for efficient entry, updating, storage, and retrieval • Dataset: Subset of data retrieved from database in a format optimized for a specific reporting or analysis purpose • Well-designed databases should facilitate creation of datasets in any desired format • Datasets should be formatted for a particular purpose and used only for that purpose

  27. Normalizing Data Within Scales: Medication Data

  28. Normalizing Data Continued: Comparison of Medication Queries

  29. Multi-Study Issues: To Separate or Not To Separate • If same data will count for multiple studies • Keeping design and data in sync • E.g. updating all copies when data changed • E.g. ensuring scale changes are reflected in all tables and forms • If handling multiple, possibly “isolated” studies • Keeping design in sync • Can still use views so actual storage is transparent to user

  30. Summary • Careful planning must go into designing a database • First step in design is to model the data • E-R  relational model is effective • DBMSs, such as Access, offer tools for creating, using, and maintaining databases • When designing clinical research databases, as with any databases, priority should be normalization, hence elimination of redundancy • Properly designed databases will supply data in any format desired

  31. Resources • Access Help (Help in top menu, Contents and Index, Contents tab) • Access Database Wizard (in main menu upon opening) • Oreilly Access Database Design & Programming, 3rd Edition • For database design theory: online chapter at http://www.oreilly.com/catalog/accessdata3/chapter/ch04.html • Access (97/2000/etc.) Bible • Available here; not 100% accurate • Database System Concepts Fourth Edition (Silberschatz, Korth, Sudarshan)

More Related