1 / 48

Database Design for Cyber Infrastructure

Database Design for Cyber Infrastructure. August 2004 Dave Archbell dave @ sdsc.edu Advanced Database Projects http://daks.sdsc.edu http://www.sdsc.edu dbd4ci() { ## begin. Database Design (large or complex). General Design Considerations Data Modeling / Application Workflow

mayes
Télécharger la présentation

Database Design for Cyber Infrastructure

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Design for Cyber Infrastructure August 2004 Dave Archbell dave @ sdsc.edu Advanced Database Projects http://daks.sdsc.edu http://www.sdsc.edu dbd4ci() { ## begin

  2. Database Design (large or complex) • General Design Considerations • Data Modeling / Application Workflow • Data Normalization • Data Integration • Performance • Tutorials (2) • Questions

  3. General Design Considerations • Users • Legacy Systems/Data • Data Sources/Formats • Application Requirements • Questions!?

  4. Users (who are they?) • Scientific • Administrative • Technical • Lay-person • Student

  5. Understanding who theusers are can impact • Access Controls • Service Levels • Interfaces • Character Sets • Languages • Language Styles

  6. Legacy Systems/Data • What systems are currently in place? • Where does the data come from and how is the data generated? • What does the data look like (format)? • What is it used for? • What of these is REQUIRED to remain consistent or static?

  7. $$$ $$$ Reliability Reliability Functionality Functionality Performance Performance Application Requirements • Budget • Functional requirements (what must it do?) • Environment / Platform • Mandated Technologies (Politics) • Uptime

  8. $$$ Reliability Functionality Performance Politics $(n) Budget

  9. Application Workflow • Order of operations • Accounting • Auditing • Error handling • Logic • Reporting

  10. Application Logic – Gotcha Where should the logic live? • In the application • In the database • Shared responsibility

  11. Where should the logic live? • Is the application stand-alone or a collection of applications? • Where are the users defined? • Is data restricted by user, group, or role? • Will the application be transactional (OLTP)? • Is auditing of transactions required? • Is THIS the only database used by the application? • Do the application/business rules apply to the data specifically?

  12. Data Modeling • What will the data need to look like in order to support required functionality? • Can the application support multiple data formats which are dependent on how it is used? • Conceptual • Logical • Physical • CASE Tools

  13. Conceptual Model • The big picture • Describes concepts • Provides platform for designing the Logical Model

  14. Logical Model • Describes the entities and attributes to be used in storing data • Describes relationships in a generalized way that does not reflect the physical characteristics of any particular back-end database  • May reference diverse databases to be treated as a collection

  15. Physical Model Assuming SQL • Specific to the database vendor • The actual DDL which creates the physical structures where the data is stored • Describes storage characteristics • Constrains the data

  16. Vendor(s) Selection • What are the users currently using and familiar with? • What functionality is required? • Where will the application logic live? • How many users need to be supported (today and tomorrow)? • What other software needs to integrate with the back-end? (Globus?) • What is our budget?

  17. Vendor(s) Selection (cont.) • MySQL • PostgreSQL • MS SQL • DB2 • Oracle • Sybase • Informix • FirstSQL

  18. Vendor(s) Selection (cont.) • Does we need to use open-source? • Do we need transactions? • Do we need stored procedures? • Do we need roles or groups?  • Will we need to operate in a GRID environment? • Do we need the ability to federate? • Do we have a budget for purchasing?

  19. Data Normalization Latin norma – used for making right angles • Analyzing inherent relationships between data • Determine which form is appropriate for your data • Reduces and/or eliminates redundant data • Reduces and/or eliminates data integrity errors

  20. First Normal Form • Break data into tables (from entities) • This should be the smallest meaningful value • No repeating groups • One separate table for each group of data • Identify and define a primary key for each table [A primary-key is the set of columns (attributes) which uniquely identify a row of data in the table]

  21. Second Normal Form • Removes data columns (attributes) that rely only on part of the primary key • New tables for data that is to more than one record in any single table • Use Foreign-Key references to relate tables to one another [A foreign-key is a column (attribute) which relates to the primary key of another table]

  22. Third Normal Form • Removes any data or column (attribute) which does not rely solely on the primary key for each table • Requires that primary/foreign key relationships be defined and utilized for applications and reports

  23. Fourth Normal Form • Fourth Normal Form requires only that there be no multi-valued dependencies • It is rarely used • One should be careful to ensure that the application has no requirement for multi-valued dependencies before considering this form

  24. Data Integration Federation

  25. Federation • Providing a common interface to disparate data sources • Sources are generally related (ie: BIO, GEO, etc.) • Sources can be autonomous • Discovery. Provides a framework to build applications which can identify new relationships and take advantage of them

  26. PDB EOL AfCS Federated Sources (BIO)

  27. PDB EOL AfCS Federation Federation with Memory Caching FEDERATION with Cache Cache in Memory FEDERATION

  28. Federation with Memory Caching • Twice as fast with no other changes • Network latency not an issue • Performance and/or reliability of remote data source not an issue • Synchronization - automatic • Joins are local, so no data movement across the wire • Joins are performed within the same instance providing for better use of indexes

  29. Mediation Define domain-specific data in standardized terms to allow for relating it to other data • Data Modeling • Knowledge Representation • Query Processing for model-based mediation

  30. Mapping: Science Domains in DICE DATA Grid Data Svcs Federation Mining VIZ Mediation Analysis Astronomy  Biology       Chemistry  Ecology  Engineering  Geology  Medical  Neurology  Particle Physics  Zoology     

  31. Mapping: Science Domains in DICE DATA Grid Data Svcs Federation Mining VIZ Mediation Analysis Astronomy       Biology       Chemistry       Ecology       Engineering       Geology       Medical       Neurology       Particle Physics       Zoology      

  32. Performance • Hardware • Add memory • Faster Disks / Controllers • Disk Arrays (RAID-10 in PLAID) • NAS –vs– SAN • Indexes • Increase Buffers (memory usage) • Update Statistics • Hints (in query) • Canned and Optimized Queries • Platform (32-bit, 64-bit) • Cache Data • …. uh, de-normalize?!

  33. Performance (the easy way) Throw big hardware at the problem • Doesn’t scale • Costly • May require specialized skill to administer • Works great as a stop-gap

  34. Real-World Examples • BIRN http://www.nbirn.net • PDB http://www.rcsb.org/pdb • WIISARD http://health.ucsd.edu/news/2003/10_23_WIISARD.html

  35. Transportation assets Hospital #1 Attack site Warm zone Prevailing wind Compromised transportation corridor Hot zone Decon Transport station Field Treatment Station Triage X DisasterResponse Team Incident command center Hospital #2 (ER overloaded) _

  36. Mobile CDMA-2000 Internet Hospital #1 WirelessLAN connected to Internet Hot zone Delivery of field treatment data to Hospitals Electronic record of field care Transport station Field Treatment Station Triage X Incident command center Patient monitoring And location tracking Hospital #2 _

  37. Questions?

  38. Tutorials • Normalizing • A simple application

  39. Tutorial - Normalizing Event Registration System • Folks • Contact Information • Event • Location • EXTRA Information • Attendees • Start Time/Date • End Time/Date • Description

  40. Folks

  41. Model

  42. Tutorial – Simple Application Event Registration

  43. Event Registration

  44. Event Registration - Extensions

  45. Continuing Education • UCSD Extension Data Management Courses DBA Certificate Program Database Application Developer Certificate Program

  46. Books and Reference • Database Design for Mere Mortals, Michael J. Hernandez • Information Modeling and Relational Databases, Terry Halpin • IBM RedBooks and RedPapers http://www.redbooks.ibm.com/ • Oracle Documentation http://www.oracle.com/technology/documentation/database10g.html

  47. Database Design for Cyber Infrastructure } ## end dbd4ci

More Related