160 likes | 287 Vues
This presentation delivered by Norbert Kremer focuses on the fundamental aspects of database design specifically applied to molecular biology and bioinformatics. Key features such as efficient data storage and retrieval, data integrity, security, and performance are discussed. The session covers various database technologies from manual systems to advanced automated systems, including relational and object-oriented databases. The importance of software engineering methodology, particularly in enforcing data integrity through ACID properties, is highlighted, along with modeling techniques for effective data management.
E N D
Database Design Presented to Molecular Biology Bioinformatics Meeting by Norbert Kremer April 29, 2002
Database Features • Efficient storage and retrieval of data relevant to a specific purpose • Data integrity • Security • Performance, indexing • Best design depends on application
Database Technologies • Manual Systems • Shoebox full of index cards • Yellow stickies • Mechanical Systems • Hollerith Cards • Indexing holes
Database Technologies • Automated Systems • Flat file, fields in fixed or delimited columns • Early database systems: network, hierarchical • Client-server paradigm: relational database management systems • Object databases, e.g. AceDB • Object-relational, e.g. Oracle 9 • XML, eg Neocore and Tamino
RDBMS Features 1/2 • Data stored as tables with keys • Sound mathematical foundations in set theory, T.Codd, IBM Research Labs • Relational algebra – closure • Best for highly structured data
RDBMS Features 2/2 • SQL is most common relational language • SQL is non-procedural, relationally complete • Robust, mature technology, performance and scalability, security, data integrity
Relational Table Structure • Examples showing • One big table • Multiple tables with keys • Normalization Process • Data elements depend on the “key, the whole key, and nothing but the key”
Data Modeling • Identify and define data elements • Discover logical relationships between data elements
Data Integrity • Constraints enforced by DBMS • Transaction processing
Data Integrity - Constraints • Constraints enforced by DBMS • Central location eases maintenance • Uniform application in all code • Cannot be subverted by malicious or lazy programming • Foreign keys
Data Integrity - Transactions • ACID Properties • Atomic • Consistent • Isolated • Durable
Data Integrity - Transactions • ACID Properties • Atomic Logical unit of work, complete all or roll back • Consistent All constraints observed • Isolated Transactions are independent • Durable “Commit” is irreversible
PowerDesigner Demo • Graphical communication among analysts, architects, developers, end users • Centralized maintenance of database schema and data dictionary • Forward engineering (SQL script from model) • Reverse engineering (model from existing database) • Medline example, showing modeling process for “associative tables”
References • WinSQL graphical database client for any ODBC-compliant database (just about all) Get the free Lite version from http://www.indus-soft.com/winsql/ • ODBC driver for MySQL (also need the usual MySQL client installed) http://www.mysql.com/downloads/api-myodbc-3.51.html • Powerdesigner 9 Trial The trial is fully functional for 45 days. See http://www.sybase.com/products/enterprisemodeling/powerdesigner