1 / 21

Database Engines for Bioscience

Database Engines for Bioscience. John Corwin Avi Silberschatz Swathi Yadlapalli Yale University. Database Engines for Bioscience. The Yale Center for Medical Informatics, led by Perry Miller, makes extensive use of biomedical and bioscience databases. Large database projects include

misha
Télécharger la présentation

Database Engines for Bioscience

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Engines for Bioscience John Corwin Avi Silberschatz Swathi Yadlapalli Yale University

  2. Database Engines for Bioscience • The Yale Center for Medical Informatics, led by Perry Miller, makes extensive use of biomedical and bioscience databases. • Large database projects include • Trial/DB • SenseLab

  3. Trial/DB • Trial/DB is an open-source clinical study data management system • Manages patient data across multiple studies. • No limit on the number of patients per study. • No limit on the number of parameters that are tracked in each study. • Currently used for a large number of clinical studies at Yale and other universities.

  4. SenseLab • Stores integrated, multidisciplinary models of neurons and neural systems. • Cell properties (receptors, currents, and transmitters) • Neurons • Networks of neurons (computational neuron models)

  5. Database Engines for Bioscience • What’s different about these databases? • Sparse data • Frequent schema changes • Row-level security policies • Lots of metadata

  6. EAV • Entity-Attribute-Value (EAV) • Also known as Object-Attribute-Value • Data is stored in 3-column tables • Column 1 stores the object name • Column 2 stores the attribute name • Column 3 stores the attribute value

  7. EAV – Example • Suppose we’re building a database of fruit types

  8. EAV/CR • EAV/CR: extends EAV with classes and relationships • Classes – the value field in each EAV triple may contain complex, structured data • Relationships – values may refer to other objects in the database, thus relationships in the database are stored explicitly

  9. Drawbacks of EAV and EAV/CR • Since all data is conceptually in a single table, the only operations we can perform are filters and self-joins • Support for regular queries must be re-implemented on top of the EAV database • The performance of attribute-centered queries is significantly worse than a conventional database • High storage overhead for dense data

  10. Our work • Extend a conventional database engine to better support the requirements of bioscience databases. • Take advantage of existing tools • Better performance • Better space-efficiency

  11. Schema modification • Use 1-column tables • Each attribute is stored in an individual table • The original table is formed by joining the 1-column tables by tuple index

  12. Schema modification • Adding and Removing attributes • One-column tables can be added and removed without affecting the existing tables

  13. Schema modification • Query • Original table is formed by joining each one-column table by tuple index

  14. Schema modification • To test this new storage mechanism, we implemented it in PostgreSQL • PostgreSQL is an open-source relational database engine • Based on the original Postgres engine developed at Berkeley • Supports modern database features such as complex queries, transactional integrity, and extensible functions and data types • 500,000 lines of C code

  15. Dynamic Tables: Interface • Create an extension of the SQL syntax • All other table operations work normally on dynamic tables – the implementation of dynamic tables is transparent to the database user CREATE DYNAMIC TABLE fruit_table ( fruit string, avgWeight float, potassium int);

  16. Dynamic Tables: Implementation • Original statement: CREATE DYNAMIC TABLE t(c1 t1, c2 t2, ..., cn tn); • Create individual tables, for 1 ≤ i ≤ n: CREATE TABLE dyn_t_i(ci ti); • Expose original table as a view: CREATE VIEW t AS (SELECT c1, c2, ..., cn FROM dyn_t_1, dyn_t_2, ..., dyn_t_n WHERE (c1.ctid = c2.ctid) AND ... AND (cn-1.ctid = cn.ctid));

  17. Dynamic Tables: Implementation • Override implementation of insert, update, and delete • Insert: Given the query INSERT INTO t VALUES (v1, ..., vn); Translates to INSERT INTO dyn_t_i VALUES (v_i); for 1 ≤ i ≤ n

  18. Dynamic Tables: Implementation • Update and Delete: take advantage of PostgreSQL’s rule system to translate operations to individual tables • Must maintain the invariant that each table has the same number of entries

  19. Sparse Data • Make the column index explicit • Change our one-column tables to two-column tables • Column 1: row-index of this tuple • Column 2: data value • Table is exposed by joining sub-tables by explicit row index • Dense and sparse attributes can be mixed within the same relation

  20. Database Engines for Bioscience • Future work: • Row-level security • Metadata • Implement SenseLab database and compare performance

  21. Database Engines for Bioscience • Questions?

More Related