FaceBase Data Management

Data Management Goals • Schema-less data format • Fast search • Fast update • Scalability • Transactions* • Atomicity • Consistency • Isolation • Durability

Having it all… • No one system has everything • Brewer’s theorem (CAP) • Consistency • Availability • Partition Tolerance • What can we compromise on? • What can we sacrifice?

MongoDB • Document store • BSON • Search: Fast, but indexes have some restrictions • Prefixing – no true multi-dimensional indexes • Update: Fast, but indexes have to be explicitly declared • Scalability • Designed to scale out and shard • Strong consistency and eventual consistency
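The "prefixing" restriction can be illustrated with a small sketch. It models only the general rule that a compound index serves a query through its leading fields; it is a simplification, not MongoDB's query planner, and the field names (`species`, `age`, `gene`) are hypothetical.

```python
def usable_prefix(index_fields, query_fields):
    """Return the leading index fields that an equality query can use.

    Simplified model: the index is usable only up to the first field
    that the query does not constrain.
    """
    used = []
    for field in index_fields:
        if field not in query_fields:
            break
        used.append(field)
    return used


# Hypothetical compound index on (species, age, gene).
index = ["species", "age", "gene"]

print(usable_prefix(index, {"species", "age"}))   # ['species', 'age']
print(usable_prefix(index, {"species", "gene"}))  # ['species']
print(usable_prefix(index, {"age", "gene"}))      # []
```

Under this model, a query that skips the first indexed field gets no help from the index, which is why one compound index does not behave like a true multi-dimensional index.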

Solr • Document store • Each document has several key-values, no hierarchy • Web API • Search • Fast keyword and range searches • Multi-attribute search? • Update • No updates searchable until “commit” • “commit” is time-consuming – rebuild the index • Scalability • Similar to “commit”, replication requires creating a complete image
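The commit behaviour described above can be sketched with a toy in-memory index. This mirrors the slide's description (updates are invisible until a commit, and a commit rebuilds the index); it is not Solr's API or its internal implementation, and the class name `ToyCommitIndex` is made up for illustration.

```python
from collections import defaultdict


class ToyCommitIndex:
    """Toy keyword index where added documents are searchable only after commit()."""

    def __init__(self):
        self._committed = []                # documents visible to search
        self._pending = []                  # documents added since the last commit
        self._inverted = defaultdict(set)   # word -> set of document ids

    def add(self, doc_id, text):
        # Buffer the document; it is not searchable yet.
        self._pending.append((doc_id, text))

    def commit(self):
        # Make pending documents visible, then rebuild the whole index.
        # The full rebuild is what makes commit expensive in this model.
        self._committed.extend(self._pending)
        self._pending.clear()
        self._inverted = defaultdict(set)
        for doc_id, text in self._committed:
            for word in text.lower().split():
                self._inverted[word].add(doc_id)

    def search(self, word):
        return sorted(self._inverted.get(word.lower(), set()))


idx = ToyCommitIndex()
idx.add(1, "cleft palate imaging")
print(idx.search("palate"))  # [] because nothing has been committed
idx.commit()
print(idx.search("palate"))  # [1]
```

The cost of `commit()` grows with the total number of committed documents, not with the size of the update, which is the trade-off the slide points at.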

Cassandra • Document store • Indexed on primary key – key-value model • Fast single-key equality search • More complex queries not as efficient • Fast updates • Scalability • Designed for distributed environment • Emphasis on high availability, partition tolerance • Brewer’s Conjecture: consistency is compromised

MySQL • Relational model • Build a document model over relational model • Fast search • Fast update • Performance concerns with large data sets • Scalability • Replication, sharding • True transactions • Easy to generate sequence numbers • Transactions can incorporate Drupal and FaceBase data w/o coupling

Head to Head

Possible Routes • Hybrid MongoDB and MySQL • Benefits: Less resource demand • Drawback: Recovery and durability • Keep metadata in MongoDB, use MySQL as a catalogue • In-house EAV with MySQL • Benefits: Better interoperability • Drawback: Higher resource demand • Implement EAV store over MySQL.
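As a rough illustration of the EAV (entity-attribute-value) route, the sketch below stores schema-less documents as rows in a single relational table. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs without a server; the table name `eav`, its columns, and the helper functions are all assumptions made for this example, not the FaceBase design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE eav (
        entity_id INTEGER NOT NULL,
        attribute TEXT    NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity_id, attribute)
    )
    """
)
# Secondary index so attribute/value lookups do not scan the whole table.
conn.execute("CREATE INDEX eav_attr_value ON eav (attribute, value)")


def put_document(conn, entity_id, doc):
    """Store a flat dict as one row per attribute, inside one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        # INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE INTO
        # or INSERT ... ON DUPLICATE KEY UPDATE.
        conn.executemany(
            "INSERT OR REPLACE INTO eav (entity_id, attribute, value) "
            "VALUES (?, ?, ?)",
            [(entity_id, key, str(val)) for key, val in doc.items()],
        )


def get_document(conn, entity_id):
    """Reassemble the document for one entity as a dict of strings."""
    rows = conn.execute(
        "SELECT attribute, value FROM eav WHERE entity_id = ?", (entity_id,)
    )
    return dict(rows.fetchall())


def find_entities(conn, attribute, value):
    """Return ids of entities whose attribute equals the given value."""
    rows = conn.execute(
        "SELECT entity_id FROM eav WHERE attribute = ? AND value = ? "
        "ORDER BY entity_id",
        (attribute, str(value)),
    )
    return [row[0] for row in rows]


put_document(conn, 1, {"species": "mouse", "stage": "E10.5"})
put_document(conn, 2, {"species": "human", "modality": "3D scan"})
print(get_document(conn, 1))
print(find_entities(conn, "species", "human"))
```

Each document costs one row per attribute, which is one way to read the "higher resource demand" drawback, while the transaction wrapper shows the benefit of keeping everything in one relational store.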
