Search with a Key-Value Store


Intro to NoSQL
• Key-value store
• Schemaless
• Distributed
• Eventually Consistent

Key-Value
• A single unique key for each value in the database
• Extremely fast lookup
• Easy distribution (no such thing as joins)
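A minimal sketch of key-based lookup, assuming a hypothetical in-memory store (the class and method names are illustrative, not any product's API). A Python dict is a hash table, so put and get are average-case constant time. Because each value is reached only through its own key, there is nothing to join.

```python
class KeyValueStore:
    """Hypothetical in-memory key-value store.

    Each key maps to exactly one value; writing an existing key
    replaces its value rather than creating a second entry.
    """

    def __init__(self):
        self._data = {}  # hash table: average O(1) put/get

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # pop with a default so deleting a missing key is a no-op
        self._data.pop(key, None)


store = KeyValueStore()
store.put("story:dgi3ck", {"headline": "Hello"})
store.put("story:dgi3ck", {"headline": "Updated"})  # same key: value replaced
print(store.get("story:dgi3ck"))  # → {'headline': 'Updated'}
```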

Schemaless
• Critical for extremely large data sets
• No ALTER TABLE commands; values have no predefined fields

Distributed
• The data set is designed to be split across multiple machines
• Typically runs on commodity servers, each with enough RAM to keep its share of the data set in memory
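One simple way to split a data set across machines is to hash each key and take the result modulo the number of nodes. The sketch below assumes hypothetical node names and keys. Note that plain modulo hashing moves most keys when a node is added; MongoDB's own auto-sharding instead divides data into chunks by shard key and migrates those chunks between shards, so treat this only as an illustration of key-based partitioning.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for `key` (hypothetical helper).

    md5 is used only because it is stable across processes; Python's
    built-in hash() of a str is randomized per interpreter run.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster
shards = {n: [] for n in nodes}
for key in ["story:1", "story:2", "story:3", "story:4", "story:5"]:
    shards[node_for(key, nodes)].append(key)

# Every key lands on exactly one node; the same key always maps to the same node.
print(shards)
```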

Eventually Consistent
• A success response can be returned to the client before replica nodes have received the change
• This makes NoSQL stores problematic for highly sensitive transactions (finance, etc.)
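The window described above can be simulated in a few lines. This is a toy model under stated assumptions: a hypothetical primary that acknowledges a write immediately and ships it to replicas later, with the replica represented as a plain dict. A read from the replica between the acknowledgement and replication returns stale data.

```python
from collections import deque


class Primary:
    """Hypothetical primary that acknowledges writes before replicating."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas
        self.pending = deque()  # changes not yet shipped to replicas

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))
        return "OK"  # success is returned before any replica is updated

    def replicate(self):
        # In a real system this runs asynchronously, some time later.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica[key] = value


replica = {}
primary = Primary([replica])
status = primary.write("balance", 100)
print(status, replica.get("balance"))  # → OK None: the replica is still stale
primary.replicate()
print(replica.get("balance"))  # → 100
```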

Database Design in NoSQL
• Denormalization is your friend
• Think of collections as views on a data set that

A News Site Using SQL

Loading a Story with SQL
  SELECT * FROM stories WHERE id = x
  SELECT * FROM comments
    LEFT JOIN users ON users.id = comments.user_id
    LEFT JOIN comments children ON children.parent_id = comments.id
    WHERE comments.story_id = x

Redesigned in a NoSQL Data Store
Story #dgi3ck
  date
  headline
  content
  comments
    Comment #la529
      content
      username
      user_image_url
      user_id
      children
        Comment #mn34i
          content
          username
          user_image_url
          user_id
    Comment #5bg26
      content
      username
      user_image_url
      user_id
      children

Loading a Story with NoSQL
  Stories::get(dgi3ck)
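To make the contrast with the SQL version concrete, here is the redesigned story as a nested Python structure, with Stories::get modeled as a single dict lookup. Assumptions: the field values are placeholders, the id key name `_id` is illustrative, and Comment #5bg26 is read as a second top-level comment. The username and image URL are cached copies embedded in each comment, so rendering the page needs no query against users.

```python
# Hypothetical "Stories" collection keyed by story id; all values are placeholders.
stories = {
    "dgi3ck": {
        "date": "2011-01-01",
        "headline": "Example headline",
        "content": "Story body",
        "comments": [
            {
                "_id": "la529",
                "content": "First comment",
                "username": "alice",
                "user_image_url": "http://example.com/alice.png",
                "user_id": 1,
                "children": [
                    {
                        "_id": "mn34i",
                        "content": "A reply",
                        "username": "bob",
                        "user_image_url": "http://example.com/bob.png",
                        "user_id": 2,
                        "children": [],
                    }
                ],
            },
            {
                "_id": "5bg26",
                "content": "Another comment",
                "username": "alice",
                "user_image_url": "http://example.com/alice.png",
                "user_id": 1,
                "children": [],
            },
        ],
    }
}


def get_story(story_id):
    """One key lookup returns the story with all comments and author data."""
    return stories[story_id]


story = get_story("dgi3ck")
print(len(story["comments"]), story["comments"][0]["children"][0]["username"])  # → 2 bob
```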

Some Design Considerations
• In what context will we access this data?
• What data do we need to access outside of this context?
• How often does the data change?

Embedded Data
• NoSQL can support foreign keys
• Some data is more appropriately stored “embedded” in a parent context
• E.g., comments are rarely (if ever) accessed outside of their parent story

Cached Data
• Data from an object that must be accessed outside of its own context can be cached (copied) into the other context
• Keep in mind that cached copies may need to be updated
• E.g., when a user changes their username, the username cached in each of their comments can be updated to match
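The username example can be sketched as a recursive walk over embedded comments. The helper name and document shape are hypothetical. Since each comment holds its own copy of the username, every copy, at any nesting depth, has to be rewritten. In a real deployment this would run across every story the user has commented on.

```python
def update_cached_username(comments, user_id, new_username):
    """Rewrite the cached username on every comment by `user_id` (hypothetical helper).

    Walks nested `children` lists recursively and returns how many
    cached copies were rewritten.
    """
    rewritten = 0
    for comment in comments:
        if comment["user_id"] == user_id:
            comment["username"] = new_username
            rewritten += 1
        rewritten += update_cached_username(comment.get("children", []), user_id, new_username)
    return rewritten


comments = [
    {"user_id": 1, "username": "alice", "children": [
        {"user_id": 2, "username": "bob", "children": [
            {"user_id": 1, "username": "alice", "children": []},
        ]},
    ]},
]
print(update_cached_username(comments, 1, "alice_b"))  # → 2
```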

Several Common NoSQL Stores
• Memcached
• BigTable
• SimpleDB
• MongoDB

Why We Chose MongoDB
• Auto-sharding and easy setup for distribution
• JavaScript API
• Powerful indexing capabilities

MongoDB Libraries
• ORM: mongo_mapper
  • https://github.com/jnunemaker/mongomapper
• Underlying connection: mongo
  • https://github.com/mongodb/mongo-ruby-driver
• BSON support: bson_ext
  • http://rubygems.org/gems/bson_ext

Lifebooker’s Availability Search
• Searches across Services
• Filters
  • Time/Date
  • Geographical Zone
  • Service Category
  • Practitioner Gender
  • Concurrent Availability
  • (and several more)

Services, Discounts and Practitioners
• Services are offered by Providers
• Providers have Practitioners (employees)
• Discounts are applied to Providers for a Service during a given time period

Modeling this Data in MongoDB

Embedding with MongoMapper

Indexing and Searching
• Mongo offers powerful indexing capabilities
• Arrays are “first-class citizens”: an index on an array field indexes each element of the array
• Complex indices allow for great performance
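The "arrays as first-class citizens" point corresponds to MongoDB's multikey indexes, where an array field gets one index entry per element. The sketch below imitates that idea with a plain inverted index in Python. The `zones` field and service documents are hypothetical, loosely modeled on the Geographical Zone filter, and this is not how MongoDB stores its B-tree indexes internally.

```python
from collections import defaultdict


def build_multikey_index(docs, field):
    """Map each element of an array field to the ids of documents containing it.

    One entry per array element means "documents whose array contains X"
    becomes a single lookup instead of a scan.
    """
    index = defaultdict(set)
    for doc in docs:
        for value in doc.get(field, []):
            index[value].add(doc["_id"])
    return index


services = [
    {"_id": 1, "name": "Massage", "zones": ["manhattan", "brooklyn"]},
    {"_id": 2, "name": "Facial", "zones": ["brooklyn"]},
    {"_id": 3, "name": "Manicure", "zones": ["queens"]},
]
zone_index = build_multikey_index(services, "zones")
print(sorted(zone_index["brooklyn"]))  # → [1, 2]
```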

Creating Metadata
• With complex data structures, computing metadata in a before_save callback makes that data easily searchable
• E.g., the maximum discount on a given day for a service
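Here is a sketch of the maximum-discount example, assuming a hypothetical `Service` model whose discounts are stored as (date, percent) pairs. The `before_save` method only mimics the role of a MongoMapper before_save callback. It precomputes a flat day-to-maximum map that a query can match or sort on directly, instead of scanning the nested discount list at search time.

```python
from datetime import date


class Service:
    """Hypothetical model: a service with dated discounts."""

    def __init__(self, name, discounts):
        self.name = name
        self.discounts = discounts  # list of (date, percent) tuples
        self.max_discount_by_day = {}

    def before_save(self):
        # Precompute searchable metadata: the best discount for each day.
        best = {}
        for day, percent in self.discounts:
            key = day.isoformat()  # string keys, as a document store needs
            best[key] = max(percent, best.get(key, percent))
        self.max_discount_by_day = best

    def save(self):
        self.before_save()
        # A real ODM would write this document to the database here.
        return {"name": self.name, "max_discount_by_day": self.max_discount_by_day}


svc = Service("Massage", [
    (date(2011, 3, 1), 10),
    (date(2011, 3, 1), 25),
    (date(2011, 3, 2), 15),
])
doc = svc.save()
print(doc["max_discount_by_day"])  # → {'2011-03-01': 25, '2011-03-02': 15}
```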

Creating Indices

Querying
• Uses DataMapper/Arel-style syntax
• Chains conditions, ordering, and offset
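To show what chaining conditions, ordering, and offset means, here is a tiny chainable query over in-memory documents. This is a hypothetical imitation of the style, not MongoMapper's API: the method names `where`, `sort`, `skip`, `limit`, and `to_list` and the practitioner fields are assumptions. Each call returns a new query object, so partial queries can be reused, and nothing runs until `to_list()`.

```python
class Query:
    """Hypothetical chainable query; each method returns a new Query."""

    def __init__(self, docs, conditions=(), order=None, offset=0, count=None):
        self._docs = docs
        self._conditions = conditions
        self._order = order
        self._offset = offset
        self._count = count

    def _copy(self, **changes):
        params = dict(conditions=self._conditions, order=self._order,
                      offset=self._offset, count=self._count)
        params.update(changes)
        return Query(self._docs, **params)

    def where(self, **equals):
        return self._copy(conditions=self._conditions + tuple(equals.items()))

    def sort(self, field, descending=False):
        return self._copy(order=(field, descending))

    def skip(self, n):
        return self._copy(offset=n)

    def limit(self, n):
        return self._copy(count=n)

    def to_list(self):
        # Filter, then order, then apply offset and limit.
        rows = [d for d in self._docs
                if all(d.get(k) == v for k, v in self._conditions)]
        if self._order:
            field, descending = self._order
            rows.sort(key=lambda d: d[field], reverse=descending)
        rows = rows[self._offset:]
        return rows if self._count is None else rows[:self._count]


practitioners = [
    {"name": "Ann", "gender": "female", "rating": 4.8},
    {"name": "Ben", "gender": "male", "rating": 4.9},
    {"name": "Cara", "gender": "female", "rating": 4.5},
    {"name": "Dana", "gender": "female", "rating": 4.9},
]
q = Query(practitioners).where(gender="female").sort("rating", descending=True).skip(1).limit(2)
print([p["name"] for p in q.to_list()])  # → ['Ann', 'Cara']
```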

Filtering Complex Data Structures
• MongoDB offers a JavaScript API for MapReduce
• Map: transform and filter data
• Reduce: combine multiple rows into a single record
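The deck's own JavaScript map and reduce functions are not preserved here, so this is a Python stand-in for the pattern, with hypothetical availability-slot fields (`service`, `gender`, `discount`). Map drops slots that fail a filter and emits (key, value) pairs. Reduce folds all values for one key into a single record. The reducer returns the same shape it receives, which matters because MongoDB may re-reduce partial results.

```python
from collections import defaultdict
from functools import reduce


def map_slot(slot):
    """Map: filter out non-matching slots, emit (key, value) pairs for the rest."""
    if slot["gender"] != "female":
        return []  # filtered out
    return [(slot["service"], {"count": 1, "max_discount": slot["discount"]})]


def reduce_values(a, b):
    """Reduce: combine two values for the same key into one record of the same shape."""
    return {"count": a["count"] + b["count"],
            "max_discount": max(a["max_discount"], b["max_discount"])}


def map_reduce(records, mapper, reducer):
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return {key: reduce(reducer, values) for key, values in grouped.items()}


slots = [
    {"service": "massage", "gender": "female", "discount": 10},
    {"service": "massage", "gender": "male", "discount": 30},
    {"service": "massage", "gender": "female", "discount": 20},
    {"service": "facial", "gender": "female", "discount": 5},
]
result = map_reduce(slots, map_slot, reduce_values)
print(result)
# → {'massage': {'count': 2, 'max_discount': 20}, 'facial': {'count': 1, 'max_discount': 5}}
```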

A Simple Use Case

Using MapReduce to Filter

The Results
• Scheduled to go live within 2 weeks
• With sharding/distribution, tests show almost no increase in response time with more than 10x the current data set
• 20x faster than the MySQL implementation
  • 100 ms vs. 2000 ms (or more)
