Richmond MUG – May 2014 MongoDB 2.6 Jason Ford – Principal Engineer, Snagajob
MongoDB World! First National MongoDB Conference June 23 – 25 in New York City Use discount code mug_25 to get 25% off registration
Meetup Calendar
• Today: MongoDB 2.6
• July 8: MongoDB World Post-Mortem
• September 9: TBD
• November 4: TBD (Second Anniversary)
Overview
• In development for a full year
  • (longer than any prior release)
• First major rewrite of the codebase
  • Including full rewrite of the query engine
• Some significant new features, but primary goal of release is foundation for future development
Read Operations
• Largely transparent
• New framework highly extensible
• .maxTimeMS() operator
  • Allows for timeouts on a per-operation basis
  • Great for ad hoc queries
  • Available in all drivers
• Indexes
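As a rough illustration of the per-operation timeout, the sketch below wraps a find in a small helper that applies `.maxTimeMS()`. The helper name `findWithTimeout`, the `orders` collection, and the 100 ms budget are assumptions made for this example; `db` is the database handle that the mongo shell provides.

```javascript
// Hypothetical helper: run a find that the server may abort after `ms` milliseconds.
// `db` is a mongo shell database handle, passed in so the helper is easy to stub.
function findWithTimeout(db, collectionName, query, ms) {
  return db.getCollection(collectionName)
           .find(query)
           .maxTimeMS(ms); // cursor method available from 2.6
}

// In the mongo shell (collection name and time budget are illustrative):
//   findWithTimeout(db, "orders", { item: "abc123" }, 100).toArray();
```

Passing the handle in, rather than relying on the global `db`, is only a convenience for testing; in the shell the chained call can be written inline.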
Indexes
• Background index builds to secondary nodes
• Index builds can resume if interrupted
• dropDups option deprecated
• Index Intersection
  • Great for ad hoc queries
  • Still want dedicated compound indexes for oft-used queries
Indexes
• Consider a collection with these indexes:
    { qty : 1 }
    { item : 1 }
• Index Intersection may be used to support the following query:
    db.orders.find({ item: "abc123", qty: { $gt : 15 } })
• Emphasis on MAY
  • Single index queries may be more efficient
Read Operations
• Text Search
  • Beta feature in 2.4, now enabled by default
  • Probably only practical for small collections
  • Indexes are very large
• Query execution framework completely rewritten
  • Query parser, optimizer, cache, etc.
  • Find queries are noticeably faster
Cached Query Plan Interface
• New insight/control provided into MongoDB's query execution
• The MongoDB query optimizer has long tried to figure out the most efficient use of indexes on a per-query basis, and to cache the resulting plans
• db.collection.getPlanCache() provides an interface to view and clear stored query strategies by query shape
Cached Query Plan Interface • db.jobseeker.getPlanCache().help()
Aggregation Framework
• Introduced in 2.2
• Finally seems fully baked in 2.6
• Queries return a cursor
  • Previously returned a single document (16MB limit)
• Results can be output to a new collection
  • $out operator
Aggregation Framework
    db.jobseeker.aggregate(
      { $project : { _id: 0, alert : '$p.n' } },
      { $unwind : "$alert" },
      { $group : { _id : "$alert", count: { $sum : 1 } } },
      { $out : "alertsummary" }
    )
Write Operations
• Insert, Update, Delete completely rewritten to use commands
• Write operations always return a WriteResult object
  • Even a {w:0} specification sends back a yes/no response
  • Forget about "fire and forget"
Write Operations
Sample Update Command (db.runCommand):
    {
      update: 'collection name',
      updates: [
        { q: { a : 1 }, u: { $inc : { x : 1 } }, multi: true/false, upsert: true/false },
        ...
      ],
      writeConcern: { w: 1, j: true, wtimeout: 1000 },
      ordered: true/false
    }
WriteResult Structure
    {
      "ok" : 1,
      "n" : 0,
      "nModified" : 1,   // applies only to updates
      "nRemoved" : 1,    // applies only to removes
      "writeErrors" : [
        {
          "index" : 0,
          "code" : 11000,
          "errmsg" : "insertDocument :: caused by :: 11000 E11000 duplicate key error index: t1.t.$a_1 dup key: { : 1.0 }"
        }
      ],
      writeConcernError: {
        code : 22,
        errInfo: { wtimeout : true },
        errmsg: "Could not replicate operation within requested timeout"
      }
    }
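To make the structure above easier to act on, here is a minimal sketch that condenses such a response into a few flags. The function name `summarizeWriteResponse` and the shape of its return value are assumptions made for illustration, not part of MongoDB; it is plain JavaScript that operates on an ordinary object.

```javascript
// Hypothetical helper (not a MongoDB API): condense a write command response
// shaped like the WriteResult structure into a few easy-to-check flags.
function summarizeWriteResponse(res) {
  var writeErrors = res.writeErrors || [];
  var wcError = res.writeConcernError;
  return {
    // fully successful only if the command ran and nothing was reported
    succeeded: res.ok === 1 && writeErrors.length === 0 && !wcError,
    // positions (within the batch) of the operations that failed
    failedIndexes: writeErrors.map(function (e) { return e.index; }),
    // 11000 is the duplicate key error code shown in the structure
    duplicateKey: writeErrors.some(function (e) { return e.code === 11000; }),
    // the write concern could not be satisfied before wtimeout expired
    timedOut: Boolean(wcError && wcError.errInfo && wcError.errInfo.wtimeout)
  };
}

var sample = {
  ok: 1,
  n: 0,
  writeErrors: [{ index: 0, code: 11000, errmsg: "E11000 duplicate key error" }],
  writeConcernError: {
    code: 22,
    errInfo: { wtimeout: true },
    errmsg: "Could not replicate operation within requested timeout"
  }
};
var summary = summarizeWriteResponse(sample);
// summary.succeeded is false even though "ok" is 1
```

Note that `"ok" : 1` only says the command itself ran; individual write failures show up in `writeErrors` and `writeConcernError`.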
Write Operations
• WriteConcern can be specified on a per-operation basis
    db.products.insert(
      { item: "envelopes", qty : 100, type: "Clasp" },
      { writeConcern: { w: "majority", wtimeout: 5000 } }
    )
• Field Order
  • _id field will ALWAYS be first
  • Field order will be preserved (unless a field is renamed)
Bulk Write Operations
• All write operations can now happen in bulk
• Super cool fluent API
• Significant performance increase
Bulk Write Operations: OLD WAY
    // get cursor
    var cursor = db.myCollection.find({}, { _id: 1 }); // returns 100,000 documents
    // iterate through and update each document
    while (cursor.hasNext()) {
      var doc = cursor.next();
      // store the document's _id in the `up` field
      db.myCollection.update({ _id : doc._id }, { $set : { up : doc._id } });
    }
TIME: 67.4 Seconds
Bulk Write Operations: NEW WAY
    // create bulk object
    var bulk = db.myCollection.initializeUnorderedBulkOp();
    // add update operations to BulkOp
    for (var x = 0; x < 100000; x++) {
      bulk.find({ _id : x }).update({ $set : { up : x } });
    }
    // send update operations to the database
    bulk.execute();
TIME: 5.5 Seconds (62 seconds faster)
Storage
• Power of 2 Allocation (introduced in 2.2) now set as the default allocation strategy
• Each record has a size in bytes that is a power of 2 (e.g. 32, 64, 128, 256, 512 ... 16777216)
• Smallest allocation size is 32 bytes
Storage
• Two advantages/goals:
  1. The limited number of record allocation sizes makes it easier for mongo to reuse existing allocations, reducing fragmentation
  2. The space allocated for each document is usually larger than the data it holds. This allows documents to grow while minimizing the chance that mongo will need to reallocate space and move the document as data is added to it.
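The rounding rule can be sketched in a few lines of plain JavaScript. This is a simplified model for illustration only: the function name `powerOf2Allocation` is made up, and it ignores any per-record overhead the storage layer adds.

```javascript
// Simplified model (not MongoDB source): round the bytes a record needs
// up to the next power of 2, never going below the 32-byte minimum.
function powerOf2Allocation(bytesNeeded) {
  var size = 32;
  while (size < bytesNeeded) {
    size *= 2;
  }
  return size;
}

var original = powerOf2Allocation(70);      // 128
var afterGrowth = powerOf2Allocation(100);  // 128
var fitsInPlace = afterGrowth === original; // true: no new allocation needed
```

A 70-byte document lands in a 128-byte slot, so it can grow to 100 bytes without moving; only growth past 128 bytes would force a move to a 256-byte slot.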
Storage
• Power of 2 sizes replaces the previous "Exact Fit" allocation strategy
  • Exact Fit allocated the exact size needed plus a small (configurable) padding factor
  • It was inefficient for heavy write workloads and for reusing freed space
Sharding & Replication
• Ability to merge Chunks
  • Chunks must be contiguous
  • Chunks must be on same shard
  • At least one chunk must be empty
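The three preconditions can be expressed as a small check. This is a sketch under stated assumptions: the chunk objects are hypothetical, with `min`, `max`, and `shard` modeled on the fields of documents in config.chunks and `count` (documents in the chunk) added purely for this example. The second helper only builds the `mergeChunks` command document, which would be sent with db.adminCommand() through a mongos.

```javascript
// Hypothetical check of the merge preconditions on two chunk descriptors.
function canMerge(a, b) {
  // contiguous: the first chunk ends exactly where the second begins
  var contiguous = JSON.stringify(a.max) === JSON.stringify(b.min);
  var sameShard = a.shard === b.shard;
  var oneEmpty = a.count === 0 || b.count === 0;
  return contiguous && sameShard && oneEmpty;
}

// Build the mergeChunks command document covering both chunks.
function buildMergeChunksCommand(namespace, a, b) {
  return { mergeChunks: namespace, bounds: [a.min, b.max] };
}

// Illustrative data: names and values are made up.
var left  = { min: { x: 0 },  max: { x: 10 }, shard: "shard0000", count: 0 };
var right = { min: { x: 10 }, max: { x: 20 }, shard: "shard0000", count: 42 };
var command = canMerge(left, right)
  ? buildMergeChunksCommand("test.info", left, right)
  : null;
```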
Sharding & Replication
• Ability to remove orphaned documents
  • Orphaned documents: documents on a shard that also exist in chunks on other shards as a result of failed migrations or incomplete migration cleanup due to abnormal shutdown
  • Delete orphaned documents using cleanupOrphaned to reclaim disk space and reduce confusion
Sharding & Replication
• Ability to remove orphaned documents
  • Must be run on the admin db of the primary member of a replica set (NOT mongos)
    db.runCommand( {
      "cleanupOrphaned": "test.info",
      "startingFromKey": { x: 10 },
      "secondaryThrottle": true
    } )
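A single cleanupOrphaned call may stop before the whole shard has been covered, reporting where it stopped in `stoppedAtKey`. The sketch below repeats the command, feeding that key back in through the command's `startingFromKey` parameter, until no key is returned. The helper name `cleanupAllOrphans` is an assumption; `db` is a mongo shell handle connected directly to the shard's primary.

```javascript
// Hypothetical helper: keep calling cleanupOrphaned until the shard is covered.
// Returns the number of successful passes.
function cleanupAllOrphans(db, namespace) {
  var nextKey = {};   // an empty document starts from the lowest shard key
  var passes = 0;
  while (nextKey != null) {
    var result = db.adminCommand({
      cleanupOrphaned: namespace,
      startingFromKey: nextKey
    });
    if (result.ok !== 1) {
      break;          // failure or timeout: stop and let the caller retry later
    }
    passes += 1;
    nextKey = result.stoppedAtKey; // absent once the last range has been cleaned
  }
  return passes;
}

// In the mongo shell, on the shard's primary (namespace is illustrative):
//   cleanupAllOrphans(db, "test.info");
```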
Security
• Integration (Enterprise Edition Only)
  • Kerberos introduced in 2.4
  • 2.6 adds LDAP and x.509 protocols
• There's also a Windows Enterprise Edition now
  • Linux Enterprise introduced in 2.4
Security
• User-Defined Roles & Collection Level Access
  • Before: read-only and full admin were the only options (per database)
• 2.6 adds Role-Based Access Control
  • Separate upgrade
  • Users are granted Roles
  • Roles have Privileges
  • Privileges are an action and a resource
  • ex: Update (action) on product db (resource)
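The user, role, privilege chain can be pictured with a small in-memory model. Everything here is a conceptual sketch: the `roles` table, the `productEditor` role, and the `isAllowed` function are made up for illustration and say nothing about how mongod evaluates access internally. The one borrowed convention is that an empty string in a resource document stands for "any" database or collection.

```javascript
// Hypothetical model: role name -> list of privileges (resource + actions).
var roles = {
  productEditor: [
    { resource: { db: "product", collection: "" }, actions: ["find", "update"] }
  ]
};

// An empty string in a resource pattern matches any value.
function matches(pattern, value) {
  return pattern === "" || pattern === value;
}

// A user is allowed an action if any privilege of any granted role
// covers the target resource and lists that action.
function isAllowed(userRoles, action, db, collection) {
  return userRoles.some(function (roleName) {
    return (roles[roleName] || []).some(function (privilege) {
      return matches(privilege.resource.db, db) &&
             matches(privilege.resource.collection, collection) &&
             privilege.actions.indexOf(action) !== -1;
    });
  });
}

var canUpdate = isAllowed(["productEditor"], "update", "product", "items"); // true
var canRemove = isAllowed(["productEditor"], "remove", "product", "items"); // false
```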
Security
Built-in Database Roles:
• read (read-only access)
• readWrite (CRUD; create, rename, and drop collections; create and drop indexes)
• dbAdmin (read access to system.profile collection – weirdly specific, but ok)
• userAdmin (create and modify roles and users)
• dbOwner (readWrite + dbAdmin + userAdmin)
Security
Built-in Cluster Roles (create on admin DB):
• clusterManager (add/remove shards, change replset and cluster config, manage chunks, etc.)
• clusterMonitor (read access to cluster admin info)
• hostManager (misc admin commands: killop/shutdown/repairDatabase)
• clusterAdmin (all of the above + dropDatabase)
Security
Other Roles (admin DB):
• backup, restore (mongodump/mongorestore)
• readAnyDatabase, readWriteAnyDatabase, userAdminAnyDatabase, dbAdminAnyDatabase
• root (readWriteAnyDatabase, dbAdminAnyDatabase, userAdminAnyDatabase, clusterAdmin)
Security
Custom Roles:
    db.runCommand({
      createRole: "myClusterwideAdmin",
      privileges: [
        { resource: { cluster: true }, actions: [ "addShard" ] },
        { resource: { db: "config", collection: "" }, actions: [ "find", "update", "insert", "remove" ] },
        { resource: { db: "users", collection: "usersCollection" }, actions: [ "update", "insert", "remove" ] },
        { resource: { db: "", collection: "" }, actions: [ "find" ] }
      ],
      roles: [ { role: "read", db: "admin" } ]
    })
LOTS of new stuff here – check out documentation
Security
User Creation Example:
    use products
    db.createUser({
      "user" : "accountAdmin01",
      "pwd" : "cleartext password",
      "customData" : { employeeId: 12345 },
      "roles" : [
        { role: "myClusterwideAdmin", db: "admin" },
        { role: "readAnyDatabase", db: "admin" },
        "readWrite"
      ]
    })
This user has readWrite permissions on the products DB, read permissions on all DBs, and the permissions of the role we created earlier.
Miscellaneous
• $min & $max conditional updates
    Ex: db.scores.update( { _id: 1 }, { $min: { lowScore: 150 } } )
• Enhancements to 2dsphere indexes
• mongoexport supports --skip, --limit, --sort
• rs.printReplicationInfo(), rs.printSlaveReplicationInfo() – human readable helper methods
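The effect of the conditional update operators can be modeled on a plain object. This is a simplified sketch for numeric fields only: the function names `applyMin` and `applyMax` are made up, and the server itself compares values using BSON ordering, which this model does not reproduce.

```javascript
// Simplified model of $min: write `value` only if the field is missing
// or `value` is lower than what is currently stored.
function applyMin(doc, field, value) {
  if (!(field in doc) || value < doc[field]) {
    doc[field] = value;
  }
  return doc;
}

// Simplified model of $max: write `value` only if the field is missing
// or `value` is higher than what is currently stored.
function applyMax(doc, field, value) {
  if (!(field in doc) || value > doc[field]) {
    doc[field] = value;
  }
  return doc;
}

var lowered   = applyMin({ _id: 1, lowScore: 200 }, "lowScore", 150); // lowScore becomes 150
var unchanged = applyMin({ _id: 1, lowScore: 100 }, "lowScore", 150); // lowScore stays 100
```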
The Future “You’ll see the benefits in better performance and new innovations. We re-wrote the entire query execution engine to improve scalability, and took our first step in building a sophisticated query planner by introducing index intersection. We’ve made the codebase easier to maintain, and made it easier to implement new features. Finally, MongoDB 2.6 lays the foundation for massive improvements to concurrency in MongoDB 2.8, including document-level locking.” - Eliot Horowitz, CTO and Co-Founder, MongoDB