MongoDB in 10 minutes
Mohannad El Dafrawy, Sara Rodriguez, Lino Valdivia Jr.
What is MongoDB?
• Document database
• Data is structured as schema-less JSON documents
• One of the most popular NoSQL solutions
• Cross-platform and open source
  • Written in C++
  • Supports Windows, Linux, Mac OS X, Solaris
Features (I)
• Document-based storage and querying
  • Queries themselves are JSON documents
• Full index support
  • Any attribute can be indexed, just like in a traditional SQL database
• Replication & high availability
  • Data is mirrored across servers for redundancy and read scalability
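Full index support means a query such as `{ rating: { $gt: 2 } }` does not have to scan every document. The sketch below is a simplification, not MongoDB's implementation: it assumes a single-field index can be modeled as a sorted array of `[key, _id]` pairs (MongoDB itself uses B-trees) and answers a `$gt` range query with a binary search. The helpers `buildIndex` and `rangeGreaterThan` are hypothetical names.

```javascript
// Hypothetical sketch: a single-field index kept as a sorted array of
// [key, _id] pairs. Real MongoDB indexes are B-trees; a sorted array
// supports the same ordered range lookups in a few lines.
function buildIndex(docs, field) {
  return docs
    .filter((doc) => typeof doc[field] === "number") // numeric keys only, for simplicity
    .map((doc) => [doc[field], doc._id])
    .sort((a, b) => a[0] - b[0]);
}

// Return the _ids whose indexed value is strictly greater than `value`,
// in ascending key order, without examining entries below the cut-off.
function rangeGreaterThan(index, value) {
  let lo = 0;
  let hi = index.length;
  // Binary search for the first entry with key > value.
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (index[mid][0] <= value) lo = mid + 1;
    else hi = mid;
  }
  return index.slice(lo).map(([, id]) => id);
}

const posts = [
  { _id: 1, rating: 2.2 },
  { _id: 2, rating: 1.5 },
  { _id: 3, rating: 4.0 },
  { _id: 4, rating: 2.0 },
];
const ratingIndex = buildIndex(posts, "rating");
console.log(rangeGreaterThan(ratingIndex, 2)); // [ 1, 3 ]
```

The index is built once and kept sorted, so each range query costs a logarithmic search plus the size of the result, which is the property a B-tree index provides at much larger scale.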
Features (II)
• Auto-sharding (horizontal scaling)
  • Large data sets can be divided and distributed over multiple shards
• Fast in-place updates
  • Update operations on a single document are atomic
• Integrated map/reduce framework
  • Map/reduce operations can run directly on top of the stored data
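The map/reduce model can be tried without a server. The sketch below is a plain-JavaScript stand-in, not MongoDB's `mapReduce` command: a hypothetical `mapReduce(docs, map, reduce)` helper calls `map` once per document with an `emit` callback, groups the emitted values by key, and folds each group with `reduce`. Counting posts per tag is the usual first example.

```javascript
// Hypothetical in-memory map/reduce helper illustrating the model.
function mapReduce(docs, map, reduce) {
  const groups = new Map(); // key -> array of emitted values
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  // Map phase: each document may emit zero or more key/value pairs.
  for (const doc of docs) map(doc, emit);
  // Reduce phase: combine all values emitted under the same key.
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

const posts = [
  { _id: 1, tags: ["databases", "mongo"] },
  { _id: 2, tags: ["mongo"] },
  { _id: 3, tags: [] },
];

const tagCounts = mapReduce(
  posts,
  (doc, emit) => doc.tags.forEach((tag) => emit(tag, 1)),
  (key, values) => values.reduce((sum, v) => sum + v, 0),
);
console.log(tagCounts); // { databases: 1, mongo: 2 }
```

In MongoDB's own map/reduce the map function reads the document through `this` and calls a global `emit`, and the server may call `reduce` repeatedly on partial results, so a reduce function should return a value of the same shape as its inputs; summing counts, as here, satisfies that.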
History
• First developed by 10gen (now MongoDB, Inc.) in 2007
• The name comes from "humongous"
• Became open source in 2009
• Latest stable release at the time of this talk: 2.4.9, released January 2014
Basic Ideas
{
  _id: 1234,
  author: { name: "Bob Jones", email: "b@b.com" },
  post: "In these troubled times I like to ...",
  date: { $date: "2014-03-12 13:23UTC" },
  location: [ -121.2322, 48.1223222 ],
  rating: 2.2,
  comments: [
    { user: "lalal@hotmail.com", upVotes: 22, downVotes: 14, text: "Great point! I agree" },
    { user: "pedro@gmail.com", upVotes: 421, downVotes: 22, text: "You are a..." }
  ],
  tags: [ "databases", "mongo" ]
}
• Collections of JSON objects
• Related objects can be embedded within a single document
• Flexible schema
• References between documents in different collections
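The example document embeds both its author and its comments. The "References" alternative can be sketched with plain arrays standing in for two collections; this is an illustration under assumptions, not driver code. Comments stay embedded, so they arrive with the post in one read, while the author is stored as a reference in a hypothetical `author_id` field that the application resolves with a second lookup, because these versions of MongoDB do not join collections on the server.

```javascript
// Plain arrays stand in for two collections (hypothetical data).
const users = [{ _id: "u1", name: "Bob Jones", email: "b@b.com" }];

const posts = [
  {
    _id: 1234,
    author_id: "u1", // reference: points at a document in `users`
    post: "In these troubled times I like to ...",
    // embedded: comments live inside the post document itself
    comments: [
      { user: "lalal@hotmail.com", upVotes: 22, downVotes: 14, text: "Great point! I agree" },
    ],
  },
];

// Embedded data needs no extra lookup.
function commentCount(post) {
  return post.comments.length;
}

// A reference is resolved by the application with a second query,
// simulated here with Array.prototype.find; null if the target is missing.
function withAuthor(post) {
  const author = users.find((u) => u._id === post.author_id) || null;
  return { ...post, author };
}

const hydrated = withAuthor(posts[0]);
console.log(commentCount(posts[0]), hydrated.author.name); // 1 Bob Jones
```

Embedding suits data that is read together with its parent; a reference suits data shared by many documents, such as one author across many posts, at the cost of the extra lookup.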
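Query Example
// dot notation reaches into the embedded author document
db.posts.find({ "author.name": "mike" })
// rating greater than 2
db.posts.find({ rating: { $gt: 2 } })
// tags is an array: matches if any element equals "software"
db.posts.find({ tags: "software" })
// the 10 most recent posts
db.posts.find().sort({ date: -1 }).limit(10)
// SQL: select * from posts where 'economy' in tags order by ts desc
db.posts.find({ tags: "economy" }).sort({ ts: -1 }).limit(10)
http://try.mongodb.org/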
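To see how these queries select documents, the sketch below imitates just three matching rules over an in-memory array: dot notation into embedded documents, array fields matching when any element matches, and `$gt`, plus a single-field sort and a limit. It is a hypothetical illustration (the names `getPath`, `matchValue`, `matches`, and `find` are invented here); real MongoDB supports many more operators and ordering rules.

```javascript
// Hypothetical in-memory imitation of a few MongoDB query rules.

// Follow a dotted path such as "author.name" into embedded objects.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// Test one field value against one condition (primitive equality and $gt only).
function matchValue(value, cond) {
  const values = Array.isArray(value) ? value : [value];
  if (cond !== null && typeof cond === "object" && "$gt" in cond) {
    // $gt: some element must be of the same JS type and strictly greater.
    return values.some((v) => typeof v === typeof cond.$gt && v > cond.$gt);
  }
  // Equality: an array field matches if any element is equal.
  return values.some((v) => v === cond);
}

// Every top-level condition in the query must hold (implicit AND).
function matches(doc, query) {
  return Object.entries(query).every(([path, cond]) => matchValue(getPath(doc, path), cond));
}

// Compare two primitive values for sorting.
function compare(a, b) {
  return a < b ? -1 : a > b ? 1 : 0;
}

// find + optional single-field sort ({ field, order: 1 | -1 }) + limit.
function find(docs, query, sort = null, limit = Infinity) {
  const out = docs.filter((d) => matches(d, query));
  if (sort) out.sort((a, b) => compare(a[sort.field], b[sort.field]) * sort.order);
  return out.slice(0, limit);
}

const posts = [
  { _id: 1, author: { name: "mike" }, rating: 2.2, tags: ["economy", "software"], ts: 3 },
  { _id: 2, author: { name: "anna" }, rating: 1.5, tags: ["economy"], ts: 5 },
  { _id: 3, author: { name: "mike" }, rating: 4.0, tags: ["mongo"], ts: 1 },
];
const ids = (docs) => docs.map((d) => d._id);

console.log(ids(find(posts, { "author.name": "mike" }))); // [ 1, 3 ]
console.log(ids(find(posts, { rating: { $gt: 2 } }))); // [ 1, 3 ]
console.log(ids(find(posts, { tags: "software" }))); // [ 1 ]
console.log(ids(find(posts, { tags: "economy" }, { field: "ts", order: -1 }, 10))); // [ 2, 1 ]
```

The same-type check in `$gt` loosely echoes the way MongoDB's comparison operators only match values of a comparable type; everything else about type ordering is left out.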
Note on internals
• Documents are stored as BSON (Binary JSON)
• Data files are memory-mapped
• Indexes are B-trees
http://bsonspec.org
{_id: ObjectId(XXXXXXXXX), hello: "world"}
\x27\x00\x00\x00\x07_id\x00XXXXXXXXXXXX\x02hello\x00\x06\x00\x00\x00world\x00\x00
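The example bytes follow the BSON layout from bsonspec.org: a little-endian int32 total length, then one element per field (a type byte, the field name as a NUL-terminated string, then the value), then a closing 0x00. The sketch below encodes only the two element types in this example, UTF-8 strings (0x02) and 12-byte ObjectIds (0x07). The `encodeBson` helper is hypothetical and uses Node.js `Buffer`; twelve `X` bytes (0x58) stand in for a real ObjectId.

```javascript
// Minimal BSON encoder sketch: only string (0x02) and ObjectId (0x07) fields.

// NUL-terminated UTF-8 string, used for field names and string bodies.
function cstring(s) {
  return Buffer.concat([Buffer.from(s, "utf8"), Buffer.from([0x00])]);
}

// 4-byte little-endian signed integer.
function int32le(n) {
  const b = Buffer.alloc(4);
  b.writeInt32LE(n, 0);
  return b;
}

// fields: array of [name, value] pairs, so field order is explicit.
// A 12-byte Buffer is encoded as an ObjectId, a JS string as a BSON string.
function encodeBson(fields) {
  const elements = fields.map(([name, value]) => {
    if (Buffer.isBuffer(value) && value.length === 12) {
      return Buffer.concat([Buffer.from([0x07]), cstring(name), value]);
    }
    if (typeof value === "string") {
      const body = cstring(value); // UTF-8 bytes plus trailing NUL
      return Buffer.concat([Buffer.from([0x02]), cstring(name), int32le(body.length), body]);
    }
    throw new TypeError(`unsupported value for field ${name}`);
  });
  const list = Buffer.concat(elements);
  // The total length counts the 4-byte prefix and the trailing NUL.
  const total = 4 + list.length + 1;
  return Buffer.concat([int32le(total), list, Buffer.from([0x00])]);
}

const plain = encodeBson([["hello", "world"]]);
console.log(plain.length, plain.toString("hex"));
// 22 160000000268656c6c6f0006000000776f726c640000

const withId = encodeBson([["_id", Buffer.alloc(12, 0x58)], ["hello", "world"]]);
console.log(withId.length, withId[0].toString(16)); // 39 27
```

Adding the 17-byte `_id` element (type byte, `_id\x00`, 12 ObjectId bytes) to the 22-byte `{hello: "world"}` document gives 39 bytes, which is the 0x27 length prefix in the example.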
Cassandra (1.2) vs. MongoDB (2.2)
Cassandra (1.2), best used:
• When you write more than you read (logging)
• If every component of the system must be in Java
• If you require Availability + Partition Tolerance
For example: banking and the financial industry (though not necessarily for financial transactions; these industries are much bigger than that). Writes are faster than reads, so one natural niche is data analysis.
MongoDB (2.2), best used:
• If you need dynamic queries
• If you prefer to define indexes, not map/reduce functions
• If you need good performance on a big DB
• If you require Consistency + Partition Tolerance
For example: most things you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back.
Source: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
Why (and why not) MongoDB?
Why:
• If you need dynamic queries
• If you prefer to define indexes, not map/reduce functions
• If you need good performance on a big DB
• If you wanted CouchDB, but your data changes too much, filling up disks
Why not:
• It lacks multi-document transactions, so if you're a bank, don't use it
• It doesn't support SQL
• It doesn't have any built-in revisioning like CouchDB
• It doesn't have real full-text search features
Production Users
• Archiving - Craigslist
• Content management - MTV Networks
• E-commerce - CustomInk
• Real-time analytics - Intuit
• Social networking - Foursquare
Long-term goals for MongoDB
Adding new features such as:
• Natural language processing
• A full-text search engine
• More real-time search over data
Personal conclusion
• Getting up to speed with MongoDB is quick (document-oriented and schema-free)
• Advanced usage (tons of features, including indexes and aggregation)
• Administration (easy to administer; replication, sharding)
• BSON and memory-mapped files
• During a network partition, some clients may be unable to read or write: MongoDB favors CP (Consistency and Partition Tolerance)
References
• MongoDB.org: https://www.mongodb.org/
• Wikipedia, "MongoDB": http://en.wikipedia.org/wiki/MongoDB
• DB-Engines Ranking: http://db-engines.com/en/ranking
• Interview about the future of MongoDB: http://strata.oreilly.com/2012/11/the-future-of-mongodb.html
• Kyle Banker, "MongoDB Inside and Outside": http://vimeo.com/13211523
• "How This Web Site Uses MongoDB": http://www.businessinsider.com/how-we-use-mongodb-2009-11
• Cassandra and MongoDB comparison: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis