1 / 45

An Analysis of the Suitability of SQL and NoSQL for Commercial Applications

An Analysis of the Suitability of SQL and NoSQL for Commercial Applications. Project Group 12: Monika Puhazhendhi – A0075132N Wang Jun – A0078092Y Qiao Liang – U0905199 Nithiya Priya Ravichandran – A0069854M . A CS4221 project. Introduction.

trapper
Télécharger la présentation

An Analysis of the Suitability of SQL and NoSQL for Commercial Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. An Analysis of the Suitability of SQL and NoSQL for Commercial Applications Project Group 12: Monika Puhazhendhi– A0075132N Wang Jun – A0078092Y Qiao Liang – U0905199 Nithiya Priya Ravichandran – A0069854M A CS4221 project

  2. Introduction • Database volumes expanding with cloud computing • Multiple sources of data - customers, the public, point-of-sale devices, GPS, mobile devices, RFID readers, website, video and so on • Databases and Datastore – increasing importance

  3. MySQL & NoSQL - Overview • MySQL • Tested Relation DBMS • SQL standard • ACID properties • Good performance with biz and web applications • NoSQL • “Not Only SQL” • Schema-less DBMS • BASE properties • Representation forms: Document-store, Key-value, XML, Graph, Column-store • Designed for flexibility and scalability

  4. Project Objective ObjectiveAnalyze suitability of SQL and NoSQL for commercial applications through case studies Overview • Case studies • Pure SQL/NoSQL • Hybrid • Security Issues • Analysis of SQL vsNoSQL

  5. Pure SQL / NoSQL

  6. Wikipedia - MySQL • Own DBMS - Media wiki • Primary DB model – MySQL • Provides multiple user access • Handles heavy weight queries • Efficient recovery from crash • Satisfactory performance

  7. Why not NoSQL? • Intensive cross-indexing and multiple different views • Lose efficiency and data locality • Require moving of data from data to application layer • Sub-sections of wiki page • Individual version control along with normalization • Wastage of space due to duplication • Does not provide strong consistency and atomicity to handle high profile of data flow

  8. LinkedIn - NoSQL • Back-end : MySQL for ACID properties • Front-end : MongoDB & Voldemort • Voldemort • Key-value storage • Efficient handling of million reads and writes daily • Useful to access and analyze large unstructured data • DataBus : real-time DB change capture system

  9. Quora - MySQL • Social Q&A site with approx. 5m users • Pure MySQL DB solution • No intention to change to NoSQL anytime soon • Built on a range of database technologies

  10. Quora - MySQL Why? • They don’t have to • Smaller user base than Facebook and Twitter • Read-heavy DB operations – Mitigate the load with read slaves • Able to scale up and scale out conventionally without issue • They don’t want to • Believe SQL is the “right way” to go – support for structured data, heavy lifting, no post-processing, efficient data storage with little redundancy • NoSQL not matured sufficiently

  11. The Hybrid

  12. Twitter • “Online social networking and microblogging service that enables users to send and read ‘tweets’, which are text messages limited to 140 characters.” – wiki • 500m+ active users • Average tweets per day: 58m

  13. Twitter - Early stage (2007) • MySQL with replica and memcache

  14. Twitter - Current stage • Cassandra • High-velocity writes and low-velocity reads • Usage in geolocation, places of interest, real-time data analytics • Gizzard • Distributed data storage framework • Manages data partitioning across data stores

  15. Gizzard

  16. FlockDB

  17. But it’s still MySQL • FlockDB built on Gizzard and MySQL for ID generation and graph storage

  18. Twitter – Future Plan “For now, we’re not working on using Cassandra as a store for Tweets. This is a change in strategy. Instead we’re going to continue to maintain our existing MySQL-based storage. We believe that this isn’t the time to make large scale migration to a new technology.”

  19. Facebook - Infrastructure 1. Front-end cluster: Web server & Memcache 2. Back-end cluster: Database (MySQL) 3. Auxiliary cluster: Inbox search, Ads,…

  20. Facebook – Implementation • Tier 1 – Web Server • Tier 2 – Memcache • Tier 3 – Database

  21. Facebook – Database Tier

  22. Facebook - MySQL • Key-value store • Why use MySQL? • Simplicity • Reliability • Speed • Random data distribution among many instances • Load balancing at physical node level • Custom partitioning scheme with global ID • Custom archiving scheme on per-user basis

  23. Facebook – Memcache Tier

  24. Facebook - Memcache • Distributed memory caching system • Caching of query results • Primary and secondary indexing • Speed up dynamic database-driven websites • Advantages • Highly reliable • Low latency • High service rates • Disadvantages • Easily corrupted • Limited data model • Inefficient for small data • Aware of only own server

  25. Facebook – Issues with MySQL • Local migration of data • Explosion of user traffic on single node • No support for global queries • Data-driven schemas • FB Objects and Associations • Higher level design abstraction • Automatic construction of SQL query and cache key • Performance issues – no well-defined schema

  26. Facebook - Cassandra • Use in inbox search • Issue – Balance between normalization and join queries • Features • Denormalization to one huge table • Replication among multiple nodes • Delayed consistency

  27. Facebook - HBase • Need for distributed fault-tolerant database and manage write-dominated workload • HBase – distributed DB • Based on Google’s BigTable • Integrated with Apache Hadoop and Zookeeper • Good scalability and performance • Simpler consistency model than Cassandra • HDFS

  28. Google App Engine • Database products • Google CloudSQL • Google Datastore

  29. Google App Engine - CloudSQL • Need for RDBMS • Until 2011, schema-less object data store based on GFS and BigTable • SQL-like syntax – “GQL” • No support for relational features – joins, sub-queries, multiple filters • CloudSQL • RDBMS service for development in Java and Python on GAE • Connect to MySQL instance in cloud • Import/export of instances • RESTful JSON API • Synchronous replication

  30. Google App Engine - Datastore • Infinite scalability • Eventual consistency vs. Strong consistency • Data modelling

  31. Google App Engine – CloudSQL vs. Datastore

  32. Security Issues

  33. Why security Without security, bad things can happen: • Confidentiality: Attacker can access your business secret • Integrity: We store a FYP paper but retrieve a piece of joke • Availability: Cannot access the data when we need it • Authentication: A hacker can login to your database and do whatever he wants

  34. Security Mechanisms of RDBMS • Strong authentication mechanism • Encryption mechanism for data storage and transmission • Consistent transaction • Mature industry standard

  35. Security Mechanisms of NoSQL

  36. NoSQL Security It does not mean no security protection but • Weak authentication and no authentication by default • Plain text storage and transmission • Motivation for vendors to offer security protection

  37. NoSQL Security – User Role • System Design • Security Model • Secured on Application side

  38. NoSQL Security – User Role • NoSQL is not SQL or RDBMS • Not designed for • Access or constrain control • Normal structured data • Replace RDBMS

  39. Conclusion

  40. Anlaysis of SQL vsNoSQL • No one perfect solution – Trade-offs • Decision is based on the particular use case. • Big players usually combine both to solve their various problems • NoSQL may take some time to mature as the community grows and commercial support catches up.

  41. Anlaysis of SQL vsNoSQL

  42. Anlaysis of SQL vsNoSQL

  43. Where to use SQL? • If the application requires • Good range of expressive and powerful queries • Assurance of ACID properties • Better security mechanisms • Maturity of technology • If scalability is more important than ensuring ACID properties, NoSQL is should be chosen

  44. Where to use NoSQL? • If the application requires • Easy, fast and high scalability • Flexible schema

  45. Thank You

More Related