
Databases in Cloud Environments


Presentation Transcript


  1. Databases in Cloud Environments. Based on: Md. Ashfakul Islam, Department of Computer Science, The University of Alabama

  2. Data Today • Data sizes are increasing exponentially every day. • Key difficulties in processing large-scale data: • acquiring the required amount of on-demand resources • auto-scaling up and down based on dynamic workloads • distributing and coordinating a large-scale job across several servers • replication and update-consistency maintenance • A cloud platform can solve most of the above.

  3. Large Scale Data Management • Large-scale data management is attracting attention. • Many organizations produce data at the petabyte (PB) level. • Managing such an amount of data requires huge resources. • The ubiquity of huge data sets inspires researchers to think in new ways. • Particularly challenging for transactional DBs.

  4. Issues to Consider • Distributed or centralized application? • How can ACID guarantees be maintained? • Atomicity, Consistency, Isolation, Durability • Atomic – either all or nothing • Consistent – the database must remain consistent after each execution of a write operation • Isolation – no interference from other transactions • Durability – changes made are permanent
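
  The ACID bullets above are easiest to see in running code. The following is a minimal sketch, assuming Python's built-in sqlite3 module (not part of the original slides), that shows atomicity: a simulated failure in the middle of a transfer rolls back the whole transaction, so the balances stay unchanged.

    import sqlite3

    # Minimal atomicity sketch: either both account updates commit, or neither does.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 50)])
    conn.commit()

    try:
        with conn:  # opens a transaction; commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - 70 "
                         "WHERE name = 'alice'")
            # Simulated failure mid-transfer: the debit above is rolled back too.
            raise RuntimeError("transfer interrupted")
    except RuntimeError:
        pass

    # Balances are unchanged, illustrating the all-or-nothing guarantee.
    print(dict(conn.execute("SELECT name, balance FROM accounts")))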

  5. ACID Challenges • Data is replicated over a wide area to increase availability and reliability. • Consistency maintenance in a replicated database is very costly in terms of performance. • Consistency becomes the bottleneck of data management deployment in the cloud. • Costly to maintain.

  6. CAP • The CAP theorem • Consistency, Availability, Partition tolerance • Three desirable, and expected, properties of real-world services • Brewer states that it is impossible to guarantee all three

  7. CAP: Consistency - atomic • Data should maintain atomic consistency • There must exist a total order on all operations such that each operation looks as if it were completed at a single instant • This is not the same as the Atomic requirement in ACID

  8. CAP: Available Data Objects • Every request received by a non-failing node in the system must result in a response • No time requirement • Difficult because even under severe network failures, every request must terminate • Brewer originally required only that almost all requests get a response; this has been simplified to all

  9. CAP: Partition Tolerance • When the network is partitioned, all messages sent from nodes in one partition to nodes in another partition are lost • This causes the difficulty because • every response must be atomic even though arbitrary messages might not be delivered • every node must respond even though arbitrary messages may be lost • No failure other than total network failure is allowed to cause incorrect responses

  10. CAP: Consistent & Partition Tolerant • Trivial solution: ignore all requests (sacrificing availability) • Alternate solution: each data object is hosted on a single node and all actions involving that object are forwarded to the node hosting the object

  11. CAP: Consistent & Available • If no partitions occur, it is clearly possible to provide atomic (consistent), available data • Systems that run on intranets and LANs are an example of these algorithms

  12. CAP: Available & Partition Tolerant • The service can return the initial value for all requests • The system can provide weakened consistency; this is similar to web caches

  13. CAP: Weaker Consistency Conditions • By allowing stale data to be returned when messages are lost, it is possible to maintain a weaker consistency • Delayed-t consistency – there is an atomic order on operations only if there was an interval between the operations in which all messages were delivered

  14. CAP • Can only achieve 2 out of 3 of these • In most databases on the cloud, data availability and reliability (even under network partition) are achieved by compromising consistency • Traditional consistency techniques become obsolete

  15. Evaluation Criteria for Data Management • Evaluation criteria: • Elasticity • scalable, distribute new resources, offload unused resources, parallelizable, low coupling • Security • untrusted host, moving off premises, new rules/regulations • Replication • available, durable, fault tolerant, replication across globe

  16. Evaluation of Analytical DB • An analytical DB handles historical data with little or no updates – no ACID properties. • Elasticity • since there is no ACID, it is easier • e.g., no updates, so locking is not needed • a number of commercial products support elasticity • Security • sensitive and detailed data are required • a third-party vendor stores the data • Replication • a recent snapshot of the DB serves the purpose • strong consistency isn’t required

  17. Analytical DBs – Data Warehousing • Data warehousing (DW) is a popular application of Hadoop • Typically DW is relational (OLAP) • but also semi-structured and unstructured data • Can also use parallel DBs (Teradata) • column oriented • expensive, $10K per TB of data • Hadoop for DW • Facebook abandoned Oracle for Hadoop (Hive) • Also Pig – for semi-structured data

  18. Evaluation of Transactional DM • Elasticity • data is partitioned over sites • locking and commit protocols become complex and time-consuming • huge distributed data processing overhead • Security • same as for analytical

  19. Evaluation of Transactional DM • Replication • data is replicated in the cloud • CAP theorem: Consistency, Availability, Partition tolerance – only two can be achieved • consistency and availability – must choose one • availability is the main goal of the cloud • consistency is sacrificed • database ACID violation – what to do?

  20. Transactional Data Management

  21. Transactional Data Management • Needed because transactional data management is the heart of the database industry • almost all financial transactions are conducted through it • it relies on ACID guarantees • ACID properties are the main challenge in transactional DM deployment in the cloud.

  22. Transactional DM • A transaction is a sequence of read & write operations. • Guarantee the ACID properties of transactions: • Atomicity - either all operations execute or none. • Consistency - the DB remains consistent after each transaction execution. • Isolation - the impact of a transaction can’t be altered by another one. • Durability - the impact of a committed transaction is guaranteed (permanent).

  23. Existing Transactions for Web Applications in the Cloud • Two important properties of Web applications • all transactions are short-lived • data request can be responded to with a small set of well-identified data items • Scalable database services like Amazon SimpleDB and Google BigTable allow data to be queried only by primary key. • Eventual data consistency is maintained in these database services.

  24. Related Research • Different types of consistency • Strong consistency – subsequent accesses by transactions will return the updated value • Weak consistency – no guarantee that subsequent accesses return the updated value • Inconsistency window – the period between an update and when it is guaranteed to be seen • Eventual consistency – a form of weak consistency • if no new updates are made, eventually all accesses return the last updated value • the size of the inconsistency window is determined by communication delays, system load and the number of replicas • implemented, for example, by the domain name system (DNS)
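
  To make the eventual-consistency terms above concrete, here is a small, self-contained Python sketch (illustrative only, not tied to SimpleDB, BigTable or DNS; all names are invented). A write lands on one replica immediately and reaches the others after a propagation delay, so a read during the inconsistency window can return a stale value.

    import threading
    import time

    class EventuallyConsistentStore:
        def __init__(self, replicas=3, propagation_delay=0.5):
            self.replicas = [dict() for _ in range(replicas)]
            self.delay = propagation_delay  # length of the inconsistency window

        def write(self, key, value):
            self.replicas[0][key] = value   # one replica is updated at once
            def propagate():
                time.sleep(self.delay)      # simulated replication lag
                for replica in self.replicas[1:]:
                    replica[key] = value
            threading.Thread(target=propagate, daemon=True).start()

        def read(self, key, replica_index):
            return self.replicas[replica_index].get(key)

    store = EventuallyConsistentStore()
    store.write("x", 42)
    print(store.read("x", 2))   # likely None: read falls inside the inconsistency window
    time.sleep(1)
    print(store.read("x", 2))   # 42: all replicas have converged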

  25. Commercial Cloud Databases • Amazon Dynamo • 100% available • read sensitive • Amazon Relational Database Services • built-in MySQL replica management is used • all replicas are in the same location • Microsoft Azure SQL • a primary with two redundancy servers • quorum approach • Xeround MySQL [2012] • a selected coordinator processes read & write requests • quorum approach • Google introduced Spanner • extremely scalable, distributed, multiversion DB • internal use only

  26. Tree Based Consistency (TBC) • Our proposed approach: • Minimize interdependency • Maximize throughput • All updates are propagated through a tree • Different performance factors are considered • Number of children is also limited • Tree is dynamic • New type of consistency ‘apparent consistency’ is introduced

  27. System Description of TBC • Two components: • Controller • tree creation • failure recovery • keeping logs • Replica server • database operations • communication with other servers

  28. Performance Factors • Identified performance factors • Time required for disk update • Workload of the server • Reliability of the server • Time to relay a message • Reliability of network • Network bandwidth • Network load

  29. PEM • Causes of enormous performance degradation: • disk update time, workload or reliability of the server • reliability, bandwidth and traffic load of the network • Performance Evaluation Metric (PEM) • pf_i = i-th performance factor • wf_i = i-th weight factor • wf_i could be positive or negative • a bigger PEM means better
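
  The slide lists the performance factors pf_i and weight factors wf_i but not the combining formula itself; a natural reading, assumed in the sketch below, is a weighted sum PEM = sum over i of wf_i * pf_i, with positive weights for factors where bigger is better and negative weights for factors such as delay.

    # Assumed form of the metric: a weighted sum of performance factors.
    def pem(performance_factors, weight_factors):
        return sum(wf * pf for pf, wf in zip(performance_factors, weight_factors))

    # Made-up example with two factors: reliability (positive weight) and
    # delay in ms (negative weight), echoing the weights on the example slide.
    print(pem([0.95, 12.0], [1.0, -0.02]))   # 0.95 - 0.24 = 0.71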

  30. Building the Consistency Tree • Prepare the connection graph G(V,E) • Calculate the PEM for all nodes • Select the root of the tree • Run Dijkstra’s algorithm with some modifications • The predefined fan-out of the tree is maintained by the algorithm • The consistency tree is returned by the algorithm
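
  The slide does not spell out the exact modification to Dijkstra's algorithm, so the following Python sketch is only one plausible reading: start from the node with the best PEM as root, grow the tree along the cheapest paths, and refuse to attach a node to a parent that has already reached the fan-out limit.

    import heapq

    def build_tree(graph, pem_scores, fanout):
        # graph: {node: {neighbor: edge_cost}}; the root is the node with the best PEM.
        root = max(pem_scores, key=pem_scores.get)
        children = {node: [] for node in graph}
        dist = {root: 0.0}
        heap = [(0.0, root, None)]              # (cost from root, node, parent)
        while heap:
            cost, node, parent = heapq.heappop(heap)
            if parent is not None:
                if node in dist or len(children[parent]) >= fanout:
                    continue                    # already attached, or parent is full
                dist[node] = cost
                children[parent].append(node)
            for neighbor, edge_cost in graph[node].items():
                if neighbor not in dist:
                    heapq.heappush(heap, (cost + edge_cost, neighbor, node))
        return root, children

    graph = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 2, "D": 5},
             "C": {"A": 4, "B": 2, "D": 1}, "D": {"B": 5, "C": 1}}
    root, tree = build_tree(graph, {"A": 0.9, "B": 0.7, "C": 0.8, "D": 0.6}, fanout=2)
    print(root, tree)   # A {'A': ['B'], 'B': ['C'], 'C': ['D'], 'D': []}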

  31. Example Connection Path • [Figure: an example connection graph showing, for each server and path, the performance factors Server Reliability (pf1), Server Delay (pf2), Path Reliability (pf1) and Path Delay (pf2), with weights wf1 = 1 and wf2 = -0.02]

  32. Update Operation • An update operation is done in four steps • An update request is sent to all children of the root • The root continues to process the update request on its own replica • The root waits to receive confirmation of successful updates from all of its immediate children • A notification of a successful update is sent from the root to the client
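
  A compressed, synchronous Python sketch of the four steps above (the real system is message-based and asynchronous; the class and function names are invented for illustration):

    class ReplicaNode:
        def __init__(self, name):
            self.name, self.children, self.data = name, [], {}

        def apply_update(self, key, value):
            self.data[key] = value
            return True                 # confirmation of a successful local update

    def handle_update(root, key, value):
        pending = list(root.children)   # step 1: request goes to all of the root's children
        root.apply_update(key, value)   # step 2: the root updates its own replica
        confirmations = [c.apply_update(key, value) for c in pending]   # step 3: wait for children
        return all(confirmations)       # step 4: notify the client of success

    root = ReplicaNode("root")
    root.children = [ReplicaNode("c1"), ReplicaNode("c2")]
    print(handle_update(root, "x", 42))   # True: the client is notified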

  33. Consistency Flags • Two types of consistency flags are used: • the partially consistent flag and the fully consistent flag • Partially consistent flag • set by a top-down approach • the sequence number of the last applied update operation is stored as the flag • the node then informs its immediate children • Fully consistent flag • set by a bottom-up approach • a leaf (empty descendant list) sets its fully consistent flag to the operation sequence number • and informs its immediate ancestor • an ancestor sets its fully consistent flag after getting confirmation from all descendants
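
  A small Python sketch of the two flags (field names invented): the partially consistent flag is set on the way down the tree as the update's sequence number reaches each node, and the fully consistent flag is set on the way back up once every descendant has confirmed that number. A leaf, having no descendants, sets its fully consistent flag immediately, matching the slide.

    class Node:
        def __init__(self, children=()):
            self.children = list(children)
            self.partial_flag = 0    # last update seen by this node (top-down)
            self.full_flag = 0       # last update confirmed by the whole subtree (bottom-up)

    def propagate(node, seq):
        node.partial_flag = seq      # partially consistent flag: set on the way down
        for child in node.children:
            propagate(child, seq)
        if all(child.full_flag == seq for child in node.children):
            node.full_flag = seq     # fully consistent flag: set once all descendants confirm

    root = Node([Node([Node()]), Node()])
    propagate(root, seq=7)
    print(root.partial_flag, root.full_flag)   # 7 7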

  34. Consistency Assurance • All update requests from user are sent to root • Root waits for its immediate descendants during update requests • Read requests are handled by immediate descendants of root

  35. Maximum Number of Allowable Children • A larger number of children • higher interdependency • possible performance degradation • A smaller number of children • less reliability • higher chance of data loss • Three categories of trees in the experiment • sparse, medium and dense • Response-time model: t = op + wl • where t is the response time, op is the disk update time and wl is the OS/server workload delay
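
  A tiny numeric illustration of the response-time model t = op + wl (the numbers are made up):

    op = 8.0    # disk update time, in ms
    wl = 3.5    # delay caused by operating-system / server workload, in ms
    t = op + wl
    print(t)    # 11.5 ms expected response time for one replica update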

  36. Maximum Number of Allowable Children • The maximum number should be set by trading off reliability against performance

  37. Inconsistency Window • The amount of time a distributed system remains inconsistent • Reason behind it • time-consuming update operations • To accelerate update operations • the system starts processing the next operation in the queue • after getting confirmation from a certain number of nodes • not waiting for all of them to reply

  38. MTBC • Modified TBC • The root sends the update request to all replicas • The root waits only for its children to reply • Intermediate nodes make sure that their children are updated • MTBC • reduces the inconsistency window • increases complexity at the children’s end
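
  A sketch of the MTBC variant, contrasted with the TBC update sketch above: the root now broadcasts the update to every replica but still waits only for its immediate children, and intermediate nodes later verify that their own children applied it (all names invented; messaging shown synchronously for brevity).

    class Replica:
        def __init__(self, name, children=()):
            self.name, self.children, self.data = name, list(children), {}

    def mtbc_update(root, all_replicas, key, value):
        for replica in all_replicas:            # the root broadcasts to every replica...
            replica.data[key] = value
        acks = [key in child.data for child in root.children]
        return all(acks)                        # ...but waits only for its children's acks

    def verify_children(node, key):
        # Intermediate nodes check whether their children really applied the update.
        return all(key in c.data and verify_children(c, key) for c in node.children)

    leaf = Replica("leaf")
    child = Replica("child", [leaf])
    root = Replica("root", [child])
    print(mtbc_update(root, [root, child, leaf], "x", 1), verify_children(root, "x"))  # True True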

  39. Effect of Inconsistency Window • Inconsistency window has no effect on performance • Possible data loss only if root and its children all go down at the same time

  40. Failure Recovery • Primary server failure • the controller finds the most up-to-date servers with the help of the consistency flags • selects the most reliable server among them • rebuilds the consistency tree • initiates synchronization

  41. Failure Recovery • Primary server failure • the controller identifies the most reliable server among the most up-to-date ones • rebuilds the consistency tree • initiates synchronization • Other server or communication path down • check whether the server or the communication path is down • rebuild the tree without the down server • or find an alternate path • and reconfigure the tree
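
  The controller's choice of a new primary can be phrased as a short selection routine; the sketch below assumes each server reports its consistency flag and a reliability score (field names are invented, structure inferred from the bullets above).

    def choose_new_root(servers):
        alive = [s for s in servers if s["alive"]]
        newest = max(s["consistency_flag"] for s in alive)        # most up-to-date servers
        candidates = [s for s in alive if s["consistency_flag"] == newest]
        return max(candidates, key=lambda s: s["reliability"])    # most reliable among them

    servers = [
        {"name": "r1", "alive": False, "consistency_flag": 12, "reliability": 0.99},  # failed primary
        {"name": "r2", "alive": True,  "consistency_flag": 12, "reliability": 0.95},
        {"name": "r3", "alive": True,  "consistency_flag": 11, "reliability": 0.97},
    ]
    print(choose_new_root(servers)["name"])   # r2; the tree is then rebuilt around it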

  42. Apparent Consistency • All write requests are handled by the root • All read requests are handled by the root’s children • The root and its children are always consistent • Other nodes don’t interact with the user • The user finds the system consistent at any time • We call it “Apparent Consistency” – it ensures strong consistency from the user’s point of view

  43. Different Strategies • Three possible strategies for consistency • TBC • Classic • Quorum • Classic • each update requires the participation of all nodes • better for databases that are updated rarely • Quorum • based on quorum majority voting • better for databases that are updated frequently • TBC • the tree-based consistency approach described in the previous slides
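
  The quorum strategy on the slide is standard majority voting; the generic Python sketch below (not the experiment code) shows the usual rule that the read quorum R and write quorum W must satisfy R + W > N, so every read overlaps at least one replica holding the latest write.

    N = 5
    W = 3             # a write succeeds once a majority of replicas acknowledge it
    R = 3             # a read consults R replicas and keeps the newest version
    assert R + W > N  # the overlap guarantees the read sees a fully written copy

    def quorum_read(replicas, key, r=R):
        # replicas: list of {key: (version, value)} dicts; consult the first r of them
        answers = [rep[key] for rep in replicas[:r] if key in rep]
        return max(answers)[1] if answers else None   # newest version wins

    replicas = [{"x": (2, "new")}, {"x": (2, "new")}, {"x": (1, "old")},
                {"x": (1, "old")}, {"x": (1, "old")}]
    print(quorum_read(replicas, "x"))   # 'new'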

  44. Differences in read and write operations

  45. Classic Read and Write Technique

  46. Quorum Read and Write Technique

  47. TBC Read and Write Technique

  48. Experiments • Compare the 3 strategies

  49. Experiments Design • All requests pass through a gateway to the system • Classic, Quorum & TBC are implemented on each server • A stream of read & write requests is sent to the system • A transport layer is implemented in the system • Thousands of ping responses are observed to determine the transmission delay and packet loss pattern • Acknowledgement-based packet transmission

  50. Experimental Environment • Experiments are performed on a cluster called Sage • a green prototype cluster at UA • Intel D201GLY2 mainboard • 1.2 GHz Celeron CPU • 1 GB 533 MHz RAM • 80 GB SATA 3 hard drive • D201GLY2 built-in 10/100 Mbps LAN • Ubuntu Linux 11.0 server edition OS • Java 1.6 platform • MySQL Server (version 5.1.54)
