
Prof. Dr. Stefan Edlich NoSQL in der Cloud



Presentation Transcript


  1. Prof. Dr. Stefan Edlich NoSQL in der Cloud

  2. nosqlberlin.de nosqlfrankfurt.de nosql powerdays

  3. http://nosql-database.org

  4. NoSQL is specialization! • Big Data • Massive Write Performance • Fast KV Access • Write Availability • Flexible Schema (Migration) + Flexible Datatypes • Easier maintainability, administration and operations • No single point of failure • Programmer ease of use

  5. Theory?! • Map/Reduce -> Map/Reduce successors! • ACID / BASE & CAP -> P usually never occurs! • Consistent hashing -> the basis of scalable K/V stores • MVCC -> non-blocking advantages • Vector clocks -> [122:1] [147:2|122:1] [97:3|147:2|122:1]
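The consistent hashing point on this slide can be illustrated with a minimal sketch. It is not taken from the talk: the `HashRing` class, the node names, the MD5 hash and the number of virtual nodes are all assumptions chosen for illustration. It shows the property that makes the technique a basis for scalable K/V stores: when a node joins, only the keys that now belong to the new node move.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a ring position (MD5 is used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes (hypothetical helper class)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owner = {}      # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node is placed on the ring at several positions.
        for i in range(self.vnodes):
            pos = _hash(f"{node}#{i}")
            bisect.insort(self._positions, pos)
            self._owner[pos] = node

    def node_for(self, key: str) -> str:
        """Return the owner of the first ring position clockwise from the key."""
        idx = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


# Hypothetical cluster and keys.
ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

With a plain `hash(key) % node_count` scheme, adding a fourth node would typically relocate most keys; here every entry in `moved` ends up on `node-d` and the rest stay where they were.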

  6. Google Protocol Buffers =>

  7. Apache Avro! • JSON • Binary data transfer • automatic RPC generation • no code generation • Client + server exchange the schema when it changes • Definitely evaluate it!

  8. Data models

  9. Column Family • Document DBs • Key/Value DBs: Voldemort, Chordless, Scalaris, Dynamo / Dynomite • Graph DBs • others: db4o, Versant, Objectivity, Gemstone, Progress, Mark Logic, EMC Momentum, Tamino, GigaSpaces, Hazelcast, Terracotta, …

  10. Cassandra HBase SimpleDB

  11. + scaling = new node + replication + configuration (r, w) - documentation - queries | + stress-free SaaS solution + transparent scaling - UTF-8 string - data resides at Amazon +- no tuning / config | + scaling = new node + community + API - replication - setup, optimization, maintenance
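The "(r, w)" configuration on this slide presumably refers to read and write quorum sizes; that reading, and the replica count `n` used below, are assumptions. Under that reading, the usual rule is that every read overlaps the latest write whenever r + w > n. The sketch checks the rule against a brute-force enumeration of all possible replica sets.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Rule of thumb: read and write quorums always intersect if r + w > n."""
    return r + w > n


def overlap_by_enumeration(n: int, r: int, w: int) -> bool:
    """Check every possible read set against every possible write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# Hypothetical setting: 3 replicas, read from 2, write to 2.
strict = quorums_overlap(n=3, r=2, w=2)   # reads see the latest write
relaxed = quorums_overlap(n=3, r=1, w=1)  # faster, but reads may be stale
```

Choosing small `r` and `w` trades consistency for latency, which is why this setting is listed as a plus: the operator decides per workload.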

  12. Document Databases

  13. any JS client • no middleware! • DB + web server + evolving app

  14. 2nd round += $6.5 million

  15. not normalized (duplicates, delete orphans, ...) • (crash-prone for a configurable time window) (journaling) • Eventually consistent • real scaling only via sharding • - (not yet kill -9 proof)

  16. 67 GB Index Data  EC2 Node 66 GB EC2 Node 66 GB 11 hours + 1 day off

  17. + not normalized + schema agility + excellent documentation + speed (memory-mapped files) + installation + save = 28 sec! + arbitrary indexes + MapReduce + rich query language + GridFS (instead of HDFS) + simple replication (master-slave / replica sets)

  18. db.system.indexes.find(); db.friends.getIndexes(); db.friends.ensureIndex({friend: 1}); db.friends.ensureIndex({friend: 1, zip: 1}); // compound db.friends.find({friend: "Mario", zip: "13755"}).explain(); Queries: age: {$gt: 10} food: {$all: ["pizza", "noodles"]} $gt, $lt, $lte, $ne, $in, $nin, $mod, $all, $size, $exists, $type, $or, $elem, $elemMatch, regexp, ... NoSQL query lock-in?!
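To make the semantics of operators such as `$gt` and `$all` concrete, here is a small in-memory matcher. It is a simplified sketch, not MongoDB's implementation: `matches`, `find` and the sample documents are hypothetical, only a handful of operators are covered, and real MongoDB has extra rules (for example equality against array fields) that are ignored here.

```python
# Simplified operator semantics; value is the document's field, arg the query's.
OPERATORS = {
    "$gt": lambda value, arg: value is not None and value > arg,
    "$lt": lambda value, arg: value is not None and value < arg,
    "$ne": lambda value, arg: value != arg,
    "$in": lambda value, arg: value in arg,
    "$all": lambda value, arg: isinstance(value, list) and all(x in value for x in arg),
}


def matches(doc: dict, query: dict) -> bool:
    """Return True if doc satisfies every condition in query."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 10}}
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            # Plain equality, e.g. {"friend": "Mario"}
            return False
    return True


def find(collection: list, query: dict) -> list:
    return [doc for doc in collection if matches(doc, query)]


# Hypothetical sample data modelled on the slide's examples.
friends = [
    {"friend": "Mario", "zip": "13755", "age": 12, "food": ["pizza", "noodles", "salad"]},
    {"friend": "Luigi", "zip": "10115", "age": 9, "food": ["pizza"]},
]

by_name_and_zip = find(friends, {"friend": "Mario", "zip": "13755"})
older_than_ten = find(friends, {"age": {"$gt": 10}})
likes_both = find(friends, {"food": {"$all": ["pizza", "noodles"]}})
```

The sketch also hints at the lock-in question raised on the slide: the query is a store-specific document, so moving to another database means rewriting both the queries and their matching rules.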

  19. Evolving schema • Migration architecture patterns: A) Blacklist: try { ... } catch (FirstException | SecondException ex) { // newName = BlackList.checkName(OldName) } rename
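One possible reading of the blacklist pattern is sketched below: application code still asks for a retired field name, the lookup fails, and a blacklist translates the old name to the current one. The slide only gives the pseudocode fragment, so the `BLACKLIST` mapping, the `read_field` helper and the field names are hypothetical.

```python
# Hypothetical blacklist: retired field name -> current field name.
BLACKLIST = {"lastname": "surname"}


def read_field(doc: dict, name: str):
    """Read a field, falling back to its renamed successor if the name was retired."""
    try:
        return doc[name]
    except KeyError:
        # Corresponds to: newName = BlackList.checkName(OldName)
        new_name = BLACKLIST.get(name)
        if new_name is None or new_name not in doc:
            raise  # genuinely missing, not a known rename
        return doc[new_name]


old_style = {"lastname": "Edlich"}  # document written before the rename
new_style = {"surname": "Edlich"}   # document written after the rename

a = read_field(old_style, "lastname")
b = read_field(new_style, "lastname")
```

The appeal of this approach is that no stored document has to be rewritten; the cost is that every read path needs the fallback.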

  20. B) "Rails" migration: [diagram: records labelled "old name" are rewritten to "new name" one after another] (not if replicated too often)
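The "Rails" style migration can be sketched as a one-off pass that rewrites every stored document. This is an illustration under assumptions, not the speaker's code: `migrate_rename`, the in-memory list standing in for a collection, and the field names are hypothetical.

```python
def migrate_rename(docs: list, old_name: str, new_name: str) -> int:
    """Rename a field in place in every document; return how many were changed."""
    migrated = 0
    for doc in docs:
        if old_name in doc:
            doc[new_name] = doc.pop(old_name)
            migrated += 1
    return migrated


# Hypothetical collection with a mix of old and already migrated documents.
people = [
    {"lastname": "Edlich"},
    {"surname": "Merriman"},
    {"lastname": "Alonso"},
]

first_run = migrate_rename(people, "lastname", "surname")
second_run = migrate_rename(people, "lastname", "surname")  # idempotent: nothing left to do
```

Because every document is touched, each change also has to reach every replica, which is one way to read the slide's caveat about heavily replicated data.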

  21. Duplicates = space • Freshness of the data • "Pre-joined" data! "Pre-computed" • growing data • move it out or pre-space it

  22. Into the cloud…

  23. Clients • mongos (router) • Config servers • Shard A, Shard B, Shard C • RAM + disk + replica set • Possible arbiter • micro 64 bit • [extra | double | quadruple] large

  24. Lessons learned… • RAID configurations (00, 01, 10, 03, 05, …) • Journaling file systems (ext4, xfs, …) • (Security) ports, file descriptors, snapshots, … • www.mongodb.org/display/DOCS/Amazon+EC2

  25. K/V stores: mapping data structures -> + very fast, > 100,000 /sec + configurable disk sync + API for custom bindings + simple replication + hash, list, set, sorted set, messages + installation: UNIX 38 sec, Windows 18 sec - cloud cluster only from version 3.*

  26. Sorted Set
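As a rough illustration of what a sorted set offers, the sketch below keeps members ordered by a numeric score and supports rank and score-range lookups. It is a teaching sketch built on a sorted list, not how any particular store implements the structure; the `SortedSet` class, its method names and the leaderboard data are hypothetical.

```python
import bisect


class SortedSet:
    """Members are unique; each carries a score that defines the ordering."""

    def __init__(self):
        self._scores = {}  # member -> score
        self._items = []   # sorted list of (score, member)

    def add(self, member: str, score: float) -> None:
        """Insert a member or update its score."""
        if member in self._scores:
            old = (self._scores[member], member)
            self._items.pop(bisect.bisect_left(self._items, old))
        self._scores[member] = score
        bisect.insort(self._items, (score, member))

    def rank(self, member: str) -> int:
        """Zero-based position of the member in ascending score order."""
        return bisect.bisect_left(self._items, (self._scores[member], member))

    def range_by_score(self, low: float, high: float) -> list:
        """Members whose score lies in [low, high], ascending."""
        return [m for s, m in self._items if low <= s <= high]

    def members(self) -> list:
        return [m for _, m in self._items]


# Hypothetical leaderboard.
board = SortedSet()
board.add("anna", 120)
board.add("ben", 95)
board.add("cleo", 150)
board.add("ben", 200)  # score update moves ben to the top
```

A leaderboard is the typical use: updates and rank queries stay cheap because the ordering is maintained on every write instead of being computed on every read.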

  27. memcached API

  28. simple dynamic scaling (up & down) • scales linearly • bulletproof, as shown by Zynga.com • limited membase protocol • Membase Tap (protocol interception) • Code-Node:

  29. Membase in the cloud • Ready-made RightScale & AMI templates • Open various ports • DNS entry and no changing IPs • Specify the master node • it sets the quota for the nodes that inherit from it • Backups for EBS

  30. GraphDBs Property Graph

  31. player

  32. Graph DBs in the cloud • > N billion nodes? Sharding! • but usually no "predictable lookup" • possible only with domain-specific knowledge • balanced DBs without sweet spots are hardly possible • Access patterns + heuristics (insert sharding / runtime sharding) => partitioning algorithms • (HA) Neo4j cache sharding! • Multi-master cluster for consistent routing
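Why domain-specific knowledge matters for graph sharding can be shown by counting the edges that cross shard boundaries under two placement strategies. The graph, the city labels and both placement functions are invented for this sketch; the id-parity placement merely stands in for any placement that ignores the graph's structure.

```python
def cut_edges(edges, shard_of) -> int:
    """Count edges whose endpoints live on different shards."""
    return sum(1 for a, b in edges if shard_of(a) != shard_of(b))


# Hypothetical graph: two tightly connected communities joined by one edge.
city_of = {0: "berlin", 1: "berlin", 2: "berlin", 3: "berlin",
           4: "frankfurt", 5: "frankfurt", 6: "frankfurt", 7: "frankfurt"}
edges = [
    (0, 1), (1, 2), (2, 3), (3, 0),  # Berlin community
    (4, 5), (5, 6), (6, 7), (7, 4),  # Frankfurt community
    (3, 4),                          # single bridge between the two
]


def by_id_parity(node: int) -> int:
    """Structure-blind placement: alternate nodes between two shards."""
    return node % 2


def by_city(node: int) -> str:
    """Domain-aware placement: keep each community on one shard."""
    return city_of[node]


blind_cut = cut_edges(edges, by_id_parity)
domain_cut = cut_edges(edges, by_city)
```

Every cut edge turns a local pointer hop into a network round trip during traversal, so the structure-blind placement makes lookups both slower and harder to predict.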

  33. > 220 DBs • quite frustrating consulting…

  34. Data • Transactions • Performance • Queries • Architecture • other non-functional requirements

  35. Analyse your data: domain data, log data, event data, message data, critical data, business data, meta data, temp data, session data, geo data, etc. • Data / storage model: relational, column-oriented, document-like, graphs, objects, etc. • What types / type system? Data navigation, data amount, data complexity (deep XML?) • ACID vs. BASE vs. mixture? CAP decisions • Performance dimension analysis: latency, request behaviour, throughput, scale-up vs. scale-out • Query requirements: typical queries, tools, ad hoc queries, SQL / LINQ needed, Map/Reduce? … • Distribution architecture: local, parallel, distributed / grid, service, cloud, mobile, p2p, … • Data access patterns: read / write distribution, random / sequential, access design patterns • Non-functional requirements: replication, refactoring frequency, DB support, qualification / simplicity, company restrictions, DB diversity (allowed?), security, safety / backup & restore, crash resistance, licence…

  36. NoSQL conclusion

  37. Definitely assume RAM & SSD! • RethinkDB • Gustavo Alonso • Lots of > 1 PT RAM DBs in California! • SAP strategy? Service, RAM, cloud, mobile

  38. The DaaS era • For MongoDB alone, well over 100 "Database-as-a-Service" providers! • Amazon: SimpleDB, Hadoop, etc.

  39. Many clever hybrid solutions! CouchBase, Hadoop + MySQL

  40. Availability • Ad hoc query • OLAP • Database-aaS => best mix!

  41. Non-critical data (view, domain, master, meta, log, …) by Couch, MongoDB, Redis, Membase, … • Critical data (management, payment data, personal data, …) by classic RDBMS, Vertica, VoltDB, Database.com, GenieDB, … • Hadoop* • BI • OLAP • BI analytics • Dwight Merriman (10gen)

  42. Links • nosql-database.org • nosqltapes.com • mynosql.com

  43. Thanks for listening! http://edlich.de Discussion!

  44. functional (graph) decomposition? Or… • Defensive patent -> Group By • Use case: aggregate • pi -> 10^15 -> 1000 cluster
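Reading "Group By" and "Use case: aggregate" as a description of what Map/Reduce does (an assumption, since the slide does not say so explicitly), a group-by aggregation can be sketched in three small phases. The function names and the log records are hypothetical, and everything runs in a single process rather than on a cluster.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record; it yields (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Collapse each key's list of values into one result."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical log records: bytes written per database.
log = [
    {"db": "mongodb", "bytes": 300},
    {"db": "redis", "bytes": 50},
    {"db": "mongodb", "bytes": 200},
    {"db": "redis", "bytes": 25},
]

pairs = map_phase(log, lambda rec: [(rec["db"], rec["bytes"])])
totals = reduce_phase(shuffle(pairs), lambda key, values: sum(values))
```

This is the equivalent of `SELECT db, SUM(bytes) ... GROUP BY db`; on a cluster, the map and reduce calls for different keys can run on different machines because they share no state.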

  45. Programming is great! Programming is annoying! • wonderfully parallelizable • only `large data indexing` • "A giant step back! Incompatible, missing features, not new, …" (Stonebraker) • Strong competition: Stratosphere (TUB), ePic, SwissBox, etc.

  46. Parallelization contracts: Cross • Map • Match • CoGroup • Reduce • Graph ops and much more… => compile, analyze, optimize on a breathing cloud!

  47. Eventually consistent • ACID • WATER • BASE • Amazon Dynamo • MySQL replication
