1 / 24

Database Essentials

Database Essentials. Foundation about Databases. Agenda. Setting up Oracle environment Different categories of Databases RDBMS Concepts Creating Tables Loading data into tables Query data from tables Challenges with RDBMS Hadoop and Spark NoSQL Databases – MongoDB.

cletusf
Télécharger la présentation

Database Essentials

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Database Essentials Foundation about Databases

  2. Agenda • Setting up Oracle environment • Different categories of Databases • RDBMS Concepts • Creating Tables • Loading data into tables • Query data from tables • Challenges with RDBMS • Hadoop and Spark • NoSQL Databases – MongoDB

  3. Application Architecture – n-tier

  4. Current Scenario – Architecture(Typical Enterprise Application) Client (Browser) App Server Client (Browser) Database App Server Client (Browser)

  5. Setting up Oracle environment • Go to https://apex.oracle.com and sign up for an account • Load data by running script from my github account - https://github.com/dgadiraju/code/tree/master/rdbms/oracle

  6. Why Oracle?

  7. What other databases you know?

  8. What is RDBMS?

  9. What are other categories of Databases?

  10. Categories of Databases • Relational Databases (Oracle, Informix, Sybase, MySQL etc) • OLAP/Data Warehouse appliances/MPP (Teradata, Greenplum, Vertica) • NoSQL Databases (Cassandra, HBase, MongoDB etc) • In memory Databases (Gemfire, Coherence etc) • Search based Databases (Elastic Search, Solretc) • Batch processing frameworks (Map Reduce, Spark etc) * Modern applications need to be polyglot (different modules need different category of databases)

  11. Categories of Databases • Relational Databases – Mission Critical transactional based applications • OLAP/Data Warehouse appliances/MPP (Enterprise level reports) • NoSQL Databases – Non transactional based applications which require scalability • In memory Databases – In memory caches for performance reasons • Search based Databases – free flowing search based applications • Batch processing frameworks – Process huge volumes of data * Modern applications need to be polyglot (different modules need different category of databases)

  12. Categories of Databases – LinkedIn • Relational Databases – Payment gateways • OLAP/Data Warehouse appliances/MPP – Reports for executives • NoSQL Databases – Endorsement engine • In memory Databases – Complements almost every thing • Search based Databases – Search with in LinkedIn • Batch processing frameworks – Offline processing for job recommendations, friend recommendations, mass campaigns based on certain criteria and more * Modern applications need to be polyglot (different modules need different category of databases)

  13. RDBMS Concepts • Codd’s rules • Normalization • 1st normal form • 2nd normal form • 3rd normal form • BCNF • Database Objects • Tables • Indexes • And more

  14. Codd’s rules • Here is the Wiki page - https://en.wikipedia.org/wiki/Codd%27s_12_rules • Rule 0: The foundation rule: relational capabilities • Rule 1: The information rule: information -> values in tables • Rule 2: The guaranteed access rule: Each and every row should be accessible by using names – such as tables, columns etc • Rule 3: Systematic treatment of null values: • Rule 4: Dynamic onlinecatalog based on the relational model:

  15. Codd’s rules • Here is the Wiki page - https://en.wikipedia.org/wiki/Codd%27s_12_rules • Rule 5: The comprehensive data sublanguage rule: • A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings and that is comprehensive in supporting all of the following items:Data definition. • View definition. • Data manipulation (interactive and by program). • Integrity constraints. • Authorization. • Transaction boundaries (begin, commit and rollback).

  16. Codd’s rules • Here is the Wiki page - https://en.wikipedia.org/wiki/Codd%27s_12_rules • Rule 6: The view updating rule: • Rule 7: High-level insert, update, and delete: • Rule 8:Physical data independence: No impact on changing physical layer of data • Rule 9:Logical data independence: No impact on changing table structures • Rule 10:Integrity independence: Constraints • Rule 11:Distribution independence: • Rule 12: The nonsubversion rule:

  17. Normalization • 1st normal form • 2nd normal form • 3rd normal form • BCNF

  18. Creating Tables • It is called as Data Definition Language • Typical table should have primary key constraint (unique and not null) • Primary keys are supported by index • Relations are defined using foreign key constraint • Please review the ”CREATE TABLE” commands from load_schema.sql

  19. Loading data into tables • It is called as DML • Typically followed by commit or rollback • Commit will preserve the changes to data permanently • CRUD operations • Insert • Update • Delete

  20. Querying Data • Simple SQL – SELECT and FROM clause • Filtering data • Projection of data (selecting desired columns) • Aggregating data • Sorting data • Joining data from multiple tables

  21. Functions • Pre-defined • String manipulation • Date manipulation • Aggregate functions • Analytical or windowing functions • User-defined (out of scope)

  22. Challenges with RDBMS • While RDBMS is good for mission critical transaction based systems, it performs poorly in batch processing of data. • When ever CRUD operations happens • Performs constraint checks • Performs data type checks • Performs length and precision checks • Overhead for maintaining transactions (extensive logging – redo logs and undo in case of Oracle) • Overhead for maintaining consistency view of data between transactions • Expensive join operations (full table joins) • And more

  23. Solutions for RDBMS challenges • Datawarehousing approach can solve the challenges for RDBMS for heavy weight data processing • Facts • Dimensions • Measures • Data Mart • Denormalized data model (fewer tables for join) • Star Schema • Snow Flake Schema • Methodologies • Kimball Methodology • Inmon Methodology • Aggregated Tables – Facts at different granularity • In many cases constraints are informational, they are not strictly enforced

  24. NoSQL Databases • NoSQL – Not only SQL • Hbase, Cassandra, MongoDB • Tables are typically indexed and partitioned • Works very well for scalability of simple tables • Need not suit very well for transactional based systems • Example applications • Endorsement engine • Recommendation engine

More Related