English Info BSc: State Of The Art Database Systems

English Info BSc: State Of The Art Database Systems Presentation 2 Dr. Gábor Pauler, Associate Professor, PTE-TTK, Room F104, 6th Ifjusag str. Pecs, Hungary Mobile: 30/9015-488, Skype: gjpauler E-mail:pauler@t-online.hu Facebook and Open FTP sites of the course: http://www.facebook.com/groups/278606362188127/ ftp://gamma.ttk.pte.hu/pub/pauler/StateOfTheArtDataBase/

Content of the Presentation • Corporate Data Processing Systems • Why we need data model? • Object-hierarchy based data model • Relational data model • NoSQL data model • Graph data model • SAP HANA data model • Comparison of Spreadsheets, OOP, Relational, NoSQL, Graph, SAP HANA data models • Theory of Relational Database Processing • Normalization process • Normal format 1: Decomposition • Basic terms of Entities • Choosing identifier names • Relations analysis • Cardinality • Dependency • Entity-Property Rule • Entity-Decomposition Rule • Typical beginners errors • Literature

Corporate Data Processing Systems: Why we need data model? 1 Struct tInvoice InvoiceNum As String SellerNameAs String SellerAddressAs String SellerTaxRegAs String BuyerNameAs String BuyerAddressAs String InvoiceDate As Date Item1NameAs String Item1MeasUnitAs String Item1Quantity As Double Item1UnitPrice As Single Item1TaxPerc As Single Item1GrossVal As Double Item2NameAs String Item2MeasUnitAs String Item2Quantity As Double Item2UnitPrice As Single Item2TaxPerc As Single Item2GrossVal As Double … TotalVal As Double End Struct • (Database/Internal Table): can store (Records) fith fixed data structure numbered sequentially in hard drive/operative memory: • Records describe (Properties) of (Instances) of an (Enity) with the following data: • (Field name) • (Field Type): String, Date, Integer, etc. With a predetermined space consumption in bytes • (Min/Max/Default value) • (Required/Optional) filling •  Database table enables quick access of large amount of records, as starting position of a cecord can be computed in advance as RecordIndex×RecordLenght •  But one database table alone is a pretty inflexible storage tool: because of fixed data structure of records, just 1 table alone cannot store non-fixed lenght empirical data structures correcly • Example 2-1: Steve Droid was a trendy coder, who spent relational databases lessons in a pub nearby, saying „Yezzz, relational database tables are so inflexible, I would rather use NoSql, thats so trendy!” When he got his first job to create a small carpet store invoicing app, he put paper-based invoice fields in 1 table as it is (header+12 items), without any redesign. As most invoices had only 1 item, 11 were wasted. When there was 13th item, it was lost, or it was put in new invoice, wasting data recorder’s work

Corporate Data Processing Systems: Why we need data model? 2 Invoice SellerName R SellerAddr R SellerTaxID R BuyerName CRU BuyerAddr CRU InvoiceNum R SalesPerspn CRU GrossTotal= Sum(Item.GrossValue) Paid CRUD InvoiceDate InvoiceTime CusttName DateOfBirtth Child1Name Child1DateOfBrt Child8Name Child8DateOfBrt Kovács Jánosné 1958.10.23 Kovács István 1987.01.04 Nagy Józsefné 1968.03.12 O. Dzsenifer 1987.11.04 O. Brian 2000.09.11 O. Leona 2007.01.16 O. Ámor 2008.02.01 Kiss Károlyné 1972.11.02 Kiss Ili 2002.10.20 Kiss Vili 2004.07.08 Item Descr R MeasUnit R BarCode CR TaxCode R Quantity CRUD UnitPrice R VAT% R GrossValue= (UnitPrice* Quantity*(1+ VAT%/100)) Nagy Emília 1976.10.05 • 1-table systems without correct database design recreate errors of paper-based systems: space/ data recording wasting and data loss • Example 2-2: I want to store data of mothers with data of their children in one table. But different mothers can have very different number of children, therefore let’s say, I leave space for 8 children: • An average mother in first world has 0.4 kids „becuse of uncertainties of the market in these days” 7.6 space wasted: █ • An average mother in third world has 8-12 kids, because there is no money for contraceptives  4 of them will die from starving, because our food aid system cannot store their data… • Moreover hard drive and operative memory are not made of rubber! This can be really disturbing, if sequence of records has a a special meaning and we would like to keep them ordered: •  Physically you cannot insert a record in the middle of the table, you can add it only to the end. Inserting will require some tricks. •  If you delete a record form the table, its space consumption will stay there as useless, inactive area. Reusing it require tricks again. • Empirical data structures are full with 1:many ( ) or many: many( )(Nested relationships): • 1 buyer can have more addresses (in time) and more buyers can reside in 1 address • Emprical data structures can contain each other nested: 1 invoice can have more items, and items have many properties themselves • Therefore they cannot be stored in single table’s fixed record structure effectively!

Corporate Data Processing Systems: Databases, Object-Hierarchy Data Model 1 • (Database, DB): integrateddatastructure, whichbaseson a (Data Model), and canstorethefollowing: • (Properties) (Physically: Data fields) of numerous (Instances) (Physically: Records) of different (Objects) (Physically: Tables) • (Meta-Data) describingstructure of objects • (Methods) of processingdata, whicharecapableof: • Handling(Non-fixed LenghtEmpirical Data Structures, EDS) (e.g. 1mothercanhavemorechildren) without(Data Loss) and without(Redundancy) instoragespaceconsumptionanddatarecording • Handling(DisambigousReferences) (e.g. father and sonwithidenticalnameresidingonthesameaddressshouldnot be mixed) • Databasesareprocessedby(Database Management System, DBMS): Create, Read, Update, Delete, Archive, Nullify • Thereareseveraldatamodels (OOP, Relational, Lean), let’s seethem: • (Object/Instance/Property/Value-HierachyBasedDatabase Management, OBDM): • Inthe(4th GenerationLanguages, 4GL) (e.g. C#, Java, Visual Basic, Delphi) (Object Oriented Programming, OOP) is invented, where an (Object/Class) canhaveseveral (Instances) describedwiththesame (Properties) whichcantakeseveral (Values) form a predefined (Domain), whichareprocessedby (Methods). Many (Descendant) objectscan (Inherit) properties and methodsfrom1 (Ancestor) object, containing (Collection) of instancesofancestor (Nested): thiscanhandle1:many /nestedrelationships of EDS

Corporate Data Processing Systems: Object-Hierarchy Data Model 2 Class Buyer Inherits Address ‘Data fields FirstName As String Lastname As String CellPhone As String E-mail As String ‘Embedded classes fAddress As Address End Class ‘Global list Dim Buyers As New _ List Of (Buyer)() Class Country ‘Data fields CntName As String End Class ‘Global list Dim Countries As New _ List Of (County)() • We show a part of our car-pet store invoicing system in Visual Basic OOP: • EDS is decomposed into classes which can be des-cribed with fixed number of properties (e.g. Buyer, Seller, SalesPers) • If 1 instance of class A can have more instances from class B, B class InheritsA class (denoted by BA) and B contains properties of A nested • Instancess for each classes are generated by collection List: it enables creating new instances (.Add with method),deleting (.Remove method), retrieving data by (.Name property) or index number of instance • OHBD data model is mainly used in (First Person Sho-oter, FPS) games and engi-neering process control ap-plications, as  it can query more fast, but  less flex-ibly as relational model, because it is less effective modeling numerous m:m connections among objects Class StreetType ‘Data fields TypeName As String End Class ‘Global list Dim StreetTypes As New _ List Of (StreetType)() Class Zip Inherits Country ‘Data fields City As String ‘Embedded classes fCountry As Country End Class ‘Global list Dim Zips As New _ List Of (Zip)() Class SalesPers Inherits Address ‘Data fields FirstName As String Lastname As String CellPhone As String E-mail As String ‘Embedded classes fAddress As Address End Class ‘Global list Dim SalesPersons As New _ List Of (SalesPers)() Class Address Inherits StreetType,Zip ‘Data fields Door As String Door As String Floor As String HouseNum As String LinePhone As String Fax As String ‘Embedded classes fStreetType As StreetType fZip As Zip End Class ‘Global list Dim Addresses As New _ List Of (Address)() Class Seller Inherits Address,LegalFormat ‘Data fields FirstName As String Lastname As String CellPhone As String E-mail As String ‘Embedded classes fAddress As Address fLegalFormat As LegalFormat End Class ‘Global list Dim Sellers As New _ List Of (Seller)() Class LegalFormat ‘Data fields FormatName As String End Class ‘Global list Dim LegalFormats As New _ List Of (LegalFormat)()

Corporate Data Processing Systems: Relational Data Model 1 • In corporate environment, OOP data model was never really successful, because it is fast only at 1 pre-designed query path (e.g. World Of Tanks: How many tanks I have on the field?). On ad-hoc query paths, not covered by „rules of the game” (e.g. What is the average fuel stock in my tanks?), performance would drop dramatically. In business, most of the time there are no „rules of the game”, ad-hocqueriesaredominatingthework. • (Relational Database Management, RDM) invented by E.F. Codd, founder of Oracle Inc. In 1971, is somewhat slower than OPP, but gives balanced performance in any ad-hoc query: • It (Decomposes) EDSintoseparate (Entities) untiltheycan be describedwith fixed datastructures, sotheyarestoredphysicallyin (DatabaseTables) withhighretrivespeed. • E.g. Inouronetablecarpetstoreinvoicingsystem, itwas a problemthatitcouldstoremax. 12 itemsfor an invoice, whileintherealitytherecan be more • But, let’s decomposeItem-relateddatain a separatetable: • To preservedata of theoriginal non-fixed lengthEmpirical Data Structures, wecreate(Relations): it is a referentialconnection, where(Primary Key, PK)field, identifyingrecordsuniquelyin1-side table (e.g. Invoices),is referenced (see: ) by(Foreign Key, FK) field of themany-sidetable (e.g. Items): • E.g. AtallItems, it is compulsory to give, towhichInvoicetheItembelongs • Thiscan be storedin fix amount of space, as 1 Itemcanbelongonly1Invoice • Inthemeantime, 1 Invoicecan be referencedbyunlimitednumber of Items, whiledata of invoice has to be recordedonlyonce! • Withthistrick, RDM largelyresolvesthe old dilemma betweenspeed and flexi-bility of dataprocessing, and theguybuilt a world empire fromthisinnovation!

Corporate Data Processing Systems: Relational Data Model 2 • Relational Data Model is built from Empirical Data Structures collected by developers earlier in (Normalization) process, whose 5-6 steps are called (Normal Formats, NF). • Evaluation of RDM: •  We can use high storage capacity, fast retrieving memory/database tables (since the advent of SSD hard drives, time loss of physical hard disc/ head movement is eliminated) •  Data loss is eliminated: it can store data with highly variable lenght •  Redundancy is eliminated: all data is stored only once, and has to be recorded only once •  Ambigous references are elminated because of using relation based on key fields •  Normalization itself does not fully maximize retrieve speed of database, as its main goal is minimal specific space/work consumption •  But, retrieve speed of database can be maximized by further (Denormalization) techniques, when we intentionally hurt normalization rules, to get high gain in retrive speed at the price of modest increase in space consumption • (Relational Database System, RDS) is handled by (Relational Database Management System, RDMS) maintaining: • (Database Tables):stored in Tablespace files, • (Relations):between them, • (SQL Queries): program code which defines the requested result on RDS structure we want to get, and the algorithm is automatically determined • (Forms): GUI of RDS, which reaches database tables through queries

 In normalization, original EDS is decomposed into large number of database tables: it will completely different from what you saw in paper, and hard to overview by untrained humans, but results in optimal space/work consumption!  Even developers need a diagram called (Entity Relation-ship Diagram, ERD) to effectively overview normalized struc-ture. Simple untrained guys have no chance to understand it.  While, on GUI Forms, simple users still would like to see nearly the same as on paper, just working with much higher speed and capacity. (View Table Quieries) will bridge the structural difference between normalized storage and GUI structure, transmitting data forth and back. So, GUI can have similar look than original EDS on paper, but normalized structure behind that is completely different! Corporate Data Processing Systems: Relational Data Model 3 CompTypes CompType Name • Buyer • BuyerID • LastName • FirstName • Door • Floor • Building • HouseNum • StreetName • StreetType • ZIP • Mobile • E-mail • Sellers • SellerID • CompName • CompType • SellerTaxReg • Door • Floor • Building • HouseNum • StreetName • StreetType • ZIP • Tel • Mobile • E-mail • URL StreetTypes StreetTyp Name ZIPs ZIP City Country Invoices InvoiceID InvoiceNum ItemCount NetTotal GrossTotal VATTotal Paid InvoiceDate InvoiceTime SellerID BuyerID SalesPersID Modifier Modified Status SELECT Invoices.InvoiceNum, Invoices.InvoiceDate, Invoices.GrossTotal, Invoices.Paid, Invoices.SellerID, Invoices.BuyerID, Sellers.Name, Sellers.TaxRegNum, SalesPersons.Name, Adresses.HouseNum, Adresses.Floor, Adresses.Flat, Adresses.StreetName, Adresses.ZIP, ZIPs.City, Buyers.Name, Buyers.HouseNum, Buyers.Floor, Buyers.Flat, Buyers.StreetName, Buyers.ZIP, ZIPs_1.City FROM ((ZIPs AS ZIPs_1 INNER JOINBuyers ONZIPs_1.ZIP = Buyers.ZIP) INNER JOIN(((ZIPsINNER JOINCompanies ONZIPs.ZIP = Companies.ZIP) INNER JOINSellersONCompanies.TaxRegNum = Sellers.TaxRegNum) INNER JOINInvoices ONSellers.SellerID = Invoices.SellerID) ONBuyers.BuyerID = Invoices.BuyerID); • SalesPersons • SalesPersID • LastName • FirstName • Door • Floor • Building • HouseNum • StreetName • StreetType • ZIP • Tel • Mobile • E-mail Countries Country Name VATCats VATCat VAT% Item ItemID Quantity NetValue GrValue InvoiceID BarCode Modifier Modified Status TaxCodes TaxCode Descr VATCat Products BarCode Descr UnitPrice TaxCode MeasUnit MeasUnits MeasUnit Name

Corporate Data Processing Systems: NoSQL Data Model 1 •  Relational data model has high performance, but it is hard to learn: design requires professionalism, abstract thinking, intuition and practice, and still cannot be automated  if you make bad ERD, the whole database is bad! •  Relational databases require expensive high capacity servers, SQL processing is very hard to parellelize to run on many smaller cloud machines •  Not all applications require total compliance to ACID criteria while the most of resource consumption of relational model goes to comply them: • Atomicity:It can(Undo) all data manipulations based on (Transaction diary) • Consistency:It checks that the users cannot enter inconsistent/ corrupted data (e.g. matching data types, primary- and foreign keys) • Isolation:If 2 users are working on the same table, same time, they can see each others data only if that are (Committed): recorded and checked • Durability: Commited data can be stored only permanently on Solid State Drive (SSD), to prevent its loss in power failure/OS freezing • (NoSQL data models) intensionally give up some ACID requirement, which are not important in given application (e.g. In air traffic control, database should show ALL aircrafts’ real position, otherwise kidney of a 85-yer old Pakistani grandma will hit your kidney at 1000 knots relative speed, but using PornHub, you will survive, if you miss some pictures)

Corporate Data Processing Systems: NoSQL Data Model 2 •  They are not storing data in database tables, therefore broblems of designing entities are avoided. They can store non-fixed structural lenght data directly in the following tools: • (Key:Value) pairs stored in object collection in RAM/SSD (e.g. in web programming the Session object storing client-server communication), the key is not required to be unique, but should not contain too many repetitions. In more advanced version, collection is ordered by key values • (Document): the same as previous, but key here is always unique ID, and the values are not only single data fields, but can be mini-hierarchies with changing structure object from object: • Can be hierarchically built script language: XML, JSON, YAML • Or it can be the script’s binary space consumption in memory: BSON • Text/Image/Movie/Sound files in hierarchic directory structure stored on SSD • Tagged text/image/movie files (eg. Instagram post, Flickr image) •  At creating/modifying/deleting elements, it checks only (Eventual consistency) with the directly linked elements, but not with the whole database, therefore it can handle big data (eg. Google search) easier in distributed cloud systems, using cheaper hardware •  But, because of limited consistency it can rarely loose data entered, or it sometimes does not show the latest version of data •  There is no perfect Undo transaction processing for most of the data manipulations •  There is no standardized query language (like SQL at relational databases), it uses different Java libraries at different database software. Source: https://www.w3schools.com/xml/simple.xml

Corporate Data Processing Systems: Graph-based Data Model • If there is a big uncertainty in the developement of a new applications, event the client and its experts may not be able to clarify cardinality (1:1, 1:more, more:more) of participating objects, making hard to create relational data model. • (Graph database) can create/read/modify/delete large number (m) of (Nodes) with their maximum number (m-1)×m of direct connections called (Edges), and various number of (Properties) attached to nodes and edges: • Graph can be based on relational database • Or on NoSQL Key:Value database •  You do not have to know how the graph can handle m:m connections, you have to decide about only what properties are attached to different nodes and edges, and you can query them without making normalization! •  Graph databases have an advantages over SQL when query should connect multiple levels of objects based on matching multiple properties in the same time. (eg. Getting free beer: Who are my friends and friends of theirs, who reside close to my position in a given time AND prefer hand-made beer AND have higher income than mine?) •  Importing/modifying/deleting large number of data automatically, it is slower than SQL, as for m data, it has to process huge number (m-1)×m of edges •  If Graph is NoSQL-based, it does not comply with ACID criteria, if it is relational-based, then it complies. But the latter case requires exactly as expensive hardware as the relational data model •  It has no standard query language yet, most of Graph databases use some Java libraries for data handling Source: https://www.imagenesmy.com/imagenes/email-data-model-97.html

Corporate Data Processing Systems: SAP HANA Data model • HANA data model issued with newest S/4 version of SAP ERP system developes the relational model further, based on latest advancement in hardware that I/O speed of SSD drives are much higher than disk-based HDDs, being closer to I/O speeds of RAM. •  It preserves ACID-compliance and fast insert/modify/ delete advantages of SQL as data in one of their mirrored storages are stored in database tables (tables offixed structure records stored sequentially in RAM/SSD, connected by relations between primary- and foreign keys) •  At multi-level queries matching multiple properties (eg. On-Line Analytical Processing, OLAP: used in top managerial reporting with a GUI, where structure /aggregation/data of a report can be manipulated graphically in max 5 secods refresh time) it can reach the speed of Graph databases, as there is a second mirror of all data stored by fields/colums sequentially (eg. If Turnover field should be aggregated, it does not reads 100M records to pick Turnover from each, but directly sums the Turnover column stored in the second mirror) It can reach 10×..100× speed increase compared to relational model-based OLAP. •  Query language is still the standard SQL and OLAP, with some extensions •  Synchronization of the 2 mirrors (Row store, Column store) is solved fully automatically and very fast with new type of RAM hardware. •  Therefore, hardware cost is brutal, and it can be purchased only from SAP

Comparison of Spreadsheets, OPP, Relational, NoSQL, Graph, HANA Data Models 1 • Advantages are denoted with green, disadvantages with red,medium perform.orange

There are passageways across different data models to mitigate their comparative disadvantages with combined use in one application: Connections among Spreadsheets, Relational/HANA, OOP/NoSQL/Graph Data Models ODBC Server Data linking high Flexi- bility OOP, Hierarchic database processing OOP/NoSQL /Graph Relational /HANA Macros Flexi- bility Specific performance SQL commands in cell function Pivot table reports Macros Worksheet object classes Flexi- bility Spreadsheets low Complexity of programming high low

Content of the Presentation • Corporate Data Processing Systems • Why we need data model? • Object-hierarchy based data model • Relational data model • NoSQL data model • Graph data model • SAP HANA data model • Comparison of Spreadsheets, OOP, Relational, NoSQL, Graph, SAP HANA data models • Theory of Relational Database Processing • Normalization process • Normal format 1: Decomposition • Basic terms of Entities • Choosing identifier names • Relations analysis • Cardinality • Dependency • Entity-Property Rule • Entity-Decomposition Rule • Typical beginners errors • Literature

Relational database: Normalization: Normal Format 1: Entity • Normal Format 1: Empirical Data Structures (EDS) shold be (Decomposed) into several (Entities): it can be any object, which: • Has large number of (Occourences) (e.g. Invoice, Product), • Occourences can be described with the same (Properties) (eg. Invoice: InvoiceNum, InvoiceDate, GrossTotal, Seller, Product: BarCode, Name etc.), therefore entity can be stored in fixed length data structure • Rules of creating properties: (pl. Végösszeg) • They have (Logical Name):it is unique inside entity:(eg.GrossTotal) • They have (Logical data type): entities are system-independent plans, so we choose here: integer/real/text/date/image/movie, etc. But we do not know the exact physical space consumption yet.(eg.real num, 2 decimals) • TYPICAL BEGINNERS ERROR! Never store number or date as text, or date as integer number! Usage of incorrect type will waste resources and prevents using date-computing functions of SQL language! • They have (Default value): this will be the value of the property before data entry (eg. 0.00). If this is not set, the default value of a property will be missing value called NULL: don’t confuse it with(0), which is a number! • TYPICAL BEGINNERS ERROR! Don’t confuse NULL with zero (0). While Excel auto converts empty cell to zero, in relational databases, missing value NULL is strictly separated: • NULL is never converted to zero! Any type has missing value NULL! • You cannot make any computation with NULL! Results NULL! • NULL can’t be compared to anything: NULL=NULL condition is false! • They have min..max (Range) (eg. -999,999,999.99..999,999,999.99) • They can be (Reqired) to fill: cannot contain NULL (but can be zero!) • They can be (Auto-filled) with formula: (eg. =SUM(Item.GrossValue))

Relational database: Normalization: Normal Format 1: Identifier names • Entities are sytem-independent plans for database tables, so they have (Logical Names): • (Uniquely named) across the whole database plan • Entity names are worded as non-plural in any language (eg. Invoice entity), to differentiate the from completed: • (Physical database table): • It is working in a given database in given operation system- and hardware environment • They have (Physical Name) always worded as plural (eg. Invoices table) • A fizikai tábla-mező nevek viszont megegyeznek a tulajdonságok logikai neveivel • WHY THIS RULE IS NECCESSARY!? In a big database plan (eg. Data warehouse or SAP ERP system) there can be several 1000 tables, developed by 10-50 developers parallely. It is important to clarify, which tables are already completed, and which entities are under design process! • Name of Entites/Tables are separated from Properties/Fields by character „.” in most programming languages (eg. Invoice.InvoiceDate), except SAP-ABAP, where „~” • In an Entity/Table, Properties/Fields should be uniquely named (eg. In Entity Invoice we cannot call issue- and payment dates the same way, so: .InvoiceDate, .PaymentDate) • Across more Entities/Tables, Property/Field names can be repeated. It is highly recommended to name similar fields consistently (eg. If there are deadline of payment fields in more tables, they all should be calledd .PaymentDeadline instead of several named versions) • Rules of selecting Entity/Table and Property/Field names: • They should have 8±3 characters, do not start them with numbers, • They should not contain special- and accented characters (eg. MS SQL, Access can handle that, but importing into Oracle MySQL, DBase they can’t remain!), • Choose informative English language abbreviations (eg. P:it is too short, you will forget meaning, PartialInstallmnt:too long, you will mistype it, PartInst is optimal) • For compund names, camelcase writing is recommended for better readability (eg. PartInst) otherwise in SQL Upper- and lower case has the same meaning.

Relational database: Normalization: Normal Format 1: Relations Analysis 1 • The toolfordecompositionintoEntities is (RelationalAnalysis): thisconsist of textualstatements, whosepartsaredenotedindifferentcolorsfordidacticreasons (seebelow). Onestatementalwaysexamines: • Two(Entities’)or(Properties’) (letsdenotethem here withAand B) (Relation) , • FromTwosides (Arelated to B +Brelated to A), • ByTwofeatures, each of themcanhavethreepossibleversions: • (Cardinality):Forhowmanyoccourences of Abelonghowmanyoccourences of B? • A:B=1:1: 1occourence of Acanbelong to 1occourence of B, AND 1occourence of Bcanbelong to 1occourence of A: mutuallyexclusiverelation • eg. 1BankAccountdoesbelongto 1IBANCode, AND 1IBANCodedoesbelongto 1BankAccount • A:B=1:many: 1occourenceof Acanbelong to manyoccourencesof B, BUT 1occourence of Bcanbelong to 1occourence of A: • eg. 1Ownercanbelongto manyBankAccount, BUT 1BankAccountdoesbelongto1Owner • A:B=many:many: 1occourence of Acanbelong to manyoccourences of B, AND 1occourence of Bcanbelong to manyoccourences of A • eg. 1Bank canbelongto manyCustomer, AND 1Customercanbelongto manyBank • (Dependency): Is A(Requiredprecondition) of B, orB is connected(Independently)? • A:B=Dependent:Dependent:A doesbelongto B, ANDB doesbelongto A: mutualdependency (eg. There is no BankAccountwithoutIBANCode and there is no IBANCodewithoutBankAccount) • A:B=Independent:Dependent: A canbelongto B, BUT B doesbelongto A: one-sideddependency: (eg. There is no BankAccountwithoutOwner, butthere is OwnerwithoutBankAccount) • A:B=Independent:Independent: A canbelongto B, ANDB canbelongto A: mutualindependence: (eg. Both Bank and Customercanexistwithouteachother)

Relational database: Normalization: Normal Format 1: Relational Analysis 2 • Theoretically, 3×3=9 combinations of cardinality and dependency can exist, but some of them are more frequent in the practice than others: • 90% of relations are (1:many,independent:dependent): eg. 1Invoice can belong to manyItem, BUT1Item does belong to 1Invoice • 5-6% of relations are (many:many, independent:independent): eg. 1Invoice can listmanyProduct, AND1Product(type) canbe listed at manyInvoice • 2-3% of relations are (1:many, independent:independent): eg : 1Person canborrow manyBook, BUT1Book canbe borrowed by 1Person • All other types are extremly rare • How we can use relational analysis decomposing EDS into Entities? • (Entity-property rule): An Entity always related 1:1, dependent:dependent to its Properties: eg. 1Invoice does have1InvoiceNumber, AND 1InvoiceNumber does belongto1Invoice • (Entity-decomposition rule): An Entity cannot contain nested 1:many, many:manyrelations or nested sub-Entities: they should be put in separate Entity: • eg. 1Invoice can havemanyItem, BUT1Item does belong to1Invoice1:many,independent:dependent relation: Invoice and Item should be put 2 separate Entities/Tables, and Item existentially depends from Invoice • EZ ARE YOU SERIOUS THAT I HAVE TO WRITE IT WITH COLORED PENCILS LIKE IN THE KINDERGARTEN?! EXPERTS SHOULD DO IT MORE COOL! • Relational analysis is a boring theoretical thing requiring abstraction ability, but once you capture its logic, it is not so difficult • Creating a working practical relational database, all further steps can be automated, but this one is not: if you have the correct plan, in most design software you can generate an empty working database with 5 clicks. But if the plan is messed up, you can have the best hardware, the best database engine, the best programmers, it will fail! (Historic FUBAR: FBI’s Investigation Database, 1969: They built a huge hierarchic model for USD 100M, but criminal investigation is full with many:many relations, which hierarchic model could not cope!!!) • Experts will use design templates, but if there is a new thing, they play it by head!

Relational database: Normalization: Normal Format 1: Relational Analysis 3 • TYPICAL BEGINNERS ERROR! Beginners frequently underestimate cardinality of relations. Eg. Nooshie Blone believed that her relationship with Steve Droid will be 1:1 but soon discovered it is 1:many, and she reacted many:many... If you see relation of 2 entites 1:1 most probably you made a mistake, as it is very rare in reality. Even if something looks 1:many at the first time: • 1Personcan borrow manyBook, BUT1Book canbeborrowed by 1Person1:many,independent:independent relation • You should always think generalized, examining exceptional cases also, moreover try to think at long term (as databases do not work for just one second), where relatins can turn out to be many:many: • 1Personcan borrow manyBook, AND1Book can be borrowedbymanyPerson (after each other). many:many,independent:independent relation! • TYPICAL BEGINNERS ERROR! Do not mix Instances of an object, e.g… • 1Invoice cancontainmanyProductInstance, BUT1ProductInstance can belong to1Invoice1:many,independent:independentrelation, as you cannot sell a Nokia 8615 cell phone with S/N123456789 as new twice (except when cheating) • …with type of an object: the latter can be related with higher cardinality, and is more frequently used in relational databases: • 1Invoice can listmanyProduct, AND1Product(type)can be listed at manyInvoice. many:many,independent:independentrelation: Cell phone type Nokia 8615 can be sold several times at several invoices • TYPICAL BEGINNERS ERROR! Do not mix Entity with its Occourences (Table with its Records): Instead of creatingSales2018January, Sales2018February, etc. Tables storing monthly sales data in the same fields, you will create singleSales table with Month, GrossTotal fields: it can store max. 100M records in Oracle/MS SQL, so you have plenty of space to tell what you did in Month=„January”, Month=„February”, etc.

Literature Objecthierarchydatamodel: • https://en.wikipedia.org/wiki/Object_database • http://en.wikipedia.org/wiki/Hierarchical_database_model • http://en.wikipedia.org/wiki/Nested_set_model Relationaldatamodel, normalization: • http://en.wikipedia.org/wiki/Edgar_F._Codd • http://www.itworld.com/nl/db_mgr/05072001/ • http://support.microsoft.com/kb/283878/hu NoSQLdatamodel: • https://en.wikipedia.org/wiki/NoSQL Graph-baseddatamodel: • https://en.wikipedia.org/wiki/Graph_database SAP HANA datamodel: • https://help.sap.com/http.svc/rc/fb8f7a9f7860468b84a07eab0a7d0a98/2.0.01/en-US/SAP_HANA_Modeling_Guide_for_SAP_HANA_Studio_en.pdf

English Info BSc: State Of The Art Database Systems

English Info BSc: State Of The Art Database Systems

Presentation Transcript

Logical Database Design

CS186: Introduction to Database Systems

Introduction to Database Systems

CS 328 Database Systems Chapter 8: SQL

The State of the Art in Distributed Query Processing

CMPT 354 Database Systems I

PMIT-6102 Advanced Database Systems

Using Database Management Systems

Database Systems {week 03a}

C20.0046: Database Management Systems Lecture #9

DATABASE MANAGEMENT SYSTEMS -- (ITCS 385)

PART I : Databases and Database Management Systems

CSE 480: Database Systems

Parallel Database Systems The Future of High Performance Database Systems

Database Management Systems Chapter 6 The Database Language SQL (II)

603 Database Systems

Database Management Systems

DATABASE SYSTEMS

Database

DATABASE Management Systems

Chapter 37 Java Database Programming

Introduction to Database Systems