280 likes | 369 Vues
One thing in one place. Data, Schemas and Applications Week 8. Data Normalisation. Formal technique to reduce complex relations to a set of simpler relations which are better structured and reduce redundancy
E N D
One thing in one place Data, Schemas and Applications Week 8
Data Normalisation • Formal technique to reduce complex relations to a set of simpler relations which are better structured and reduce redundancy • We shall look at it informally on this module. On a full database module this would be covered in more theoretical depth.
Potential anomalies • UPDATE the price per session of facility 1 • now different rates for squash courts - suspect business rule is same rate for all facilities for a ‘sport’ • ADD a new tennis court • must also ensure the same rate is used as for other tennis courts • DELETE facility 11 • lose the rate per session for skittles
Two better tables FacilityType Facility FtypeNo Name Price FacilityNo FaciltyUnit
Normalisation • Must ensure no data has been lost in this process. • Can recover the Original data using join: select * from facility natural join facilityType • Informally we feel that the second solution is ‘better’ than the first. • Less redundancy • Update anomalies avoided • Simpler tables reusable • Queries are slower because a join needed • original table: select price from facility where facilityNo=6 • normalised tables: select price from facility natural join facilityType where faciltyNo = 6
Exercise • Normalise this data on outstanding video rentals • Describe your result as an ER model (relationships not foreign keys) Source: www.gc.maricopa.edu/business/sylvester/cis164/norm-rev.htm
1st Normal form - 1NF • the data must be in the form of valid relations - i.e. with uniquely named columns, each column containing only atomic (indivisible) values • each row unique - identify the primary key
2nd Normal Form • Non-key fields are fully dependent on the primary key • ‘Dependent’ • Column A is dependent on Column B if, in the data, knowing the value of B will determine the value of A • E.g. E#, Age • Age is dependent on E# • E# is not dependent on Age ( several people with the same age , different employee numbers)
3rd Normal Form 3NF • A relation R is in 3NF if and only if • the non-key fields are fully dependent on the primary key i.e is 2NF and • the non-key fields are mutually independent • e.g R(A,B, C) • then • B dependent on A • C dependent on A • B and C independent
Mnemonic • “the key, the whole key and nothing but the key” • 2NF • “the key” : non-key fields depend on the key • “the whole key”: non-key fields depend fully on the key • 3NF • “nothing but the key”: non-key fields are not dependent on each other
Summary : Data Normalisation • Place common relations in separate tables and join them when required • Normalisation is bottom-up – given a mess of data in a table (a spreadsheet for example), how to restructure it to reduce redundancy • More names required – name for each relation at least – more concepts but more understandable • Normalised tables may be reusable • Reduced performance due to extra joins – may have to de-normalise to get better performance • Improved performance due to smaller database • Normalisation is a bottom-up process – starting from some mess – like a spreadsheet • Top-down models are usually normalised anyway
Code ‘Normalisation’ To factor (maths) – to divide into the basic units 68 = 2*2*17 • For very similar reasons, we want to write our code (PHP, HTML, Javascript..) so it is ‘normalised’ too – common, recurrent code in factored into one place. • To avoid update anomalies – update in one place • To increase readability – less code to read • To create reusable components • But may reduce performance – but who cares – maybe later if you need to optimize-
Recurrent Values • If the same value is used in several places, assign it to a variable (constant) and use the variable name where it is needed. • If this common value has to be changed, it can be changed in one place only. • The variable name provides a understandable explanation of what this value is for.
Silly example: colorheads.php <?php print "<h1 style='color:4AFF2F;'>This one is OK</h1>"; print "<h1 style='color:FF1052;'>This one is BAD</h1>"; ?> <p> here is some text </p> <?php print "<h1 style='color:FF1052;'>This one is BAD too</h1>"; print "<h1 style='color:4AFF2F;'>This one is OK too</h1>"; ?>
Revision 1 : colorheads2.php <?php $goodColour = "4AFF2F"; $badColour = "FF1052"; print "<h1 style='color:$goodColour;'>This one is OK</h1>"; print "<h1 style='color:$badColour;'>This one is BAD</h1>"; ?> <p> here is some text </p> <?php print "<h1 style='color:$badColour;'>This one is BAD too</h1>"; print "<h1 style='color:$goodColour;'>This one is OK too</h1>"; ?>
Revision 2 : colorheads3.php <html> <style> .good {color:4AFF2F;} .bad {color:FF1052;} </style> </head> <body> <h1 class="good">This one is OK</h1> <h1 class="bad">This one is BAD</h1> <p> here is some text </p> <h1 class="bad">This one is BAD</h1> <h1 class="good">This one is OK</h1> </body> </html>
Common code • A common piece of code can be put in on place and linked into the programme • Suppose you have an HTML footer to go on every page (it might include to call Google Analytics for example) • We want to put this in one file and then include it in the scripts for each of the pages
Footer code (footer.php) <hr/> <p style="font-size: 10pt; color: blue"> This page is generated on <?php print date("l, d/m/Y") ?> at <?php print date("H:i") ?> from a script written by Chris Wallace. </p> php function to include php script from another file Sample page (page1.php) <html> <head> <title>First Page</title> </head> <body> <p style="color:silver; font-size:15pt;"> This is the main body of page 1 which is linked to <a href="page2.php">page 2</a> </p> <?php include("footer.php") ?> </body> </html>
Modularisation • Modularisation as a common pattern • In PHP there are several slightly different ways to include code: • include(), include-once() • require(), require-once() • Since PHP is a scripting language, code can’t be pre-compiled as in Java.
Functions • Common code but for a few variations • Variations can be expressed as parameters • Create a function • Call the function where required
Two parameters Function declaration and use makefooter.php <?php // define the function function footer($author, $size) { return "<hr/><p style='font-size: {$size}pt; color: blue'> This page is generated on ". date("l, d/m/Y")." at ".date("H:i"). " from a script written by $author.</p>"; } //call the function print footer("Chris Wallace", 20); //call with different parameters print footer("Mark Butler", 30); ?> Can’t write $sizept { } means evaluate expression . Dot for concatenate strings
Functions and Modules • Common to combine these two ideas • Place the function definitions in a separate file • Include file in every program which needs these functions • Call the function wherever required. • E.g. page3.php
Function library functs.php <?php // generate a footer function footer($author, $size) { return "<hr/><p style='font-size: {$size}pt; color: blue'> This page is generated on ". date("l, d/m/Y")." at ".date("H:i"). " from a script written by $author.</p>"; } ?> Use of function <html> <head> <title>Third Page</title> </head> <?php include("functs.php") ?> <body> <p style="color:silver; font-size:15pt;"> This is the main body of page 3.</a> </p> <?php print footer("Chris Wallace", 25) ?> </body> </html>
Code normalisation : Summary • Breaking a programming task cleanly into into simpler, re-useable units • The sign of a good programmer • Good structure is arrived at iteratively • You notice common parts as the code is written • You re-factor the code to put the common parts in one place • Avoid merely copying code because this creates update anomolies – an error has to be corrected in all copies
One thing in one place • An idea which is common to both data and code normalisation • In database it means splitting big data sets into smaller, independent and reusable ones with no redundancy • In programming it means splitting a large amount of code into smaller reusable items – variables, modules, functions and classes • Where else do we see this principle? • A class or type separates the fixed parts of a set of things (every employee has a name) from the variable (Empno ‘1111’ has name ‘Fred’) • In learning, it is about identifying common themes or patterns, separating the general from the particular : • String concept implemented differently in each language • Code and Annotations in one script • Several Languages in one script
Tradeoffs and judgement • Normalisation requires judgement • Are these two things really the same? • Will they always be the same? • Normalisation incurs a performance penalty – because the separate bits have to be put back together in execution – joins in a database, calls in code • Data and code design involves trade-offs
Hints for debugging PHP • Use the isa.cems.uwe.ac.uk host to show errors • Add additional print statements • Comment out in case needed in the future • Use print_r($array) to print a complete array (such as a row returned from the database)