1 / 48

eCommerce Technology 20-751 Databases

eCommerce Technology 20-751 Databases. Concepts. Relational model SQL DB construction Normalization ER diagrams Transactions Web support. Critical Role of Data. Without data, an organization cannot function especially in eCommerce Initially, data was prepared for specific applications

viveka
Télécharger la présentation

eCommerce Technology 20-751 Databases

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. eCommerce Technology20-751Databases

  2. Concepts • Relational model • SQL • DB construction • Normalization • ER diagrams • Transactions • Web support

  3. Critical Role of Data • Without data, an organization cannot function • especially in eCommerce • Initially, data was prepared for specific applications • payroll data for the payroll system • parts lists for the bill of materials system • sales data for statistical analysis • By 1970, clear that data had common properties • Data for many applications could be stored together in an organized way • database instead of separate collections

  4. What is a Database? • No formal definition • A collection of related data allowing: • insert (add new data) • delete (delete existing data) • update (change existing data = delete + insert) • query (retrieve all data having a certain property) • What does “related” mean?

  5. Database Management System • Based on a data model, e.g. relational, object, hierarchical • Has data definition language (DDL) to identify data • Has data manipulation language (DML) for queries and updates • Separates structure of data from DB implementation • Enforces data structure and content rules • Handles transactions, concurrent operations • Allows backup and recovery from errors • Connects to other software

  6. The Relational Model • A set is a collection of unique items { CS, HCII, ISRI, RI, LTI, CALD } Divisions of SCS { CS, HCII, CS, HCII, RI, CS } NOT A SET (repeated elements) • A relation on two sets A, B is a set of pairs of elements, one from A and one from B A = { 46-870, 20-751, 46-749, 20-753, 20-770 } B = { GSIA, SCS } R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) } • Relations can be defined on any number of sets

  7. The Relational Model of Data A = { 46-870, 20-751, 46-749, 20-753, 20-770 } B = { GSIA, SCS } R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) } 20-770 46-870 46-749 20-753 20-751 GSIA SCS This is the graph of the relation R

  8. The Relational Model of Data A = { 46-870, 20-751, 46-749, 20-753, 20-770 } B = { GSIA, SCS } R = { (46-870, GSIA), (20-751, SCS), (20-753, SCS), (20-770, GSIA), (46-749, GSIA) } CONTAINS ONLY COURSE NUMBERS CONTAINS ONLY SCHOOL NAMES This is a table of the relation R

  9. The Relational Model of Data Relations are not necessarily binary. May involve many sets: • Each row is a 7-tuple. Relation on 7 sets. • No implied ordering of either rows or columns. Sorting is irrelevant • Note: bad table design since “DEPT” is an attribute of “FACULTY”, not “COURSE”

  10. Tables • A relation can be represented as a table • One row for each tuple in the relation • Easier to draw than a graph • Table has implicit order (of rows and columns) • But: a relation has no ordering, either of tuples or attributes • The cardinality C(R) of a relation R is the number of tuples it contains = # of rows in its table • Relational model represents data as a collection of unordered two-dimensional tables

  11. Keys • Key: an attribute (or minimum set of attributes) that uniquely defines a tuple • In the example relation, “Course” is a key • A relation may have more than one key. • A set of attributes that can serve as a key is a candidate key. • One is chosen as the primary key. • Keys are used to reference (retrieve) tuples.

  12. Foreign Keys • A key from one relation that is an attribute of another relation is a foreign key. • If we had a “Faculty” relation, then “Faculty” would be a foreign key in the “Courses” relation. • Foreign keys connect relations together. FOREIGN KEY: PRIMARY KEY PRIMARY KEY

  13. Course, Room(Courses) Courses Operations on Relations • Projection List specific attributes L (columns) of R, written L(R) E.g. show course number and room

  14. Operations on Relations • Selection (extract horizontal slices) • List all tuples of relation R whose attributes satisfy condition C, written C(R) • E.g. show all tuples with Room = 152, Room=152(R) • Projection & Selection are unary (1-table) operations

  15. Structured Query Language (SQL) • A data manipulation language for manipulating relational databases • SELECT queries the database • UPDATE modifies relations • DELETE removes tuples Syntax of the SQL SELECT command: SELECT { attributes }FROM { table }WHERE { attribute-conditions };

  16. Structured Query Language (SQL) YIELDS DISTINCT TUPLES SINCE CourseNo IS A KEY • Projection • SQL: SELECT CourseNo, Room FROM Courses; • SQL: SELECT DISTINCT Room FROM Courses; • Selection • SELECT * FROM Courses WHERE Room= “152”; • Give a table of all courses that meet in 152 MUST ASK FOR DISTINCT TUPLES SINCE Room IS NOT A KEY

  17. Join • The natural joinA * B consists of tuples with matching attributes (names & values) in A and B • Natural join is a way of obtaining information across tables

  18. Natural Join A * B • Attribute names and values must match • Also called “inner join”: Statistics * Geography • Cartesian product and join are binary operations Statistics: Geography: * =

  19. Joins in SQL • SELECT City, State, RadioFROM StatisticsINNER JOIN GeographyON Statistics.City = Geography.City Statistics: Geography: Result of Query:

  20. Other SQL Constructs • ORDERBY (sorting) • SELECT Company, OrderNumber FROM Orders ORDER BY Company; • BETWEEN • SELECT * FROM Persons WHERE LastName BETWEEN 'Hansen' AND 'Pettersen‘; • ALTER TABLE (changes table structure) • Functions • SUM() • COUNT() • MAX()

  21. Database Constraints • Domain (data validity) constraints • All values in a column must be from the same domain • Example: all salaries are positive numeric dollar amounts. “Monthly” is invalid. • Entity Integrity • Every entity must have a unique primary key. (Otherwise, can’t access the entity) • Referential Integrity • Every foreign key value in a relation must match a primary key in the foreign relation table

  22. Functional Dependency • Attribute B is functionally dependent on attribute A if the value of A uniquely determines B • One-to-one relationship: two functional dependencies: A depends on B; B depends on A • Many-to-one relationship: one functional dependency: B depends on A • Many-to-many relationship: no dependencies: neither A nor B depends on the other • Functional dependencies are constraints between attributes or sets of attributes. They must be maintained or error or inconsistency will result.

  23. Normalization • A relation is well-structured if it is non-redundant and allows INSERT, MODIFY and DELETE without error or inconsistency. • Normalization assists in maintaining functional dependencies and preventing errors and inconsistencies. • DELETE anomaly: • Deleting “Jones” removes all information about course 46-870 (namely that its room is 150) • In the information is in another table, it shouldn’t be here also.

  24. Normalization • MODIFY anomaly: • Suppose Smith’s email address changes. Every line in the table corresponding to Smith must be changed or data will be inconsistent. • An attribute unique to a key should be entered only once in the database.

  25. Normalization • Restructuring to produce smaller, well-structured equivalent relations, reduce data replication • First Normal Form. Make all attributes atomic. No multiple values. MULTIPLE VALUES MULTIPLE VALUES FIRST NORMAL FORM:

  26. Second Normal Form • Eliminate partial functional dependencies. Every non-key attribute must depend on all key attributes (or redundancy can result). Capital DEPENDS ON State ONLY, NOT CITY KEY IS(City, State) NOT IN 2NF: 2NF: DECOMPOSE INTO TWO TABLES There are many other normal forms and normalization rules

  27. Entity-Relationship (ER) Diagrams • Must specify: • Entities (things to be represented in the database) • Attributes (properties of entities) • Relationships (relations among entities) • These can be modeled by entity-relationship diagrams • The diagrams are used as a guide to designing the database

  28. Entity-Relationship (ER) Diagrams • Entity types • Entity type: Store • Entities: Downtown Store, Squirrel Hill Store, Oakland Store • Relationships between entity types: • This is the “Has” relationship • Direction of arrow is important (“Branch has Staff,” not “Staff Has Branch”) EXAMPLE FROM CONNOLLY & BEGG

  29. Entity-Relationship (ER) Diagrams • Relationships need not be binary: • This is the “Arranges” relationship; it can be though of as a 4-tuple (Solicitor, Bid, Buyer, Institution) EXAMPLE FROM CONNOLLY & BEGG

  30. Entity-Relationship (ER) Diagrams EXAMPLE FROM CONNOLLY & BEGG

  31. Entity-Relationship (ER) Diagrams EXAMPLE FROM CONNOLLY & BEGG

  32. Web Database Connectivity JDBC = Java Database Connectivity SQLJ = Java-Embedded SQL SOURCE: CONNOLLY & BEGG

  33. Distributed Databases • Databases in which data is stored in more than one location but appears local to the user • Replicated: multiple copies of database • Partitioned: data is split among locations • Fragmentation • Information about fragments is stored in a distributed data catalog (DDC) • Horizontal v. vertical fragmentation

  34. Distributed Databases • Advantages • Reduced load on central DB • Lower cost (data spread among small machines) • Reliability (machine failure is not fatal) • Fast access to local data • Ease of growth • Disadvantages • Complexity. Difficult to maintain consistency • Security (many access points) • Telecommunications required

  35. Distributed Databases • Products • PeerDirect • Issues • Updating of information in a distributed database is a form of transaction processing

  36. What is a Transaction? • An action requiring a series of steps and database updates. • Transactions are the basis of Ecommerce. • Transactions may be distributed. Steps processed on different computers. • Transactions may fail. One or more steps may be unsuccessful. • Transaction systems must be recoverable. Data and “state” must be restored after failure.

  37. ATM Transaction Database ACCOUNT MASTER STOLEN CARDS RECENT ACTIVITY POSTING LOG

  38. ATM Withdrawal 1 Check STOLEN If card is stolen, ABORT 2 Check PIN If wrong, retry 3 times, ABORT 3 Check RECENT v. BALANCE Too much activity, ABORT 4 Check ATM reserve If not enough money, ABORT 5 Update RECENT Indicate new activity 6 Update BALANCE Debit bank account 7 Write to Log Record transaction 8 Update ATM reserve Debit ATM balance 9 Tell ATM to dispense money Pay the man 10 Check dispensing status If failed, ABORT 11 Make updates permanent COMMIT the transaction (Think: “to memory”)

  39. ACID • Four minimum requirements of a transaction T in a transaction system: • Atomic. T executes completely or not at all. • Consistent. T preserves database consistency and integrity. • Isolated. T executes as if it were running alone. Not affected by other concurrent transactions. • Durable. T’s results preserved during failure.

  40. Atomicity • Transaction: John pays Mary $100. • Take $100 out of John’s account. • Add $100 to Mary’s account • Problems: • John or Mary might not have an account • John might not have $100 • System might fail after subtracting $100 from John • If failure occurs, must undo partial results • “Commit”: successful recording of a transaction • “Abort”: failure of a transaction.

  41. Consistency • Maintain database constraints • data validity • unique primary keys • referential integrity • conservation conditions (debits = credits, total cash = sub of cash in all accounts, etc.)

  42. Isolation • Two transactions T1, T2 are interleaved if some steps of one are performed after the other starts but before it completes.T1 has steps A B C D E F; T2 has steps P Q R SA BPC DQ R SE F is an interleaved schedule.P Q R SA B C D E F is not interleaved. • A sequence of transactions is isolated if their steps can be interleaved without affecting the result. Transactions are blind to simultaneous execution.

  43. Durability • Results must survive failure. • Logging. Maintaining a record of all data updates so databases can be repaired if failure occurs. • Updates must be logged before they are performed. If failure occurs, transaction can complete from failure point. • If abort is necessary, can undo logged transactions. • Without logging, can’t recover from some types of failures.

  44. Simultaneous Transactions • If all TP were done by one single-threaded process, it would be easy. Just execute one step at a time. • With just two threads (or processes) it’s complicated. 2 transactions T1, T2: READ A; A = A+1; WRITE A; • Value of A is 6, but it should be 7!

  45. Locking • A solution is LOCKING. Associate a variable with one process at a time. LOCK the others out. • LOCK A; READ A; A = A+1; WRITE A; UNLOCK A; • If T1 starts first, it locks A. • When T2 tries to lock A, it can’t. It has to wait. • T1 finishes completely before T2 can lock A. • After T1 finishes, A = 6 • After T2 finishes, A = 7, the correct value • Locking achieves atomicity

  46. Deadlock T1: LOCK A; LOCK B; B=A+B; UNLOCK A; UNLOCK B; T2: LOCK B; LOCK A; B=A-B; UNLOCK B; UNLOCK A; This is deadlock. Neither transaction can complete.

  47. Ways to Eliminate Deadlock 1. Require each transaction to request all locks at the same time. System either grants them all or none. • Problem: very restrictive. Transactions cannot be interleaved. Essentially serial execution. 2. Assign an ordering to the variables: A=1;B=2;C=3 …Require transactions to request locks in that order. 3. Do nothing. Periodically check for deadlock. If it exists, cancel out a transaction.

  48. Q A &

More Related