560 likes | 802 Vues
Introduction to Database Design. Database Design ? - Decide database structure (schema) to store data value Design Issues - What are the user requirements - How can the requirements be expressed - How can they be translated into an effective database structure Considerations
E N D
Introduction to Database Design • Database Design ? • - Decide database structure (schema) to store data value • Design Issues • - What are the user requirements • - How can the requirements be expressed • - How can they be translated into an effective database • structure • Considerations • - Integrity : insert, delete, update • - Consistency • - Security • - Efficiency (response time, storage) • - DB extension
Introduction to Database Design (cont'd) • Database Design Process • Requiement Collection and Analysis • - Analysis of user requiements and DB usage • Conceptual Design • - Conceptual schema modeling (E-R Diagram) • Logical Design • - Transform conceptual schema into relational schema • Physical Design • - Decide storage structure and access path • for performance • Implementation • - Describe DB Language and load DB, convert DB • DB life cycle • 요구분석 • 설계 • 구현 • 운영 • 감시및개선
Requiement Collection and Analysis • Establishment of Organizational Objectives • Derivation of Specific Database Reqirements • Documentation of the Requirements • Collecting Information about Data Usage • - Interview department managers • - Identify its data • - Identify rules about each application • - Define data usage plan • Review of Requiement Specification • Requiement Specification Techniques • SADT, DFD, HIPO etc.
Conceptual Design • - High-level human-oriented representation for the database • requirements • - Input : Requirement specification • Output : high-level description such as E-R Diagram • Primitives in the Real World • Object • - something which can be presented by nouns • - e.g., a employee, a order, a student • Object Class • - a group of objects • - e.g., EMPLOYEES, ORDERS, STUDENT
Conceptual Design(cont'd) • Properties • - characteristics of objects • - e.g., student-name, year, dept, sno • Association • - connection of objects • - e.g., student enroll the 'Database' course • * These real world primitives should represent by • conceptual primitives
Conceptual Design(cont'd) • Real World and Conceptual Promitives • Real World Primitives Conceptual Promitives • Object Entity • Object Class Entity Set • Property Attribute • Property Value Set Domain • Fact Value • Association Relationship (set)
Baseball Database Example • Database that records baseball players, the teams • they played for their batting average and position played • Entity Sets • PLAYERS, with attributes • - Name : key • - Home /* birth place */ • - Bdate /* birth date */ • POSITIONS, with attributes • - PosName : key /* pitcher */ • - PosNumber /* 1 for pitcher */ • TEAMS, with attributes • - City • - Year - Franchise /* Giants, Tigers */ : key
Baseball Database Example(cont'd) • Relationship Sets • SEASONS, between PLAYERS and TEAMS • - Mapping : many-to-many • - Attribute : BA /* batting average for that season */ • PLAYERS, between PLAYERS and POSITIONS • - Mapping : many-to-many
Home Bdate Name Franchise City Year PLAYERS TEAMS PLAYS SEASONS POSITIONS BA PosName PosNumber Baseball Database Example(cont'd) • The E-R Diagram for the baseball database
Conceptual Design(cont'd) 세세하고복잡한것을 단순하면서도전체적인 객체로표현 • Basic Principle of Conceptual Schema Modeling : ABSTRACTION • Strategies to Conceptual Schema Design • Top-down Design • - Srart with a schema containing high-level abstractions, and • then apply successive top-down refinements. Course # Sec # Semester Instructor COURSE_OFFERING successive top-down refinment Name Course # Sec # Semester INSTRUCTOR COURSE_OFFERING OFFERED_BY
FACULTY FACULTY • combine or add ADVISE LECTURE STUDENT STUDENT Conceptual Design(cont'd) • Bottom-up • - Start with a schema containing basic abstractions, • and then combine or add to these abstractions
Conceptual Design(cont'd) • Characteristics of Conceptual Design • Expressiveness : distinguish different data types, • relationship set and integrity • Simplicity : understand • Minimization : non-overlapping • Diagrammatic Representation • Formality : accurate, unambiguous
Logical Design • - Convert E-R diagram into a system dependent model • - Input : E-R Diagram • Output : Relational (target) Database Schema • Basic Transformation Procedure • (1) Entity Set • - one table per one entity set • - each attributes in entity set converted into attributes • of table • - The table for the weak entity set has the attributes in • the key of the associate strong entity set
Logical Design (E-R diagram transformation) • (2) Relationship Set • - suppose the relationship involve entity sets E1, ... , En • - Let key(Ei) denote the key of Ei • The relationship has no attributes : • the table with attributes UiKey(Ei) • The relationship has attributes a1, ..., ak : • the table with UiKey(Ei) U { a1, ..., ak }
TEAMS Franchise City Year PLAYERS Name Home Bdate PLAYS Name PosName POSITIONS PosName PosNumber SEASON Name Franchise Year BA Baseball DB Transformation Example
Logical Design (E-R diagram transformation) • Possible Refinement of the Basic Transformation Procedure • - The Transformation procedure may produce too many • relations due to making a relation for each relationship • By adding the primary key of T to relation S, it may not • be necessary to make a relation for R primary key S R T foreign key
Physical Design • - Decide storage structure and access path • - Input : Relational Database Schema • Output : Record format, Access Path, Clustering • Design Results • (1) Physical Record Format Design • - Determine physical record format • e.g., fixed length, or variable length record • (2) File Structure Selection • - Select appropriate access methods • e.g., sequential, indexed sequential, random
Physical Design(cont'd) • (3) Index Selection • - Which attributes are used as indies • e.g., Emp# (primary), DeptName (second) • - Retrieval time & Storage cost • (4) Reorganization • - New applications, Dynamic database • - Performance degradation with time
R1 R2 R3 R4 R1 R2 R3 R4 Record Storage and File Organization • Records and Fields • type EMP = record • Name : char(20); • E# : char(10); • Salary : integer; • end • A file is a sequence of records • Record blocking • - block : a unit of data transfer, usually multiple records • Spanned and non-spanned records Non-spanned spanned
Record Storage and File Organization(cont'd) • Allocating file blocks on disks • - continuous allocation : • File blocks are allocated to consecutive disk blocks • - linked allocation : • disk blocks are connected by pointers • - segment (or extent) : • clusters of consecutive blocks, the clusters are • linked together
Physical Record Format Design • Fixed Length Record • - every record in the file has the same size • - easy implementation • no compaction • example of deleting records
Sno Sname varchar(20) Sno Sname College Degree MA, MS Physical Record Format Design(cont'd) • Variable Length Record • (1) variable length field : • field of varying size • (2) repeating field : • field taht may have • multiple values • (2) mixed field : • records of different record types • e.g., type DEPT = record • DeptName : char(20) • Budget : integer • DeptEmp : array of [1. .50] of • record • Name : char(12) • Salary : integer • end • end
Access Methods • Access Methods • - storage structure and access mechanism • (1) Sequential • (2) Indexed sequential • (3) Hashing • (4) Inverted file • (5) Multi-list • Performance Parameters • - Retrieval, Insertion, Deletion time • - Space overhead
Design Considerations • Response time : The elapsed time between submitting • a database transaction for execution and receiving a reponse. • Response time also influenced by factors not under DBMS • control, such as system load, operating system scheduling, • or communication delays. • Space utilization : Storage space used by database files and • their access path structures. • Transaction throughput : Average number of transactions that • can be processed per minute by database systems. • This is critical parameter of transaction systems such as • airline reservations or banking. Throughput must be • measured under peak conditions on the system.
Relational Database Design (Terminologies) • Relational Database • - Collection of Tuples (relations) • - Row (Tuple, Record) : Entity or Relationship • - Column : Values of Attribute • Relation vs. Relation Scheme • - Relation Scheme : Definition of relation • CUSTOMER = (Cno, Cname, Caddr) • - Relation :particular instance
Relational Database Design (Terminologies) • Key • - Superkey : set of attributes which uniquely identify an entity • - Candidate Key : minimal superkey • - Primary key : a designated candidate key • Attribute • - Prime attribute : attribute which is a part of a key • - Nonprime (or nonkey) attribute : • attribute which is not a part of a key
Problems in Relational DB Design • Good Design ? • -Exactly representation of a real world • Database Design ? • - Determine logical structure of data which represent • real world. • Logical Structure • - Set of relation scheme, i.e., what relation is needed? • And what attributes are consisted of that relation? • We must transform conceptual schema into relation schema • by a rule (principle)
Problems in Relational DB Design(cont'd) • We have some anomaly if we inculde all the attributes • in an ONE relation. There are dependecy problems • between attributes. • Careless Composition • - Deletion anomaly : 200번의C123만취소했으나, 학년=3 정보도잃는다 • if we delete 'Sno = 200' tuple, we may loss some Sno's • information such as 'Year = 3'. ....... trigged deletion • - Insertion anomaly : 600번이학년=2 데이타를키값이없으면못넣는다 • if we insert 'Sno = 600, Year = 2' only, it is not inserted. • i.e., entity integrity is violated because of null value • in the key. • Virtual value of that key field is required
Problems in Relational DB Design(cont'd) • - Update anomaly : • If we update Sno = 300's Year(4) to 3, all the tuple • including Sno = 300 must be updated. • Inconsistency may be occurred in this operation. 수강Sno CourseNo Grade Year 100 C413 A 4 100 E412 A 4 200 C123 B 3 300 C413 A 4 300 C123 B 4 300 C345 C 4 500 C413 A 4 added attribute 3
생각해 봅시다 • anomaly 가 발생하는 이유는? • 여러 종류의 사실을 한 릴레이션에 모두 표현하기 때문 • 해결책: 애트리뷰트간의종속성을분석하여 • 한종속성은한릴레이션에표현한다. • 이런분해과정을정규화(normalization)라한다. 앞의수강을수강과학년으로분해 가능하다.
Problems in Relational DB Design(cont'd) • Careless Decomposition • - Loss of information • (ex) S = (Sname, Saddr, Item#, Iname, Price) • S1 = (Sname, Saddr, Iname) • S2 = (Item#, Iname, Price) t1 i1 p1 t2 i2 p2 t3 i1 p3 s1 a1 t1 i1 p1 s1 a1 t2 i2 p2 s2 a2 t3 i1 p3 s1 a1 i1 s1 a1 i2 s2 a2 i1 S1 S2 S S != S1 S2 S1과S2를조인하면S에없는투플이생성. 비록정보가더생겼지만오류가생겼기때문에 lossy-join decomposition이라한다. 반대개념은lossless-join decomposition.
Schema Transformation Rules • We keep to the following rules • Prevention of Loss of Information • Avoidance of Repetition of Information • Decomposition
Functional Dependency • Definition • Let R = (A1, A2, ... , An) be a relation scheme, and let X • and Y be subset of {A1, A2, ... , An }. We say X Y, • reads "X functionally determines Y" or "Y is functionally • determined by X", iff each X-value in R has associated • with it precisely one Y-value in R. • When X Y, X is called a determinant(결정자)and • Y is called a determinee(종속자). • FD is a property of the relation scheme, • and not of a particular relation instance • FD is in fact a constraint a relation should satisfy Y X FD가필요한이유? Key에꼭필요한애트리뷰트만 존재하게하기위해서이다.
Functional Dependency (Example) • Eamxple • STUDENT Sno Sname Dept Year • STUDENT.Sno -> STUDENT.Sname • STUDENT.Sno -> STUDENT.Year • STUDENT.Sno -> STUDENT.Dept • STUDENT.Sno -> STUDENT.<Sname, Year, Dept> • If X is candidate key of relation R, all the attribute Y of • relation R must be functionally determined by X.
Functional Dependency (Example) • Functional Dependency Diagram • ENROL Sno Course # Grade Year Year Sno Course # Grade
Functional Dependency (cont'd) • Full (Functional) Dependency • (ex) <Sno, Course #> Grade • Partial (Functional) Dependency • Y is partially dependent on X if X Y and there exists Z X • such that Z Y • (ex) (Branch_name, Branch_addess) Zip-code • Transitive (Functional) Dependency : • A is transitive dependent on X if X Y and Y A • (ex) Branch_name Branch_address, • Branch_address Zip-code 모든attribute를full로고려해야FD있다. 모든attribute를full로고려해도FD가있고 부분적인attribute만고려해도FD가있다.
Normalization • Normal Form • - Basic idea : • Decomposed relation which has independent relationship • - A relation said to be in a particular normal form if it • satisfies a certain set of constraints (1NF, 2NF, 3NF ....). • Objectives • - No redundancy • - No Insertion / Deletion Anomaly • - No Loss of Information • Issues • - What attribute in each relation • - How do we decide that 한테이블에직접관계가없는애트리뷰트는 될수있는대로그테이블에넣지마시오.
Normal Forms Universe of relations (normalized or unnomalized) 1 NF relations 2 NF relations 3 NF relations BCNF NF relations 4 NF relations 5 NF (PJ / NF) relations
First Normal Form • DefinitionFirst Normal Form (1NF) • Domains of all the relations have atomic values • 1NF does not satisfy database design constraints • Problms (example) primary key + Relational Schema LECTURE_ADVICE Sno Advisor Dept Cno Grade Functional Dependency Diagram Sno Cno Advisor Dept Grade
First Normal Form(cont'd) (1) insertion anomaly : 'Sno = 500 and Advisor = Kim' can not be inserted because primary key may not allow null values. (2) deletion anomaly : if we delete 'Sno = 200 and C100', we lose the information of his advisor 'Lee' (3) update anomaly : if we change advisor of 'Sno = 400', we may have inconsistency of data. 수강지도 Sno Advisor Dept Cno Grade Relational Schema 100 P1 CE C400 A 100 P1 CE E300 A 200 Lee EE C100 B 400 P1 CE C400 A 400 P1 CE C500 C 400 P1 CE C600 B 500 Kim ??? deletion insertion
First Normal Form(cont'd) • Solution : delete partial dependency • Partial dependency means that primary key have • irrelevant attributes. • This relation have different information in a relation. Sno Cno Advisor Dept Grade Modified Functional Dependency Diagram Sno Cno 지도 relation Sno Advisor Dept Grade 수강 relation
First Normal Form(cont'd) Modified Relation 지도 Sno Advisor Dept 100 P1 CE 200 Lee EE 400 P1 CE 수강 Sno Cno Grade 100 C400 A 100 E300 A 200 C100 B 400 C400 A 400 C500 C 400 C600 B
Second Normal Form • DefinitionSecond Normal Form (2NF) • Every non-primary attribute is fully functional dependent on a key • 2NF is not a solution • Problms (example) • Sno -> Dept • Sno -> Advisor, Advisor -> Dept • (transitive dependency) • cause anomaly Sno Advisor Dept LECTURE relation
Second Normal Form(cont'd) (1) insertion anomaly : ‘<P4, CE>’ can not be inserted because primary key may not allow null values. (2) deletion anomaly : if we delete ‘<100, P1>’, we lose the information of his dept (CE). (3) update anomaly : if we change P1’dept to EE, we may have inconsistency of data.
Second Normal Form(cont'd) • Solution : delete transitive dependency • This relation have different information in a relation. Sno Advisor Dept 수강 relation Advisor Dept 지도교수학과 relation Sno Advisor 학생지도 relation
Second Normal Form(cont'd) 학생지도 relation Modified Relation Sno Advisor 100 P1 200 Lee 400 P1 3NF 지도교수학과 relation Advisor Dept P1 CE Lee EE
Third Normal Form(cont'd) • DefinitionThird Normal Form (3NF) • 함수적 종속성X A 에 대하여 • (a) X 가 관계형 스키마 R의 수퍼(기본)키 이거나 • (b) A가 R의 primary attribute이다. • 또는 • X Z, Z Y 의 이행적 함수 종속성을 가지지 않는다. • (ex) BRANCH = (Bname, Assert, Bcity) • BORROW = (Bname, Loan #, Cname, amount) • BRANCH and BORROW are in 3NF. • (ex) R = (City, Street, Zip-code) • key • R scheme is in 3NF because city and street is prime.
BCNF (Boyce/Code Normal Form) • DefinitionBCNF • Every determinant is candidate key • Whenever X A, X is a candidate key (ex) The relation LECTURE is 3NF but it is not BCNF * constraints 1. student must take only one course to one lecturer. 2. Each lecturer have only one course. 3. One course can be taken several lecturer. 수강과목 Sno Course Lecturer 100 PL P1 100 DS P2 200 PL P1 200 DS P3 300 DS P3 300 PL P4
BCNF (cont'd) Sno Course Lecturer P.297 그림15.10 1NF : Yes - atomic values 2NF : Yes - no partial dependency 3NF : Yes - no transitive dependency BCNF : No - determinant Lecturer is not candidate key.