Chapter 9: Database Structures

Chapter 9: Database Structures • Databases have become the predominant source for data storage in the world • This impact of the computer can be felt everywhere and similarly, the need for databases has required that the business world embrace the computer • In this chapter, we will examine • what a database is • the relational database (the most common model) • maintaining database integrity • social impacts of the database

Flat File vs. Database • In chapter 8, we examined files • Information is stored in a flat way, that is, all information is stored from a single point of view -- accessing sequentially or through an index • We cannot use a flat file to withdraw certain, linked information, at least without a lot of additional processing • A Database represents a multidimensional file with its own internal links between related pieces of data • A database is the stored information • Here, we want to study not just the database, but the database management system that allows us to create, maintain and access a database

Data Integration • The idea behind a database is that of data integration -- of having the data in one format and being able to view it in different ways: • Consider an integrated database of a store’s information • including employees, customers, purchasing, sales, etc • We can use the single database to generate reports for payroll, accounts receivable and payable, invoicing, inventory, etc • We can control who gets access to which part of data (why should the personnel department view files about customers, or an employer view files on inventory?) • Updating one piece of information can be reflected across reports generated based on the dependencies of that information

Schema and Subschema • A schema is a description of the entire database structure • A subschema is a description of only that portion of the database pertinent to a particular user’s need • Example: UTPA database contains faculty records, student records, class information, etc • A subschema might be of student grades while another might be faculty work load and salary • By allowing users to describe their own subschema • it provides efficient access to the portion(s) relevant to that user • this has not always been the case with databases • it provides a mechanism for controlling security

Conceptual Layers of a DB • As stated earlier, the DB is the stored records whereas the DBMS is software for accessing the DB • We can extend this view to include another layer, application software that requests information from the DBMS -- this allows us to • separate details of the DB from the application that might use the DB, those details are instead found in the DBMS • change the structure of the DB or even implement a distributed DB without having to alter the application software • separating the DB from the software that uses the DB is known as data independence and is the ultimate form of a DB • consider the WWW where we want to access information without having to know where or how that information is stored • See figure 9.2 p. 401

The Relational Model • This is the most common form of a DB • The idea is that all information is stored in relations, which are tables that are composed of • individual records (the rows) • the book refers to these as tuples • and fields (the columns) • the book refers to these as attributes • A DB has any number of relations, each representing a different set of information • Relations may (and often will) overlap in some of the information that they store • See figure 9.3 p. 404 for a simple relation

Relational Design • One difficulty in designing a DB is determining what information should go into a relation • Consider the employee information from figure 9.3, should we also add to it their jobs? • In figure 9.4 (p. 406), we have added to this relation each employee’s JobID, Job Title, Skill Code, Dept, StartDate and TermDate • However, in the case of an employee who has held more than one position with the company, we have had to duplicate some of their information by having a second or third entry (see Joe E. Baker and G. Jerry Smith)

Duplications: Is it a problem? • Aside from taking up a little extra storage space, why is the duplication a problem? • Consider that the relation was originally employee information meaning that each employee should have one record only • Now consider what would happen if we wish to delete Joe E. Baker from the Floor Manager Job because he has been terminated • We might wind up erasing both of Joe E. Baker’s entries! • In general, a DB relation contains a primary (or key) field which should have unique values • In figure 9.4, there are no primary fields

Using Several Relations • In order to remove duplication, we should reorganize the information and use several different relations • We see the new version in figure 9.5 p. 407 where there are now three relations, EMPLOYEE, JOB and ASSIGNMENT • This is conceptually better • How do we delete Joe E. Baker’s Floor manager position now? • First, we have to find Joe E. Baker from the EMPLOYEE relation and obtain his EmpID, next we map into the ASSIGNMENT relation to find his JobID, and then we go to the JOB relation and remove the proper entry • Its more work, but its worth it

How far do we take it? • In subdividing a relation into smaller relations, we could go too far • In figure 9.6 p. 408 we have three attributes • EmpID, JobTitle and Dept • Consider that a given employee might have held two different job titles in the same department or has the same job title (say, manager) could be present in two different departments • Should we divide our relation into two relations, one with EmpID and JobTitle, and one with JobTitle and Dept? • If we do (see fig 9.7 p. 408), then we have a problem • how do we find what department a given employee works in? We can find his/her job title, but that does not mean that he/she works in a department that has that job title! • So the design of relations must be thought out carefully! • Decomposing a relation into subrelations is nonloss decomposition if the decomposition does not result in a loss of information

Now that we have introduced the concept of a relation, how do we access it? There are three basic operations for access: SELECT -- obtain certain records that match a given condition PROJECT -- obtain certain fields interestingly, the keyword that we will use to perform a SELECT is where and the keyword that we will use to perform a PROJECT is select JOIN -- combine two or more relations together NEW <-- SELECT from EMPLOYEE where EmplId = “34Y70” Creates a new relation See figure 9.8 p. 410 NEW1 <-- SELECT from JOB where DEPARTMENT = “ …” Another new relation consisting of records from given department NEW2 <-- PROJECT JobTitle from NEW1 Display only job titles from new relation MAIL <-- PROJECT Name, Address from Employee see figure 9.9 p. 411 Relational Operations

SELECT and PROJECT are straightforward, but they do not allow us to combine different relations of information Imagine that you want the names and addresses of all employees from the Sales Department Unfortunately, the EMPLOYEE relation does not contain information about departments and the JOB relation does not contain mailing information We must then JOIN the two (or more) relations together A JOIN is sort of like performing a dot product between two relations it takes each entry from the first relation and concatenates all relevant entries from the second relation creating a new relation that is longer than both individually NEW3 <-- JOIN EMPLOYEE and ASSIGNMENT See figure 9.5 This will create a new relation, NEW3, which has 7 fields 1 record for each employee’s assignment JOINs

Combining JOIN and SELECT • See Relations A and B in figure 9.10 p. 412 • C <-- JOIN A and B where A.W = B.X • This JOIN creates a new relation, C, which contains all 5 fields and contains records for each matching item in A and B where the field W equals the field X • For V=‘r’, W=‘2’, we have a match with X=‘2’, Y=‘m’, Z=‘q’, so we have a record in C of “r 2 2 m q” • We similarly have two matches for V=‘t’, W=‘4’ because there are two records in B where X=‘2’ • So, C has three records • In figure 9.11 p. 413, we have another example of C <-- JOIN A and B where A.W < B.X • Interestingly, we get the same three records as the result

Combining Operations • As it turns out, we can not only combine a SELECT and a JOIN, but we can also combine several SELECTs together, or a SELECT and PROJECT, and so forth • NEW1 <-- JOIN ASSIGNMENT and JOB where ASSIGNMENT.JobID = JOB.JobID • This creates a new relation that consists of the fields EmplID, JobID, StartDate, TermDate, JobTitle, SkillCode and Dept for all each of the available JobIDs as they are stored in the two relations • See figure 9.12, p. 414

SQL • SQL is the structured query language • It is used extensively in the data processing community for manipulating DBs • One reason for its popularity is that it is standardized by ANSI, so learning it means that you can apply it to different DBs running on different computers and in different companies • Newer GUI tools manipulate the underlying DB by converting user requests into SQL • SQL queries will look similar to our previous examples but differ a little

SQL Examples • select EmplId, Dept from ASSIGNMENT, JOB where ASSIGNMENT.JobID = JOB.JodID and ASSIGNMENT.TermDate = ‘*’ • JOIN the two relations PROJECTing the fields EmplID and Dept and SELECTing the matching entries that share the same JobID and where the TermDate is ‘*’ • select Name, Address from EMPLOYEE • a simple PROJECT from a relation (mailing list) • select EmplID, Name, Address, SSNum from EMPLOYEE where Name = ‘Cheryl H. Clark’ • a PROJECT and SELECT combined to retrieve one record

More SQL Commands • Aside from accessing a DB, SQL has commands to manipulate a DB as well • insert -- used to add a record to a DB relation • insert into EMPLOYEES values (‘42Z12’, ‘Sue A. Burt’, ‘33 Fair St.’, ‘444661111’) • delete -- used to delete 1 or more records from a DB relation • delete from EMPLOYEES where Name = ‘G. Jerry Smith’ • update -- used to alter one or more fields of one or more records in a relation • update EMPLOYEE set ADDRESS = ‘1812 Napoleon Ave’ where Name = ‘Joe E. Baker’ • update JOB set JobTitle = ‘Administrative Assistant’ where JobTitle = ‘Secretary’ • update JOB set Pay = Pay + 1500 Where JobTitle = ‘President’

Other Forms of Databases • There are several other forms of DBs aside from relational • Hierarchical • Distributed • Network • Object-Oriented • These other forms are applied more often in DB research than in business because it is unclear which form is actually most efficient and useful, so instead, most people use the traditional relational approach • Differences between approaches are covered in 4333 • The book covers OODB but we will skip this

Maintaining DB Integrity • In business settings, maintaining the integrity of a DB is critical • Lost or corrupt data could cost a company $ millions • How could information get lost or corrupted? • In a distributed DB, messages could get lost between the node that performed the DB interaction and the DB itself • In the case of a multitasking or distributed DBMS, two overlapping accesses could cause a problem • recall accessing a critical section and the semaphore mechanisms needed to ensure mutual exclusion

One security mechanism is to keep a log of all transactions using non-volatile storage This log is available so that, if an error arises, the DB can manually be adjusted to fit the log’s information The commit point is the point in performing DB accesses when all accesses match the log If the commit point has been reached, then the DB is up-to-date and accurate If the commit point has not been reached and some error occurs, the log must be used to find the inaccuracies and fix them The most common way to fix the problems is to consult the log and find the last commit point, and then roll back the DB to that point In which case, the chances made since the commit point must be performed manually This recreation of accesses will then bring the DB up to the point when the error arose A particular problem may arise if transactions were performed on other transactions that occurred after a commit point -- this requires a cascading rollback Commit and Roll Back

Locking • Aside from errors, the DB requires a synchronization mechanism to ensure mutual exclusion to critical sections • in DB terminology, this is known as locking • DBs will use two kinds of locks • Shared locks -- when accesses to critical sections are read only -- then no locking is actually required and so the processes share a lock • Exclusive locks -- if a process is going to alter data, then it must first obtain the Exclusive lock, which is only available if no other process is currently accessing or using the data • Again, the topics of locking, synchronization and maintaining integrity are covered in more detail in 4333, and also to some extent in 4334

Data is power (or money) Those who have data, can use it for their own benefits Unfortunately, this leads to a number of social problems including a loss of privacy you have just bought a bag of dog food, now you are being bothered at home by the Humane Society for a donation, and Dog Magazine to buy a subscription mailing lists and phone solicitors are annoying, but they are created by sharing data in DBs And if the information is inaccurate, then you have a lot of problems (e.g., bad credit history!) Benefits of DBs Credit card shopping and quick credit Internet shopping Easy access to information library holdings stock reports over the Internet law enforcement and reporters Special offers for good customers Social Impact of DB Technology

Do the Benefits Outweigh the Problems? • Definitely an open question -- what do you think? • Can we solve some of the problems? • How do you fix your bad credit rating? What happens if you fix it in one DB, but the info has already been transmitted to other DBs? • Public opinion has had an impact • Social Security Administration decided not to put SS#’s on the Internet because of public reaction • AOL decided not to sell customer info to telemarketers based on user responses • And there are numerous federal laws pertaining to federal agencies and their use of data

Chapter 9: Database Structures

Chapter 9: Database Structures

Presentation Transcript

Chapter 1 The History of Sport Management

Chapter 2 E-Marketplaces: Structures, Mechanisms, Economics, and Impacts

Data Structures

Chemical Bonding and Molecular Structure (Chapter 9)

Chapter 20: The Knee and Related Structures

Chapter 25 Relational Databases

CHAPTER 6

Chapter 7: Relational Database Design

Chapter 1

Chapter 5

Starting Out with Java: From Control Structures through Objects Fifth Edition by Tony Gaddis

Kagan Structures

Chapter 23: XML

Chapter 3 Structures and Functions of Nucleic Acids

Chapter 10: XML

Intro to Computer Science I

Concepts of Database Management Seventh Edition

Introduction to Database Design Methodology

Introduction to Database

Chapter 8

Chapter 8