Overzicht Informatica (Overview of Computer Science), Lecture 9 – November 1
Computer Science: An Overview, Edition 7, J. Glenn Brookshear
CHAPTER 8 (now chap. 9, 2nd part): File Structures
• Abstractions of the actual data organization on mass storage
• Again: differences between conceptual and actual data organization
8.1: Files, Directories & the Operating System
• OS storage structure:
  • conceptual hierarchy of directories and files
[Figure: directory tree with files]
8.1: Files: Conceptual vs. Actual View
• View at OS level is conceptual
  • actual storage may differ significantly!
8.2: Sequential Files
• To ‘remember’ where data resides on disk, the OS maintains a list of sectors for each file
• Result: sequential view of a scattered set of data
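The sector-list idea can be sketched in a few lines of Python. This is only an illustration: the `disk` dictionary, the sector numbers and the file name `notes.txt` are made up, and a real operating system keeps this bookkeeping in its own on-disk structures.

```python
# Hypothetical disk: sector number -> contents. The sectors of one file
# are scattered over the disk in no particular order.
disk = {7: b"quential ", 2: b"A se", 19: b"view."}

# Per-file list of sectors, as maintained by the OS (file name is made up).
sector_lists = {"notes.txt": [2, 7, 19]}

def read_sequential(filename):
    """Assemble the scattered sectors into one sequential byte stream."""
    return b"".join(disk[sector] for sector in sector_lists[filename])
```

Reading `notes.txt` follows the sector list in order, so the caller sees one contiguous stream even though the data is spread over sectors 2, 7 and 19.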
8.2: Text Files
• Sequential file consisting of a long string of encoded characters (e.g. ASCII code)
• But: the character string is still interpreted by the word processor!
[Figure: a file shown in “Notepad”; the same file shown in “MS Word”]
8.2: From actual storage to conceptual view
[Figure: actual storage -> (assembly by operating system) -> sequential view (sequential buffer) -> (interpretation by application program) -> conceptual view]
8.2: Data Conversion
• When programming: note that data transfer to/from a file may involve data conversion:
  • e.g., from two’s complement notation to ASCII
• So: again it’s about the interpretation of data
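A minimal sketch of such a conversion, assuming a 16-bit big-endian two's complement value in memory and a decimal ASCII representation in the text file. The function names and the 2-byte width are illustrative choices, not taken from the slides.

```python
def twos_complement_to_ascii(raw: bytes) -> bytes:
    """Interpret raw bytes as a signed (two's complement) integer and
    return its decimal digits as ASCII characters."""
    value = int.from_bytes(raw, byteorder="big", signed=True)
    return str(value).encode("ascii")

def ascii_to_twos_complement(text: bytes, width: int = 2) -> bytes:
    """Reverse direction: parse ASCII digits and store the number in
    two's complement using `width` bytes (assumed to be 2 here)."""
    return int(text.decode("ascii")).to_bytes(width, byteorder="big", signed=True)
```

The two bytes `FF E9` and the three ASCII characters `-23` carry the same value; only the interpretation differs.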
8.3: Quick File Access
• Disadvantage of sequential files:
  • no quick access to particular file data
• Two techniques to overcome this problem:
  • (1) Indexing or (2) Hashing
• Indexing:
  • the index is loaded into main memory when the file is opened
[Figure: index with keys pointing into an indexed file]
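Indexing can be sketched as follows, using an in-memory `io.BytesIO` object as a stand-in for the file on disk. The record layout (one `key,name` line per record) and the sample records are assumptions made for this example.

```python
import io

# Hypothetical records: (key, name) pairs, written one per line.
records = [(1042, "Jansen"), (2077, "De Vries"), (3310, "Bakker")]

data_file = io.BytesIO()   # stands in for the file on disk
index = {}                 # key -> byte offset of the record in the file
for key, name in records:
    index[key] = data_file.tell()
    data_file.write(f"{key},{name}\n".encode("ascii"))

def lookup(key):
    """Jump straight to the record via the in-memory index (no scan)."""
    offset = index.get(key)
    if offset is None:
        return None
    data_file.seek(offset)
    line = data_file.readline().decode("ascii").rstrip("\n")
    stored_key, name = line.split(",", 1)
    return int(stored_key), name
```

The cost of the index is visible here: `index` takes extra space, and every lookup goes through it before touching the data file.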
Exercise: Chapter 8 - Problem 10
Why is a ‘patient identification number’ a better choice for a key field than the last name of each patient?
• If the key is unique:
  • an additional sequential search is never required
• A patient’s last name is not always unique
8.3: Inverted Files
• Variation on (single) indexing: inverted file
8.4: Hashing
• Disadvantage of indexing is… the index
  • requires extra space + includes 1 extra indirection
• Solution: ‘hashing’
  • finds position in file using a key value (as in indexing)…
  • … simply by identifying location directly from the key
• How?
  • define a set of ‘buckets’ & a ‘hash function’ that converts keys to bucket numbers
[Figure: key value -> hash function -> bucket number; buckets 0, 1, 2, 3, …, N]
8.4: Hash Function: Example
• If storage space is divided into 40 buckets and the hash function is division (the remainder of key / 40 is the bucket number):
  • key values 14, 54, & 94 all map onto the same bucket (collision)
[Figure: key values mapped onto buckets]
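The division hash can be written as a one-line function. The name `bucket_for` is chosen for this sketch; the bucket count of 40 comes from the example above.

```python
def bucket_for(key, num_buckets=40):
    """Division hash: the remainder of key / num_buckets is the bucket number."""
    return key % num_buckets

# The three key values from the example collide in a single bucket.
colliding = [bucket_for(k) for k in (14, 54, 94)]
```

All three keys leave remainder 14 when divided by 40, so they land in bucket 14.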
8.4: Handling Bucket Overflow
• When bucket sizes are fixed:
  • buckets can fill up and overflow
• One solution:
  • designate a special overflow storage area
[Figure: buckets with an overflow area; annotation: ‘not fixed in size!’]
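A minimal sketch of fixed-size buckets with a shared overflow area. The capacity of 2 records per bucket and the function names are assumptions for illustration; the slides do not prescribe a capacity.

```python
NUM_BUCKETS = 40
BUCKET_CAPACITY = 2   # assumed fixed bucket size, for illustration only

buckets = [[] for _ in range(NUM_BUCKETS)]
overflow = []         # special overflow storage area shared by all buckets

def store(key, record):
    """Place the record in its bucket, or in the overflow area if the bucket is full."""
    bucket = buckets[key % NUM_BUCKETS]
    if len(bucket) < BUCKET_CAPACITY:
        bucket.append((key, record))
    else:
        overflow.append((key, record))

def find(key):
    """Search the key's bucket first, then fall back to the overflow area."""
    for stored_key, record in buckets[key % NUM_BUCKETS]:
        if stored_key == key:
            return record
    for stored_key, record in overflow:
        if stored_key == key:
            return record
    return None

for k, r in [(14, "a"), (54, "b"), (94, "c")]:
    store(k, r)
```

With this capacity, keys 14 and 54 fill bucket 14 and key 94 spills into the overflow area, so finding it costs a second sequential search.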
Exercise: Chapter 8 - Problem 22
If we use division as a hash function and have 23 buckets, in which bucket should we search to find the record whose key is interpreted as the integer value 101?
• Division: 101 / 23 = 4, remainder 9
• Bucket number: 9
[Figure: key 101 mapped to bucket 9]
Exercise: Chapter 8 - Problem 16
a) What advantage does an indexed file have over a hash file?
b) What advantage does a hash file have over an indexed file?
• a) When the key is unique: the index points directly to the required data, while hashing often requires an additional (sequential) bucket search (incl. bucket overflow).
• b) No additional index file storage is required.
Chapter 8 - File Structures: Conclusions
• File Structures:
  • abstractions of actual data organization on mass storage
• Changes of ‘view’:
  • actual storage -> sequential view by OS -> conceptual view presented to user
• Quick access to particular file data by
  • (1) indexing (many forms)
  • (2) hashing (requires no index, but requires bucket search!)
CHAPTER 9: Database Structures
• (Large) integrated collections of data that can be accessed quickly
• Combination of data structures (chap. 7) and file structures (chap. 8)
9.1: Historical Perspective
• Originally: departments of large organizations stored all data separately in flat files
• Problems: redundancy & inconsistencies
9.1: Integrated Database System
• Better approach: integrate all data in a single system, to be accessed by all departments
9.1: Disadvantages of Data Integration
• Disadvantages:
  • Control of access to sensitive data?!
    • For example: the personnel department has no business with personal data stored by the company doctor!
  • Misinterpretation of integrated data
    • A supermarket database says that a customer buys a lot of medicine. What does this mean? What if this customer applies for a job at the supermarket chain?
  • What about the right to hold/collect/interpret data?
    • Does a credit card company have the right to use/sell data about people's purchasing behavior?
9.2: Conceptual Database Layers
[Figure: conceptual database layers. Compare: operating system; actual data storage; data seen in terms of a sequential view]
9.3: The Relational Model
• Relational Model
  • shows data as being stored in rectangular tables, called relations
  • a row in a relation is called a ‘tuple’
  • a column in a relation is called an ‘attribute’
[Figure: example relation]
9.3: Issues of Relational Design
• So, relations make up a relational database…
• … but this is not so straightforward:
  • Problem: more than one concept combined in a single relation
9.3: Redesign by extraction of 3 concepts
• Any information is obtained by combining information from multiple relations
9.3: Example
• Finding all departments in which employee 23Y34 has worked
9.3: Relational Operations
• Extracting information from a relational database by way of relational operations
• Most important ones:
  • (1) extract tuples (rows): SELECT
  • (2) extract attributes (columns): PROJECT
  • (3) combine relations: JOIN
• Such operations on relations produce other relations
  • so: they can be used in combination, to create complex database requests (or ‘queries’)
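The three operations can be sketched in Python, with a relation modeled as a list of dictionaries (one dictionary per tuple). This is an illustrative model, not how a database management system implements them. The STUDENT and ENROLLED relations are hypothetical, and dropping duplicate rows in `project` assumes that relations behave as sets.

```python
def select(relation, predicate):
    """SELECT: keep the tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """PROJECT: keep only the named attributes (columns); duplicate rows
    are dropped, on the assumption that a relation is a set of tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result

def join(left, right, left_name, right_name, predicate):
    """JOIN: combine every pair of tuples for which the predicate holds.
    Attribute names are prefixed with the relation name, e.g. 'X.W'."""
    result = []
    for l in left:
        for r in right:
            if predicate(l, r):
                combined = {f"{left_name}.{k}": v for k, v in l.items()}
                combined.update({f"{right_name}.{k}": v for k, v in r.items()})
                result.append(combined)
    return result

# Hypothetical relations, used only to show the operations in combination.
STUDENT = [{"Id": 1, "Name": "Ann"}, {"Id": 2, "Name": "Bob"}]
ENROLLED = [{"StudentId": 1, "Course": "CS"}, {"StudentId": 1, "Course": "Math"}]

new1 = join(STUDENT, ENROLLED, "STUDENT", "ENROLLED",
            lambda s, e: s["Id"] == e["StudentId"])
new2 = select(new1, lambda row: row["ENROLLED.Course"] == "CS")
result = project(new2, ["STUDENT.Name"])
```

Because each function returns a relation in the same form it accepts, the output of one feeds directly into the next, which is what makes combined queries possible.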
Exercise: Chapter 9 - Problem 10

X relation:
U  V  W
A  Z  5
B  D  3
C  Q  5

Y relation:
R  S
3  J
4  K

• RESULT := PROJECT W from X
• RESULT := SELECT from X where W=5
• RESULT := PROJECT S from Y
• RESULT := JOIN X and Y where X.W > Y.R

RESULT (of the JOIN):
X.U  X.V  X.W  Y.R  Y.S
A    Z    5    3    J
A    Z    5    4    K
C    Q    5    3    J
C    Q    5    4    K
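The four requests of this exercise can be checked with a short Python sketch in which each relation is a list of plain tuples. The variable names are made up for the example, and removing the duplicate value in the first projection assumes that a relation is a set of tuples.

```python
X = [("A", "Z", 5), ("B", "D", 3), ("C", "Q", 5)]   # attributes U, V, W
Y = [(3, "J"), (4, "K")]                            # attributes R, S

# PROJECT W from X (duplicates collapsed, assuming set semantics)
project_w = list(dict.fromkeys(x[2] for x in X))

# SELECT from X where W=5
select_w5 = [x for x in X if x[2] == 5]

# PROJECT S from Y
project_s = [y[1] for y in Y]

# JOIN X and Y where X.W > Y.R: every X tuple is paired with every Y tuple
# that satisfies the condition; columns are X.U, X.V, X.W, Y.R, Y.S.
join_result = [x + y for x in X for y in Y if x[2] > y[0]]
```

The tuple (B, D, 3) contributes nothing to the join because 3 is greater than neither 3 nor 4, which is why the RESULT relation has four rows.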
Exercise: Chapter 9 - Problem 11

MANUFACTURER relation:
CompanyName  PartName  Cost
Company X    Bolt 2Z   .03
Company X    Nut V5    .01
Company Y    Bolt 2X   .02
Company Y    Nut V5    .01
Company Y    Bolt 2Z   .04
Company Z    Nut V5    .01

PART relation:
PartName  Weight
Bolt 2X   1
Bolt 2Z   1.5
Nut V5    0.5

• a) Which companies make Bolt 2Z?
  • NEW := SELECT from MANUFACTURER where PartName = Bolt 2Z
  • RESULT := PROJECT CompanyName from NEW
Exercise: Chapter 9 - Problem 11 (same MANUFACTURER and PART relations)
• b) Obtain a list of the parts (+ cost) made by Company X.
  • NEW := SELECT from MANUFACTURER where CompanyName = Company X
  • RESULT := PROJECT PartName, Cost from NEW
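Parts a) and b) of Problem 11 follow the same SELECT-then-PROJECT pattern. A sketch in Python, with the MANUFACTURER relation from the exercise written as a list of tuples (the variable names are illustrative):

```python
# MANUFACTURER relation: (CompanyName, PartName, Cost)
MANUFACTURER = [
    ("Company X", "Bolt 2Z", 0.03),
    ("Company X", "Nut V5", 0.01),
    ("Company Y", "Bolt 2X", 0.02),
    ("Company Y", "Nut V5", 0.01),
    ("Company Y", "Bolt 2Z", 0.04),
    ("Company Z", "Nut V5", 0.01),
]

# a) SELECT where PartName = Bolt 2Z, then PROJECT CompanyName
new_a = [m for m in MANUFACTURER if m[1] == "Bolt 2Z"]
result_a = [m[0] for m in new_a]

# b) SELECT where CompanyName = Company X, then PROJECT PartName, Cost
new_b = [m for m in MANUFACTURER if m[0] == "Company X"]
result_b = [(m[1], m[2]) for m in new_b]
```

On this data, a) yields Company X and Company Y, and b) yields Bolt 2Z at .03 and Nut V5 at .01.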
Exercise: Chapter 9 - Problem 11 (same MANUFACTURER and PART relations)
• c) Which companies make a part with weight 1?
  • NEW1 := JOIN MANUFACTURER and PART where MANUFACTURER.PartName = PART.PartName
  • NEW2 := SELECT from NEW1 where PART.Weight = 1
  • RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
Exercise: Chapter 9 - Problem 11 (same MANUFACTURER and PART relations)
• c) Which companies make a part with weight 1? (alternative: SELECT first, then JOIN)
  • NEW1 := SELECT from PART where Weight = 1
  • NEW2 := JOIN MANUFACTURER and NEW1 where MANUFACTURER.PartName = NEW1.PartName
  • RESULT := PROJECT MANUFACTURER.CompanyName from NEW2
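Both orders of operations for part c) can be compared in a small Python sketch. The relations are copied from the exercise as lists of tuples; the variable names are illustrative.

```python
# MANUFACTURER relation: (CompanyName, PartName, Cost)
MANUFACTURER = [
    ("Company X", "Bolt 2Z", 0.03),
    ("Company X", "Nut V5", 0.01),
    ("Company Y", "Bolt 2X", 0.02),
    ("Company Y", "Nut V5", 0.01),
    ("Company Y", "Bolt 2Z", 0.04),
    ("Company Z", "Nut V5", 0.01),
]
# PART relation: (PartName, Weight)
PART = [("Bolt 2X", 1), ("Bolt 2Z", 1.5), ("Nut V5", 0.5)]

# Order 1: JOIN, then SELECT, then PROJECT.
# Joined columns: CompanyName, PartName, Cost, PartName, Weight.
join_first = [m + p for m in MANUFACTURER for p in PART if m[1] == p[0]]
selected = [row for row in join_first if row[4] == 1]
result_join_first = sorted({row[0] for row in selected})

# Order 2: SELECT, then JOIN, then PROJECT.
weight_one = [p for p in PART if p[1] == 1]
select_first = [m + p for m in MANUFACTURER for p in weight_one if m[1] == p[0]]
result_select_first = sorted({row[0] for row in select_first})
```

Both orders give the same answer (Company Y), but selecting first keeps the intermediate relation much smaller: one joined row instead of six.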
Chapter 9 - Database Structures: Conclusions
• Database Structures:
  • (large) integrated collections of data that can be accessed quickly
• Database Management System
  • provides high-level view of actual data storage (database model)
• Relational Model most often used
  • relational operations: SELECT, PROJECT, JOIN, …
  • high-level language for database access: SQL
Overzicht Informatica – Exam (1)
• Most important sections (edition 7) & keywords:
  • Ch. 0 - 1, 3, 4: abstraction / algorithm
  • Ch. 1 - 1, 2, 3, 4, 5, 6, 7: bits / data storage & representation (ASCII, etc.) / Boolean operations / flip-flops / memory types and characteristics / number systems (binary, hexadecimal, etc.) / overflow & truncation errors
  • Ch. 2 - 1, 2, 3, 4, 6: CPU architecture / machine language & instructions / program execution / machine cycle / alternative architectures
  • Ch. 3 - 1, 2, 3, 4: operating systems / batch processing / time-sharing / multitasking / OS components / process vs. program / competition
  • Ch. 4 - 1, 2, 3, 4, 5, 6: algorithm (formal) / primitives / pseudocode / syntax / semantics / iteration / loop control / recursion / efficiency
Overzicht Informatica – Exam (2)
• Most important sections (edition 7) & keywords:
  • Ch. 5 - 1, 2, 3, 4, 5: generations: 1st, 2nd, 3rd / assembly language / compilers / machine independence / paradigms / imperative / object-oriented / programming concepts / procedures / parameters / call by value/reference
  • Ch. 6 - 1, 2, 3: software life cycle / development phase / modularity / coupling / cohesion / documentation / complexity measure for software
  • Ch. 7 - 1, (2-5): data structures / abstraction / static vs. dynamic / pointers / (arrays, lists, stacks, queues, etc.)
  • Ch. 8 - 1, 2, 3, 4: files / sequential / text / indexed / hashing
  • Ch. 9 - 1, 2, 3: databases vs. ‘flat’ files / relations / tuples / attributes / relational operations: SELECT, PROJECT, JOIN
Overzicht Informatica – Exam (3)
• Not exam material:
  • Ch. 3.5 - 3.7 (edition 7): Networks
  • Ch. 4 (edition 8): Networking and the Internet
  • Ch. 10 (editions 7 & 8): Artificial Intelligence
  • Ch. 11 (editions 7 & 8): Theory of Computation
Good luck!