The Early Methods of Data Processing

The Early Methods of Data Processing • Data as part of a program (Input ......Data) • Data File (Open ......As...... , #n) • Sequential (Input #n, Print #n) • Read data sequentially from beginning to end. • Random I/O (Put #n, Get #n) • Can read specific part of data from the file. But it is still difficult to locate the exact part.

Early Data Models • Flat File • Hierarchy Model (One to Many) • Network Model (Many to Many) • The above two models have relations being built in the data. Difficult to maintain. • Relational

An Example of Flat File

Split Flat File

Relations between the Tables

RDBMS Products • Desktop RDBMS • dBASE, Paradox (Borland) • FoxPro, Access (Microsoft) • Mainframe (or Client/Server) RDBMS • ORACLE (Oracle) • MS SQL (Microsoft) • DB2 (IBM) • Sysbase, Informix

Traditional vs Client - Server RDBMS ( I ) • Traditional • Tables, Indices, Codes are in different files • Relations, Integrity, Security, Consistency are applied by the programs users developed • Using FIXED WIDTH to store data • Non Traditional (Modern) • All the related parts are in a file called PROJECT • Relations, Integrity, Security, Consistency are managed by RDBMS itself • Using VARIABLE WIDTH to store data

Traditional vs Client - Server RDBMS ( II ) • Client-Server (Front-End) • END databases and FRONT interfaces are independent each other • Higher performance • Better security • MS Access: A hybrid database product • VB/MS Access: A good example of client-server design tool

Database Trend • Multiple Platform - Access EXTERNAL DB • ISAM (Index Sequential Access Method) • ODBC (Open Database Connectivity) • OLE DB • Data Warehouse / Data Mart • Data Mining

Database Structure Design • A well designed database structure can • Keep the project development on the right track. • Ensure the system meets all the requirements, including Flexibility, Security, Integrity and consistency. • Save the system's resources by eliminating unnecessary duplication. • Make programming easier.

Data Normalization ( I ) • Definition: • the process of transforming existing data into relational tables. • Objectives: • To eliminate duplicated fields (attributes) • To accommodate future changes • To minimize the impact of changes to structure on the related programs

Data Normalization ( II ) • Functional Dependency Analysis: Group the related data into tables according to their dependencies. • B DEPEND ON A: from B, you can only find one A, not multiple A's • {Name, Address, Birthday} DEPEND ON Student ID • Course NOT DEPEND ON Student ID

Data Normalization ( III ) • First Normal Form: Eliminates REAPEATING fields and NON ATOMIC values. • Second Normal Form: Requires each field to depend on EVERY PART of the primary key. • Third Normal Form: Requires all non-primary fields SOLELY DEPEND on the primary key (Not depend on other fields). • Don't include the fields which can be derived from others.

The Reasons of Violating Normalization • A trade off of speed vs storing space, flexibility vs efficiency • Repeating fields: when the repeating number is known • Non-atomic fields: e.g. Address • Derived fields: when it can improve the speed a lot • E.g. (Value = price x quantity)

Name Convention • Reasons? -- Limitation from the software; Readability of the programs (both the yourself and others). • Length of the Name • File Name: Not more than 8 characters. • Field Name: Not more than 10 characters. • Characters Used in the Name • Names should begin with a LETTER. • Don't use control characters (+-*/=><,;:\) • Use the name close to the real attribute • (e.g., Name, Address, StudentID) • Use certain prefix to group the classes. • Use capital letters to separate the words.

Table Design (I-1) • Field Type • Character (Text): Store text data. Each character occupies 1 byte. Normally up to 255 digits. • Numerical: • Byte: Store 1~255, 1 byte. • Integer: Store integer -32,768~32767, 2 bytes. (Long Integer: 4 bytes) • Single: Store data with decimal, 4 bytes. (Double: 8 bytes).

Table Design (I-2) • Logical (Boolean): Store True/False. 1 byte. • Date: Store date/time. 8 bytes. • Memo: Store up to 64 MB text data. • General: Store up to 1 GB picture/sound. • Tip: Use the smallest field size because smaller file can be processed faster and require less memory.

Table Design (II-1) • Primary Key: Unique value to identify each record. • (in some RDBMS, such as FoxPro, using Record Number) • To eliminate duplicated records. • To build up the relationship with other tables. • Index: Assistant file to logically re-order the records. • (Ascending vs Descending) • To provide views in different order. • To speed up the record search by using SELECT, SEEK, FIND commands.

Table Design (II-2) • Primary /Index Key: Can be an expression of several fields (Compound key) • All the fields must be converted into same type • There is limit to the total length

Table Design (III-1) • Index and Sort (Cluster Index) • Index: Logically change the records' order • Sort: Physically change the records' order • Pros and Cons of indexes • Improve the query performance • Increase the work when the field is changed • Validation Rule: Specify the special requirement for the entered data

Table Design (III-2) • Security: Protect the database from illegal operations. Two Methods: • Assigning Password(SID, Security ID): To protect whole database from illegal access. • Adding Different User Group: To limit certain users within allowed operations: • Admins Group: Have full access to the database. • User Groups: Can assign different permissions to each group.

Table Design (IV) • Relationship: To link the tables. • Foreign Key: The key in parent table used to link the child table via its primary key. • Relation Types: One - Many, Many - One, One - One, Many - Many. • Referential Integrity: The rules to ensure the relationship between related tables are valid. • Basic integrity rule is: It not permitted a child record exist without its parent. • Three actions related to integrity: DELETE, INSERT, UPDATE.

Table Design (V-1) • Three integrity rules applied to the actions: • Cascade: Update all child records' foreign key with the new parent key primary value upon INSERT, UPDATE or DELETE all child records whose foreign key is same as the parent key value. • Restrict: Restrict the key value changes • To parent table: Prohibit changing the parent primary key to any existing foreign child key value. • To child table: Prevent changing child key to any non-existing parent key value. • Ignore: Perform no referential integrity checks.

Table Design (V-2) • Consistency, Replication and Synchronization. • (Master vs Replica)

SQL (Structured Query Language) -- A kind of fourth generation languages (4GLs) • First Generation Language: Machine code, a kind of binary language. Instructions are consisted with a series of "0" and "1". • Second Generation Language: Assembly languages. Allows to use simple alphabetic codes. • Third Generation Language: Procedure languages. Using structured English words to write the commands. Such as FORTRAN, BASIC, C++). • Fourth Generation Language: Non-procedure languages (SQL and OOP).

The difference between SQL and 3GLs • SQL only tell computer what kind of results are needed, not how to get the data. • One line of SQL command can have the same effects of multi-line procedure in 3GLs. • SQL is a WEAKLY DATA TYPED language. Don't need to specify the data type in SQL. The type of these data rely on the column expression and source data type.

Basic Elements of SQL Command • Verb: Such as SELECT. Determine the type of operation. • Predicate Variable List: Specify the variables affected by the operation. Normally is a list of fields or field expressions. • Prepositional Clause: Tell to which object the operation will apply. Normally a list of tables. Such as FROM.

Simplest SQL: • SELECT <variable list> FROM <table list> • SELECT StudentID, StudentName from StudentInfo

Practical Grammar of Simple SQL SELECT Statement • SELECT [ALL|DISTINCT] <variable list> • FROM <table list> • WHERE <criteria>

Practical Grammar of Simple SQL SELECT Statement • SELECT: Specify the operation is a query, i.e., extracting the data from the targeted database. Variable list can be consisted of: • Table fields (StudentID, StudentName) • Field expression (StreetNum+StreetName) • Wild character asterisk *. • ALL: Tell to get all the rows in the targeted database. DISTINCT can eliminate the duplicated rows. • FROM specify the targeted tables. • WHERE begins a clause that decides which records will be selected.

SQL Operators and Functions I • Arithmetic operators: +, -, *, /, ^(**), & • Comparison operators: • Basic: >, >=, =, <, <=, <>(!=) • Range: BETWEEN • Pattern: LIKE (with wild characters) • Wild characters • *(%) - Replace the character with any length of string. • ?( _ ) - Replace the character with any ONE character. • IN: x IN (v1,v2,v3,...). Whether expression x equals to any v? value.

SQL Operators and Functions II • The result of a comparison is a logical value (TRUE or FALSE) • Logical operators: Link the comparisons • AND: x AND y - x and y both TRUE. • OR: x OR y - Either x or y is TRUE. • NOT: NOT x - If x is FALSE, then NOT x is TRUE.

SQL Operators and Functions III • The Order of Operators and the Usage of Brackets • Arithmetic operators: ^(**) -- *, / -- +, -, & • Logical operators: NOT -- AND, OR • Use the brackets () to change the default order.

SQL Operators and Functions IV • SQL Aggregate Functions • AVG(x): Return the arithmetic average of the field expression x. • COUNT(x): Return the number of rows containing NOT NULL value of the field expression x. If x="*", it returns the number of non empty rows. • MIN(x): Return the smallest value of x. • MAX(x): Return the largest value of x. • SUM(x): Return the total value of x.

SQL Operators and Functions V • SELECT COUNT(*), AVG(mark), MIN(mark), MAX(mark) from StudentInfo • Some Other Useful Functions (There are some difference for different databases) • LEFT(string, length): Left part of a string. • RIGHT(string, length): Right part of a string. • MID(string, start, length): Middle part of a string. Some software using SUBSTR().

SQL Operators and Functions VI • Trim String: Get rid of tailing spaces. • RTRIM(string): Get rid of right end spaces. • LTRIM(string): Get rid of leading spaces. • TRIM(string): Get rid of both ends spaces. • Data Converting: Convert the data into required type. • CSTR(x) STR(x, len, dec) • CINT(y) VAL(y) • CDATE(d) CTOD(d)

SQL Operators and Functions VII • Constant Expressions • Numerical: No delimiter. (2500, 4.75) • String: Normally using single or double quotation as delimiters. • ('string', "STRING"). • Logical: Some software using dot as delimiters ( .TRUE. , .FALSE. ), some using no delimiter.

SQL Operators and Functions VIII • Date: Normally using number sign # as delimiters. • (#5/10/1998#) • Note: There are two formats of date: US format (mm/dd/yy) and UK format (dd/mm/yy). In SQL, if using # to express a date constant, then ONLY US format is accepted).

Join Tables I • CROSS JOIN: Return the a cross product of two tables. If Table1 has n rows and Table2 has m rows, then Table1 CROSS JOIN Table2 creates a table with n x m rows. • SELECT Publishers.Name, Authors.Author FROM Publishers, Authors • A very dangerous situation, must avoid • Equi-Join or INNER JOIN: Creates a table with the rows in which the key values exist in both tables. • Syntax1: SELECT <List> FROM Table1 INNER JOIN Table2 • ON Table1.key=Table2.key • Syntax2: SELECT <List> FROM Table1, Table2 WHERE Table1.key=Table2.key • SELECT Publishers.Name, Titles.ISBN, Titles.Title FROM Publishers INNER JOIN Titles ON ublishers.PubID=Titles.PubID

Join Tables II • Multiple Equi-Joins: If 3 or more tables are involved in SQL selection, then these tables need to be joined by pairs. • Syntax1: SELECT <List> FROM Table1 INNER JOIN (Table2 INNER JOIN Table3 ON Table3.key3=Table2.key2) ON Table2.key2=Table1.key1 ... ... ... • Syntax2: SELECT <List> FROM Table1, Table2, Table3, … WHERE Table1.key1=Table2.key1, Table2.key2=Table3.key3, ... ... ...

Inner Join Examples • SELECT Authors.Author, Titles.Title FROM Titles INNER JOIN (Authors INNER JOIN [Title Author] ON Authors.Au_ID = [Title Author].Au_ID) ON Titles.ISBN = [Title Author].ISBN; • SELECT Authors.Author, Titles.Title FROM Authors, Titles, [Title Author] WHERE Authors.Au_ID = [Title Author].Au_ID AND Titles.ISBN = [Title Author].ISBN;

Alias in SQL • Field alias • SELECT Field1 AS Alias1, Field2 As Alias2, .. • Table alias • FROM Table1 [AS] Alias1, Table2 [AS] Alias2, … • Example • SELECT A.Author AS [Author Name], B.Title AS [Book Name] FROM Authors A, Titles B, [Title Author] C WHERE A.Au_ID = C.Au_ID AND B.ISBN = C.ISBN;

Join Tables III • Outer Joins • Left Join • Table1 LEFT [OUTER] JOIN Table2: Returns all the rows in Table1 and only the rows of Table2 with matching key values. • Left outer join is useful to find "lone parent" records.

Join Tables IV • Right Join • Table3 RIGHT [OUTER] JOIN Table4: Returns all the rows in Table4 and only the rows of Table3 with matching key values. • Right outer join is useful to find "orphan" records.

Join Tables V • Not Equal Join: Find the records don't have the same key value between the tables. The result will be Cross Join minus Inner Join. Use <> in WHERE clause. • Self Join: Select records in a single table with two fields' value being same.

Join Tables VI • Sub-query • SELECT <List> FROM <Table1> WHERE <Key> IN (SELECT <Key> FROM <Table2> [WHERE <Criteria>]) • When using sub-query in WHERE clause, only one table's fields can be selected. • SELECT Name, PubID FROM Publishers WHERE PubID IN (SELECT PubID FROM Titles WHERE Title LIKE ‘* Database*’);

Merge SELECT Results • UNION [ALL] • To merge SELECT results • SELECT Statement1 UNION SELECT Statement2 … • ALL: Without this keyword, the records from two SELECT results will be de-duplicated. With this keyword, simply merge two results.

Other SQL Clause I • ORDER BY: Sort the query result in a specified order. • SELECT ... FROM ... WHERE … ORDER BY <Field1> [ASC|DESC], <Field2> [ASC|DESC], … • SELECT ... ORDER BY Lname, Fname

Other SQL Clause II • GROUP BY: Combine multiple records with identical values in the specific field list into a single record. It is a useful tool to produce summary reports. • SELECT ... FROM ... WHERE … GROUP BY <GroupList> • Two important notes: • The select field expression list must have at least one aggregate function. Actually, except the fields are specified in GROUP BY clause, all the other fields must be displayed in aggregate functions. • GroupList can have up to 10 fields which can be used to produce the summary record for sub sections.

Other SQL Clause III • HAVING <HavingCriteria>: Apply a filter ONLY to GROUP BY output • SELECT ... FROM ... WHERE … GROUP <GroupList> HAVING <HavingCriteria> • HAVING clause can decide which GROUPED ROWS to output. • WHERE clause provides a filter to decide which records are used in the summary report.

The Early Methods of Data Processing

The Early Methods of Data Processing

Presentation Transcript

signal processing methods

Processing the Data:

Early Counting Methods

Methods of data collection

Part 4: Data Dependent Query Processing Methods

Methods of presenting data

Methods of Data Processing

ARSF Data Processing Consequences of the Airborne Processing Library

Methods of Presenting Data

Processing the Data

Processing of Varian data

ORGANISATION OF THE MSDP DATA PROCESSING

Data Processing

Processing of 1999 data

Processing of Early Science Data

Processing of Data

Methods of Organizing Data

Best 3 Methods For Data Processing Services

Proven Methods of Data Collection in Data Processing

Data Processing

The processing of personal data (Personal Data Act)