
Well, Sort-of

Data, Data Everywhere. The Sloan Digital Sky Survey started in 2000. In its first few weeks it collected more data than had been amassed in the entire history of astronomy. By 2010, it had collected 140 terabytes of data.

slemire

Presentation Transcript


  1. Well, Sort-of

  2. Data, Data Everywhere *
     • The Sloan Digital Sky Survey started in 2000. In its first few weeks it collected more data than had been amassed in the entire history of astronomy.
     • By 2010, it had collected 140 terabytes of data.
     • Its replacement, scheduled for 2016, will collect that amount of data every 5 days.
     • In 2010, Walmart processed 1M customer transactions every hour.
     • These transactions feed databases estimated at more than 2.5 petabytes, the equivalent of 167 times the books in the American Library of Congress.
     • Facebook houses more than 40 billion photos.
     * Excerpted from a Feb. 27th, 2010, Economist article

  3. Data, Data Everywhere *
     • Decoding the human genome involves 3 billion base pairs.
     • The first time it was attempted, it took 10 years.
     • It can now be accomplished in 1 week.
     • It is estimated that within the next few years, the amount of global data created will approach 2,000 Exabytes per year (1 Exabyte = 1,000 Petabytes).
     • Problem: It is estimated that the total amount of storage available will be approximately 100 Exabytes.
     * Excerpted from a Feb. 27th, 2010, Economist article

  4. Data, Data Everywhere *
     • Kilobyte = 2^10 bytes = 1,024 bytes
       ◦ One page of typed text typically requires 2 KB
     • Megabyte = 2^20 bytes = 1,048,576 bytes
       ◦ Storing the complete works of Shakespeare requires 5 MB
     • Gigabyte = 2^30 bytes = 1,073,741,824 bytes
       ◦ A 2-hour film requires 1-2 GB
     • Tera(trillion)byte = 2^40 bytes = 1,099,511,627,776 bytes
       ◦ All of the books in the Library of Congress require 15 TB
     • Peta(quadrillion)byte = 2^50 bytes = 1,125,899,906,842,624 bytes
       ◦ Google processes about 1 PB every hour
     • Exa(quintillion)byte = 2^60 bytes = 1,152,921,504,606,846,976 bytes
       ◦ Equivalent to 10 billion copies of The Economist
     • Zetta(sextillion)byte = 2^70 bytes = 1,180,591,620,717,411,303,424 bytes
       ◦ The total amount of information in existence is estimated at 1.2 ZB
     • Yotta(septillion)byte = 2^80 bytes = 1,208,925,819,614,629,174,706,176 bytes
     * Excerpted from a Feb. 27th, 2010, Economist article
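
The byte counts on this slide are successive powers of 2^10. As a quick check, here is a minimal Python sketch that recomputes them; the helper name `bytes_in` is made up for this illustration.

```python
# Binary unit prefixes: each step up multiplies the previous unit by 2**10 = 1,024.
PREFIXES = ["Kilo", "Mega", "Giga", "Tera", "Peta", "Exa", "Zetta", "Yotta"]

def bytes_in(prefix: str) -> int:
    """Return the number of bytes in one <prefix>byte (binary convention)."""
    exponent = 10 * (PREFIXES.index(prefix) + 1)
    return 2 ** exponent

for position, name in enumerate(PREFIXES, start=1):
    print(f"{name}byte = 2^{10 * position} bytes = {bytes_in(name):,} bytes")
```

Note that the "1 Exabyte = 1,000 Petabytes" figure on the previous slide uses the decimal rounding; under the binary convention used here the factor between neighboring units is 1,024.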

  5. What is Data Resource Management??
     • A managerial activity that applies information systems technologies to the task of managing an organization’s data resources to meet the information needs of its business stakeholders
     What does that mean??
     • It’s a very fancy way of saying that we are going to talk about databases

  6. What is a Database??
     A way we can model (parts of) the real world (well, sort-of)
     • A large, integrated collection of Data and Metadata
     • Entities (i.e., a person, place, object or event we wish to have information about)
       ◦ Students
       ◦ Physicians
       ◦ Patients
       ◦ Customers
     • The Attributes of that entity (i.e., its characteristics)
       ◦ GPA
       ◦ Specialty
       ◦ Illness
       ◦ Balance Due
     • The Relationships between entities (i.e., how entities interact)
       ◦ One Physician has many Patients
       ◦ A Patient has only one Physician
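
To make entities, attributes and relationships concrete, here is a minimal Python sketch of the Physician/Patient example. The class layout, the `admit` method and the sample names are assumptions made for illustration; the slide only states the one-to-many rule.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Patient:                      # an entity
    name: str
    illness: str                    # an attribute of the Patient entity
    # A Patient has only one Physician (excluded from repr/eq to avoid cycles).
    physician: Optional["Physician"] = field(default=None, repr=False, compare=False)

@dataclass
class Physician:                    # another entity
    name: str
    specialty: str                  # an attribute of the Physician entity
    patients: List[Patient] = field(default_factory=list)  # one Physician has many Patients

    def admit(self, patient: Patient) -> None:
        """Record the relationship, enforcing 'only one Physician per Patient'."""
        if patient.physician is not None:
            raise ValueError(f"{patient.name} already has a physician")
        patient.physician = self
        self.patients.append(patient)

# Hypothetical sample data.
dr = Physician("Dr. Rivera", "Cardiology")
dr.admit(Patient("A. Lopez", "Arrhythmia"))
dr.admit(Patient("B. Chen", "Hypertension"))
print(len(dr.patients), "patients for", dr.name)
```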

  7. What is it, really??
     • Consider some information the University maintains:
         Name      Major             Tuition Paid
         Address   Courses Taken     Tuition Owed
         SSN       Grades Received   Grants/Scholarships
     • HOW is this information stored?
     You are an entity with attributes which vary. Within the University, different areas have different interests in you (i.e., the Registrar, the Bursar, etc.). Nonetheless, you are still part of the University as a whole.

  8. How does this relate to a database?
     • You are an entity (class: student) → Table
     • with attributes → Fields
     • which vary → Your attributes can be different
     • Within the University, different areas have different interests in you (i.e., the Registrar, the Bursar, etc.) → Files
     • Nonetheless, you are still part of the University → Database

  9. HOW does this relate to a database?
     Hierarchically: A Database consists of Files, which contain Records, which contain Fields, which may consist of a variety of data types.
     Sample records:
         Hernandez, Juan   123456789    72   2.42
         Jones, Mary       234567890   102   3.87
     Notice that there should always be a Key (Unique) Field.

  10. Alternatively (from smallest to largest component):
      • Character: A single alphabetic, numeric or other symbol
      • Field: A group of related characters
      • Entity: A person, place, object or event
      • Attribute: A characteristic of an entity
      • Record: A collection of attributes that describe an entity
      • File: A group of related records
      • Database: An integrated collection of logically related data elements
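
The hierarchy can be sketched with nested Python dictionaries, using the two sample records from slide 9. The field names `credit_hours` and `gpa` are guesses, since the slide shows only the values, and the nine-digit number is assumed to be the key field.

```python
# Database -> files -> records -> fields -> characters.
# Field names "credit_hours" and "gpa" are assumed; the slide gives only the values.
database = {
    "students": {                     # a file: a group of related records
        "123456789": {                # the key (unique) field value identifies one record
            "name": "Hernandez, Juan",
            "credit_hours": 72,
            "gpa": 2.42,
        },
        "234567890": {
            "name": "Jones, Mary",
            "credit_hours": 102,
            "gpa": 3.87,
        },
    }
}

record = database["students"]["234567890"]   # look up one record by its key
field_value = record["gpa"]                  # one field within that record
characters = list(record["name"])            # a field is a group of related characters
print(record["name"], field_value, characters[:5])
```

Because the key is unique, it doubles as the dictionary key here; two records with the same key could not coexist.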

  11. Logical Data Elements:

  12. Why Databases??
      Databases were not always commonplace
      • Initially, there were no databases or DataBase Management Systems (DBMS)
      • Individual applications were written to meet specific user needs (File Processing, or Traditional File Processing Systems)
      • As business applications became more complex, it became apparent that there were too many problems associated with Traditional File Processing Systems

  13. What Problems??
      • Single Applications
        ◦ A program was (generally) written for one and only one application (the user would specify their individual needs)
      • Program-Data Dependence
        ◦ Since each program was written for a specific data set, a change in the data, or in the data format, required a change in the program that uses the data
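
Program-data dependence is easy to reproduce. In this minimal Python sketch the record layout (name in columns 0-19, student ID in columns 20-28) is hard-coded in the program; the layout and the function name are assumptions made for the example.

```python
# A file-processing program that hard-codes the record layout.
# Assumed layout: name in columns 0-19, student ID in columns 20-28.
def read_student_v1(line: str) -> dict:
    return {"name": line[0:20].strip(), "student_id": line[20:29]}

old_format = "Hernandez, Juan".ljust(20) + "123456789"
print(read_student_v1(old_format))

# The data format changes: the name field is widened to 30 characters.
new_format = "Hernandez, Juan".ljust(30) + "123456789"
broken = read_student_v1(new_format)   # same program, now reads padding as the ID
print(broken)
```

The program still runs after the format change, but it silently returns the wrong student ID until someone edits the code to match the new layout.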

  14. What Problems??
      • Data Redundancy
        ◦ Duplicate data requires every update to be made in all files storing that data
      • Lack of Data Integration
        ◦ Data stored in separate files requires special programs for output, making ad hoc reporting difficult
      • Data Input Errors
        ◦ The more people who are required to enter data, the greater the likelihood that erroneous or mis-entered data will be stored
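
The redundancy problem can be shown with two in-memory "files" that each keep their own copy of a student's address. This is a sketch only; the file names, the helper `find_inconsistencies` and the new address are invented for the example.

```python
# Two separate application files each keep their own copy of the same address.
registrar_file = {"123456789": {"name": "Saenz, Lupe", "address": "123 Mesa"}}
bursar_file = {"123456789": {"name": "Saenz, Lupe", "address": "123 Mesa"}}

# The student moves, but only the Registrar's file is updated (hypothetical address).
registrar_file["123456789"]["address"] = "500 Hypothetical Ave."

def find_inconsistencies(file_a: dict, file_b: dict, field_name: str) -> list:
    """Return the keys whose field value differs between the two files."""
    return [
        key for key in file_a
        if key in file_b and file_a[key][field_name] != file_b[key][field_name]
    ]

print(find_inconsistencies(registrar_file, bursar_file, "address"))
```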

  15. How did this work??

  16. How did databases come about??
      • 1960s: North American Rockwell’s Moon Project
        ◦ > 60% of all data used was duplicated in multiple data sets (redundancy)
      • By the mid-1960s:
        ◦ Rockwell/IBM joint venture to develop a DataBase Management System (DBMS)
        ◦ Hierarchical in nature
      • Later:
        ◦ IBM’s Information Management System (IMS)

  17. How are databases different??
      • Database Management Approach
        ◦ Consolidates data records into one database that can be accessed by many different application programs
        ◦ Provides a software interface between users and databases
        ◦ Data definition is stored once, separately from application programs
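
A small sketch of the database management approach, using Python's built-in `sqlite3` module as a stand-in DBMS. The table layout, the two "application" functions and the tuition figure are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The data definition is stored once, in the database, not in either application.
conn.execute(
    "CREATE TABLE student ("
    "student_id TEXT PRIMARY KEY, name TEXT, major TEXT, tuition_owed REAL)"
)
conn.execute(
    "INSERT INTO student VALUES ('123456789', 'Saenz, Lupe', 'Finance', 1502.36)"
)

# Two different application programs share the same consolidated data.
def registrar_report(db):
    return db.execute("SELECT name, major FROM student").fetchall()

def bursar_report(db):
    return db.execute(
        "SELECT name, tuition_owed FROM student WHERE tuition_owed > 0"
    ).fetchall()

print(registrar_report(conn))
print(bursar_report(conn))
```

Neither function knows how the rows are laid out on disk; both go through the DBMS, which is the "software interface" the slide mentions.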

  18. How are databases different?? Database Management Approach

  19. What is a DBMS??
      • Software that controls the creation, maintenance, and use of databases

  20. What does a DBMS consist of??

  21. What are the major functions of a DBMS ???
      • Database Development:
        ◦ Defining and organizing the content, relationships and structure of the data needed to build the database
        ◦ Specifying integrity constraints
        ◦ Setting access rights (authorization)
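
As a rough illustration of defining structure and integrity constraints, the sketch below uses Python's `sqlite3` module. The table, the GPA range check and the helper `try_insert` are assumptions; SQLite has no user accounts, so the access-rights step is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        student_id TEXT PRIMARY KEY,                      -- key (unique) field
        name       TEXT NOT NULL,
        gpa        REAL CHECK (gpa BETWEEN 0.0 AND 4.0)   -- integrity constraint
    )
""")
conn.execute("INSERT INTO student VALUES ('123456789', 'Hernandez, Juan', 2.42)")

def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

print(try_insert(("123456789", "Duplicate, Key", 3.0)))   # violates PRIMARY KEY
print(try_insert(("234567890", "Jones, Mary", 7.5)))      # violates CHECK
print(try_insert(("234567890", "Jones, Mary", 3.87)))     # satisfies every constraint
```

The point is that the rules live in the database definition, so every application that writes to the table is held to them.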

  22. What are the major functions of a DBMS ???
      • Database Development: Entity Relationship Diagrams
      • Consider the following situation: A customer places an order. The order consists of parts.
      [Diagram: Customer (entity), Places (relationship), Orders (entity), Contain (relationship), Parts (entity)]
      • Entity: an organization about which we wish to maintain information
      • Relationship: an association between entities
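
One possible way to turn the customer/order/parts situation into tables, sketched with Python's `sqlite3` module. The cardinalities (one customer per order, many parts per order), the column names and the sample rows are assumptions; the slide does not specify them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE part     (part_id INTEGER PRIMARY KEY, description TEXT NOT NULL);
    -- "Places": each order belongs to exactly one customer (assumed).
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
    );
    -- "Contain": an order may list many parts, and a part may appear on many orders.
    CREATE TABLE order_part (
        order_id INTEGER NOT NULL REFERENCES orders(order_id),
        part_id  INTEGER NOT NULL REFERENCES part(part_id),
        quantity INTEGER NOT NULL,
        PRIMARY KEY (order_id, part_id)
    );
    -- Made-up sample rows.
    INSERT INTO customer VALUES (1, 'Acme Co.');
    INSERT INTO part VALUES (10, 'Widget'), (11, 'Gasket');
    INSERT INTO orders VALUES (100, 1);
    INSERT INTO order_part VALUES (100, 10, 5), (100, 11, 2);
""")

rows = conn.execute("""
    SELECT c.name, o.order_id, p.description, op.quantity
    FROM customer c
    JOIN orders o      ON o.customer_id = c.customer_id
    JOIN order_part op ON op.order_id = o.order_id
    JOIN part p        ON p.part_id = op.part_id
    ORDER BY p.part_id
""").fetchall()
print(rows)
```

Each entity becomes a table, and each relationship becomes either a foreign key (Places) or a linking table (Contain).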

  23. What are the major functions of a DBMS ???
      • Database Maintenance:
        ◦ Updating a database continually to reflect new business transactions and other events
        ◦ Updating a database to correct data and ensure accuracy of the data

  24. What are the major functions of a DBMS ???
      • Database Interrogation:
        ◦ Capability of a DBMS to report information from the database in response to end users’ requests
        ◦ Query Language: allows easy, immediate access to ad hoc data requests
        ◦ Report Generator: allows quick, easy specification of a report format for information users have requested

  25. What are the major functions of a DBMS ???
      • Database Interrogation:
        ◦ Natural Language vs. SQL Queries
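
To contrast a natural-language request with its SQL form, here is a small sketch using Python's `sqlite3` module and the Student rows that appear later in the deck (slide 28). The request wording and the tiny "report" layout are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, address TEXT, major TEXT)"
)
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", [
    ("123456789", "Saenz, Lupe", "123 Mesa", "Finance"),
    ("234567890", "Chung, Mei", "37 5th St.", "INFOSYS"),
    ("345678901", "Adams, John", "54B Hague", "Accounting"),
    ("456789012", "Elam, Mary", "123-22 E St.", "INFOSYS"),
])

# Natural-language request: "List the names of all INFOSYS majors."
sql = "SELECT name FROM student WHERE major = ? ORDER BY name"
names = [row[0] for row in conn.execute(sql, ("INFOSYS",))]

# A very small "report generator": a fixed layout applied to the query result.
print("INFOSYS majors")
for name in names:
    print(f"  {name}")
```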

  26. What are the major functions of a DBMS ???
      • Application Development:
        ◦ End users, systems analysts, and other application developers can use the internal 4GL programming language and built-in software development tools provided by many DBMS packages to develop custom application programs.

  27. What are the forms of a DBMS ???
      • Hierarchical: relationships between records form a hierarchy or treelike structure
      • Network: data can be accessed by one of several paths because any data element or record can be related to any number of other data elements
      • Relational: all data elements within the database are viewed as being stored in the form of simple tables

  28. What are the forms of a DBMS ??? RDBMS
      Table: Student
          StudentID   Name          Address        Major
          123456789   Saenz, Lupe   123 Mesa       Finance
          234567890   Chung, Mei    37 5th St.     INFOSYS
          345678901   Adams, John   54B Hague      Accounting
          456789012   Elam, Mary    123-22 E St.   INFOSYS
          ••••••      ••••••        ••••••         ••••••
      (Slide callouts: Field Names, Record, Field)

  29. What are the forms of a DBMS ??? RDBMS
      Table: Student
          StudentID   Name          Address        Major
          123456789   Saenz, Lupe   123 Mesa       Finance
          234567890   Chung, Mei    37 5th St.     INFOSYS
          345678901   Adams, John   54B Hague      Accounting
          456789012   Elam, Mary    123-22 E St.   Accounting
          ••••••      ••••••        ••••••         ••••••
      Table: Balance and Table: Department
          (Row layout not recoverable from the transcript. Surviving column headers: Student, Owed, Depart, Department, Faculty. Surviving values: 987654321, 103456678, 876543210, 765432109, 1,502.36, COBA219, COBA232, Finance, Marketing, INFOSYS, Accounting.)
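
The value of keeping Student and Balance as separate tables is that they can be combined on a shared key when needed. A minimal sketch with Python's `sqlite3` module follows; the Balance rows and amounts are made up, because the slide's Balance table did not survive the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT, major TEXT);
    CREATE TABLE balance (student_id TEXT REFERENCES student(student_id), owed REAL);
    INSERT INTO student VALUES
        ('123456789', 'Saenz, Lupe', 'Finance'),
        ('234567890', 'Chung, Mei',  'INFOSYS');
    -- Amounts below are made up for illustration.
    INSERT INTO balance VALUES ('123456789', 250.00), ('234567890', 0.00);
""")

# Relate the two tables through the shared StudentID column.
rows = conn.execute("""
    SELECT s.name, b.owed
    FROM student s
    JOIN balance b ON b.student_id = s.student_id
    WHERE b.owed > 0
""").fetchall()
print(rows)
```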

  30. What are the forms of a DBMS ??? Multidimensional Database Structure • Variation of the relational model that uses multi-dimensional structures to organize data and express the relationships between data

  31. What are the forms of a DBMS ??? Object-Oriented Database Structure • Can accommodate more complex data types including graphics, pictures, voice and text

  32. What are the forms of a DBMS ??? Object-Oriented Database Structure
      • Encapsulation:
        ◦ Data values and the operations that can be performed on them are stored as a unit
        ◦ Conceals the exact details of how a particular class works from objects that use its code or send messages to it
      • Inheritance:
        ◦ Automatically creating new objects by replicating some or all of the characteristics of one or more existing objects
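
Encapsulation and inheritance are easiest to see in an object-oriented language. The Python sketch below is an analogy rather than an object-oriented database: the `Account` and `StudentAccount` classes are invented, and in Python inheritance is expressed between classes rather than by copying individual objects.

```python
class Account:
    """Encapsulation: the balance and the operations on it are packaged together."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self._balance = 0.0          # leading underscore: internal detail by convention

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self._balance += amount

    def balance(self) -> float:
        return self._balance


class StudentAccount(Account):
    """Inheritance: reuses Account's data and behavior, then adds its own attribute."""

    def __init__(self, owner: str, student_id: str) -> None:
        super().__init__(owner)
        self.student_id = student_id


acct = StudentAccount("Chung, Mei", "234567890")
acct.deposit(100.0)                  # method inherited from Account
print(acct.balance())
```

Callers use `deposit` and `balance` without needing to know how the amount is stored, which is the concealment the slide describes.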

  33. How do the DBMS structures compare ??? (These are your authors’ viewpoints)
      • Hierarchical: best for structured, routine types of transaction processing
      • Network: best when many-to-many relationships are needed
      • Relational: best when ad hoc reporting is required

  34. How are databases developed ???
      Database Development: Enterprise-wide database development is usually controlled by database administrators (DBAs)
      • Data Planning:
        ◦ Database administrators and designers work with corporate and end-user management to develop an enterprise model that defines the basic business processes of the enterprise.

  35. How are databases developed ???
      • Logical Schema:
        ◦ Describes the data elements and the relationships among them
      • Physical Schema:
        ◦ Describes how data are to be stored and accessed on the storage devices of a computer system
        ◦ Data Dictionary: catalog or directory containing metadata
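
Most DBMS products expose their data dictionary as something you can query. As one concrete case, the sketch below reads SQLite's built-in catalog through Python's `sqlite3` module; the `student` table is an assumed example, and other DBMS products use different catalog names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (student_id TEXT PRIMARY KEY, name TEXT NOT NULL, major TEXT)"
)

# sqlite_master is SQLite's catalog: one row of metadata per table, index, view, etc.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns (cid, name, type, notnull, default, pk) for each column.
data_dictionary = [
    (col[1], col[2], bool(col[5]))
    for col in conn.execute("PRAGMA table_info(student)")
]

print(tables)
for column_name, column_type, is_key in data_dictionary:
    print(column_name, column_type, "KEY" if is_key else "")
```

The output describes the logical schema (tables, columns, types, keys); how SQLite lays the pages out on disk, the physical side, stays hidden from the application.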

  36. How are databases developed ??? Logical vs. Physical Designs:

  37. How are databases used??? Types of Databases:

  38. How are databases used??? Types of Databases:
      • Operational: store detailed data needed to support the business processes and operations of a company
        ◦ Also called Subject Area DataBases (SADB), Transaction Databases, or Production Databases
        ◦ Customer databases
        ◦ Inventory databases
        ◦ Human Resources databases

  39. How are databases used??? Types of Databases:
      • Distributed: databases that are replicated and distributed in whole or in part to network servers at a variety of sites
        ◦ A single logical database that is spread across computers at multiple locations
        ◦ Replicated databases
        ◦ Partitioned databases
        ◦ Challenges: ensuring that data is constantly, consistently and concurrently updated
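
The difference between partitioned and replicated databases can be sketched with plain Python dictionaries standing in for sites. The site names and the modulo rule for choosing a site are assumptions; real systems use more elaborate placement and synchronization schemes.

```python
# Three in-memory "sites" stand in for database servers at different locations.
SITES = ["site_a", "site_b", "site_c"]

def site_for(student_id: str) -> str:
    """Partitioning: each record lives at exactly one site, chosen from its key."""
    return SITES[int(student_id) % len(SITES)]

partitioned = {site: {} for site in SITES}
replicated = {site: {} for site in SITES}

def write_partitioned(student_id: str, record: dict) -> None:
    partitioned[site_for(student_id)][student_id] = record

def write_replicated(student_id: str, record: dict) -> None:
    # Replication: every site must receive every update, or the copies diverge.
    for site in SITES:
        replicated[site][student_id] = dict(record)

write_partitioned("123456789", {"name": "Saenz, Lupe"})
write_replicated("123456789", {"name": "Saenz, Lupe"})

copies_partitioned = sum("123456789" in partitioned[s] for s in SITES)
copies_replicated = sum("123456789" in replicated[s] for s in SITES)
print(copies_partitioned, copies_replicated)
```

The loop inside `write_replicated` is where the slide's challenge shows up: if one site misses an update, the copies are no longer consistent.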

  40. How are databases used??? Types of Databases:
      • External: contain a wealth of information available from commercial online services and from many sources on the World Wide Web
        ◦ Commercial/Shareware/Freeware
        ◦ Internet dominated

  41. How are databases used??? Types of Databases: • Hypermedia: consist of hyperlinked pages of multimedia

  42. How are databases used??? Types of Databases: Data Warehouses • Large database that stores data that have been extracted from the various operational, external, and other databases of an organization

  43. How are databases used??? Types of Databases: Data Marts • Databases that hold subsets of data from a data warehouse that focus on specific aspects of a company, such as a department or a business process
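
A rough sketch of how a data mart relates to a data warehouse, using Python's `sqlite3` module. The table, the sources and the amounts are made up, and a view is used as the simplest stand-in for a mart; in practice a mart is often a separate, physically stored database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Warehouse table: rows extracted from several source databases (made-up data).
    CREATE TABLE warehouse_sales (source TEXT, department TEXT, amount REAL);
    INSERT INTO warehouse_sales VALUES
        ('orders_db', 'Marketing', 120.0),
        ('orders_db', 'Finance',    80.0),
        ('web_store', 'Marketing',  40.0);
    -- Data mart: the subset of the warehouse that one department cares about.
    CREATE VIEW marketing_mart AS
        SELECT source, amount FROM warehouse_sales WHERE department = 'Marketing';
""")

mart_rows = conn.execute("SELECT COUNT(*) FROM marketing_mart").fetchone()[0]
total = conn.execute("SELECT SUM(amount) FROM marketing_mart").fetchone()[0]
print(mart_rows, total)
```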

  44. How are databases used??? Types of Databases: Data Mining
      Uses:
      • Perform “market-basket analysis” to identify new product bundles
      • Find root causes of quality or manufacturing problems
      • Prevent customer attrition and acquire new customers
      • Cross-sell to existing customers
      • Profile customers with more accuracy
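
Market-basket analysis, at its simplest, counts how often pairs of items are bought together. The Python sketch below uses invented baskets and plain pair counting; production tools use more refined algorithms and measures than this.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions; each set is one customer's basket.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "eggs", "milk"},
    {"butter", "eggs"},
]

# Count every unordered pair of items that appears in the same basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support = fraction of all baskets that contain both items of the pair.
support = {pair: count / len(baskets) for pair, count in pair_counts.items()}

best_pair, best_count = pair_counts.most_common(1)[0]
print(best_pair, best_count, support[best_pair])
```

Pairs with high support are candidates for the product bundles the slide mentions.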

  45. QUESTIONS???
