1 / 34

CIS 310 Management Information Systems Database Refresher

CIS 310 Management Information Systems Database Refresher. Database Refresher – Mental Model. Database = File Cabinet Records = File Folder Data = Contents of the folder Database = Student Records Table = A-G in the top Drawer Record = BID 1234567890

evan
Télécharger la présentation

CIS 310 Management Information Systems Database Refresher

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. CIS 310 Management Information SystemsDatabase Refresher

  2. Database Refresher – Mental Model Database = File Cabinet Records = File Folder Data = Contents of the folder Database = Student Records Table = A-G in the top Drawer Record = BID 1234567890 Attribute, Entity or Data Item = Student Name, Address, GPA, email…etc.

  3. Relating Tables Student Data Enrolled In, Winter 2013 BID (unique) Class Number 1 Class Number 2 Class Number 3 Class Number 4…etc. • BID (unique) • First Name • Last Name • Address • City • State • Zip • Phone • eMail Classes in Winter 2013 • Class No. (unique each term) • Dept. No. • Class Name • Units • Time • Professor

  4. Relational Database Management System (RDBMS) • Software that helps you to link the tables and perform reporting and queries on the data in the database. • Access (CIS 101) • Sybase • Oracle • SQL Server (Microsoft) • DB2 (IBM)

  5. Entity Relationship Diagram (ERD) • Design tool used to design and plan databases. • Primary key Table Attributes

  6. ERDs Continues Relationships • One-to-one • One-to-many • Many-to-many 1 1 1

  7. ERD Example (engotzz.blogspot.com)

  8. Bookstore ERD from jdonohue.com

  9. End • Is LastName a good primary key if you’re just using a small database for a class? • Name three tables that would be in a databasefor pet adoption. • What would data attributes for a pet adoption database be for a table called animals? No. People have the same last name. Not expandable. Animals NewOwnersShotHistory petID, Breed, Age, Name, KidFriendly, OtherPetFriendly, Color…etc.

  10. CIS 310 Management Information SystemsData Warehousing, Data Marts, Data Integrity

  11. Example: Rensselaer Polytechnic Institute Admissions Data Warehouse • Attract the best and brightest and retain diversity, balance, geography and manage financial aid. • Results • Invested $1.2 million. Costs $537,000 annually to operate. • Savings in improved data analysis $820,000 annually • Savings in financial aid $500,000 annually • Savings in labor for reporting $320,000 annually Source: Information Week, 2007

  12. Example 2 – Cal Poly Data Warehouse

  13. Data Warehouse • Collection of data from several databases to support business analysis. • Aggregate lots of data • Internal and external sources • Drill down capability • Benefit • Focus on managerial decision making instead of operational decision making. • Provide insight not available before because data was never connected before.

  14. datawarehouse4u.info External Data Sources

  15. Data Mart: Subset of DW (Gdwsolutions.com)

  16. Data Cube (Multidimensional Analysis) • Allows you to look at the data from different dimensions to perform analysis. Campaign A Campaign B Campaign C Store A Store B Store C Store D Store E Product A Product B Product C Product D

  17. Slicing an Dicing Campaign A Campaign B Campaign C Campaign A Campaign B Campaign C Store A Store B Store C Store D Store E Store A Store B Store C Store D Store E Product A Product B Product C Product D Product A Product B Product C Product D

  18. Questions • What store is the most productive and what products are the best sellers at those stores? • What clinics need the most blood during which time of year? • Which advertising campaign was most productive in which areas? • What elements are decisively different between my worst performing and best performing store?

  19. Information Granularity • Yard  foot  inches • All sales  sales per region  sales per store • Kids with the flu  kids with the flue by region  kids with the flue by region and age. • Granularity is how far you can dig into the detail of the data.

  20. Data Integrity • Not all data is ‘clean’. Sometimes data is erroneousor incomplete. • You do not want to make a decision based upon bad data. • High quality data can lead to better, or a least more informed, decisions.

  21. 5 Characteristics of High Quality Information • Accuracy – the data is correct. • Completeness – all the data needed is there. • Consistency – data is uniform. A phone number has 10 characters, never any other length. • Uniqueness – To have value, the data must uniquely inform the company. • Timeliness – New and current data is better for current decision making.

  22. Data Scrubbing • ‘Cleaning’ the data to get rid of incomplete, inconsistent or erroneous data. • Missing data or attributes • Redundant records • Missing keys • Erroneous records • Incorrect data • Ex. How many fake accounts have you set up to try software for free? • Ex. How many times did you sign up for something and then never return?

  23. Information Accuracy Costs $$ • The more complete and accurate the data is, the higher it will cost. • It takes resources to collect, verify and fix data. • Another question – what is the costs of not having high-quality data? • Having to redo things. • Making bad decisions. • Process failure.

  24. Getting the Right Data In • Online forms designed to prevent errors. • @ to check if it is an email address or to email the account before activation. • Form fields required. * Don’t let the user continue until they fill out everything. • Form fields of a specific format or length. Zip code has to be 5 digits or it is rejected.

  25. End • Is a monthly sales report an example of highly granular information? • What is that OLAP thingy? • Why would anyone use a data mart instead of the data warehouse? No. It is more granular than a yearly sales report and less granular than a daily sales report. Data cube…with the ability to slice and dice. A data mart is a subset of a data warehouse, used for a more focused purpose. You would use the mart to make your analysis faster and maybe easier.

  26. Data Mining & Data Analysis Winter, 2013

  27. Data Mining Use a variety of techniques to uncover interesting things about the data • Cluster analysis • Association Detection • Statistical Analysis

  28. Structured vs. Unstructured Data • Structured data is already in a database or spreadsheet format. • .mdbx or .xlsx • Unstructured data doesn’t have an organized format to it. • Photos • Music • Pdf memos. • Emails.

  29. Cluster Analysis • Grouping a set of objects in a way that clusters form around certain attributes. • Ex. Zip code clustering can show where most sales, customers..etc. are from. • Ex. Social media cluster analysis may predict what words are more likely to be next to each other. • Music sales is directly linked to buzz on social media. • Mapped into chart where clusters form and grow on different words, predicting success.

  30. Association Detection • Market Basket Analysis (also known as commodity bundle) – What is in your basket and how is it related/predictive? • A student purchasing engineering books might also need a calculator. • Amazon suggesting ‘customers who bought this also bought that.’ May entice you to purchase more books.

  31. Statistical Analysis • Forecasting & Time Series • Data can be collected at specific intervals to gain predictive insight to it. • Ex. stock prices, power consumption, sales over time in response to a marketing campaign. • Data could indicate seasonal or cyclic trends.

  32. Other Mining Opportunities • Text Mining – • Searching through a massive number of emails for a company. • Searching twitter data. • Web Mining • Look at people’s browsing and buying or navigation habits.

  33. So what is BI again? • It is a set of processes and analysis tools used to examine data and get something great from it. • BI and Big Data is booming. There are lots of massive data sets and we are at the beginning of understanding how to gain insight from all that data.

  34. End

More Related