1 / 24

Data quality

Data quality. Stefano Grazioli. Last SQL homework due Fixed demo issues. Added a note to the homework text Easy Meter. Critical Thinking. What is Data Quality?. The degree to which data is suitable for a business purpose Accuracy, precision.

jeffk
Télécharger la présentation

Data quality

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data quality Stefano Grazioli

  2. Last SQL homework due Fixed demo issues. Added a note to the homework text Easy Meter Critical Thinking

  3. What is Data Quality? • The degree to which data is suitable for a business purpose • Accuracy, precision

  4. The quality of the data stored in organizational databases is often poor • 10-25% of the records have inaccuracies or missing elements • Data frequently misinterpreted • Known data loss and theft • Most databases implement inconsistent definitions Source: T. Redman, Data Driven, 2008

  5. Why is Data Bad? No one gets up in the morning and says“I’m going to make lots of errors today” - Cathy Bessant Source: T. Redman, Data Driven, 2008

  6. Find the Data Quality issues

  7. Find the Data Quality issues

  8. Approaches to Data Quality Find and Fix Prevent at the source Do nothing

  9. Homework Business Scenario:Google’s Daily Cagr

  10. You are a financial analyst at a fintech firm Many of our customers invest for short amounts of time on Google. They sell their shares within a few weeks…. I wonder: do they make any money out of it? I am on it….. While you are at it… clean the data, first Consider it done.

  11. You get a file with~1000customers whorecently boughtand sold GOOG. Three steps (and two homework) Clean data: phones, dates Compute Daily Cagr = [(final price/initial price)1/days ]-1 Report the Average Daily Cagracross all customers. Daily Cagr for Google

  12. From:#2345348565 To:(234)-534-8565 Cleaning Phone Numbers

  13. UML Activity Diagram - Daily Compound Average Growth of a Security (part I) When the user presses a button, a file selection windows pops out. The user selects a file. The file is shown starting at cell “A1”. The start button becomes invisible. Three more buttons appear: “Clean phone numbers”, “Format Dates”, and “Compute Daily CAGR”. A Next homework Next homework [Compute] [Format Dates] [Clean ph.no] Select the next phone no. Count its digits [Exactly 10 digits] Format as(xxx)-xxx-xxxx & clear highlight if any Highlight the cell in red A [No More Ph.No]

  14. WINIT What Is NewIn Technology?

  15. used in data quality and beyond Text manipulation

  16. DimmyString AsString=“This is a sample string" DimmyString2 AsString = "s" DimmyChar AsChar = "s"c Strings and Characters

  17. DimmyString AsString = "#2344-234-33-3" Dimtemp AsString = "" ForEach x AsCharIn myString If IsNumeric(x) Then temp = temp + x EndIf Next Testing Numbers

  18. DimmyStrAsString = "This is a sample string" myStr= myStr.Insert(4, "xyz") myStr= myStr.Remove(4, 3) 'starting where,howmany myStr= myStr.Replace(" is", " was") Inserting and Removing

  19. Composing text Dim sAsString = “4344562456” Dimtemp AsString temp = "Ph. " + s.Substring(0, 3) + " / "+s.Substring(3, 3) + " " + s.Substring(6, 4) ' The above is an example that produces this: ‘ Ph. 434 / 456 2456 ‘ or - same thing - temp = String.Format("Ph. {0} / {1} {2}", s.Substring(0, 3), s.Substring(3, 3), s.Substring(6, 4))

  20. myLenght = myString.Length myNewString = myString.Trim() myNewString = myString.TrimEnd() myNewString = myString.TrimStart() myNewString = myString.PadLeft(50) myNewString = myString.PadRight(20) Trimming and Padding Total length of the result

  21. ' store the address of the current active sheet (the ‘target’) DimmyActiveS AsExcel.Worksheet = Application.ActiveSheet ' select a file DimmyFile AsString= Application.GetOpenFilename() 'get the data in a new temporary workbook Application.Workbooks.OpenText(myFile, , , Excel.XlTextParsingType.xlDelimited, , , , , True) ' store the address of the temporary workbook DimmyActiveWB As Excel.Workbook = Application.ActiveWorkbook ' copy the content from the temporary to the ‘target’ sheet myActiveS.Range("A1:J1000").Value = Application.ActiveSheet.Range("A1:J1000").Value ‘ close the temp workbook myActiveWB.Close() Reading a File into EXCEL

  22. Dim lastRow As Integer lastRow = Cells(Rows.Count,1). End(Excel.XlDirection.xlUp). Row Finding the last non-empty row

  23. Suggestions • Give yourself plenty of time

More Related