1 / 26

Data Cleansing

Data Cleansing.

yitta
Télécharger la présentation

Data Cleansing

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Data Cleansing

  2. “A company’s most important asset is information. A corporation’s ability to compete, adapt, and grow in a business climate of rapid change is dependent in large measure on how well the company uses information to make decisions … Sharing information that isn’t clean and consolidated to the fullest extent can substantially reduce the effectiveness of a system of significant investment and considerable pay-off potential.” Stoker, 1999

  3. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Today’s Coverage INTRODUCTION • Data Cleansing and Data Quality • Steps in Data Cleansing • Why is “Dirty” Data a Problem? • Why is Legacy Data “Dirty”? • To Cleanse or Not To Cleanse • Parsing • Matching • Correcting • Consolidating • Standardizing • Conclusion • Demonstration • Questions WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION

  4. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Data Cleansing and Data Quality • Data is a product that can be characterized as either “quality” or “non-quality.” The ability to make quality decisions depends in part on the decision-maker’s ability to access quality data. • Data cleansing is the process that insures that the same piece of information is referred to in only ONE way. When data is clean, its users can focus on its use and not its credibility.

  5. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Steps in Data Cleansing • Parsing • Correcting • Standardizing • Matching • Consolidating

  6. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Why is Data “Dirty” and Why is This a Problem? • Simply put, dirty data for data warehouses is the product of relying on data from legacy systems. • But if company’s have relied on this data for decades, why is it a problem today? • Because a data warehouse “promises” to deliver “a single version of the truth.” Unfortunately integrating data from different sources magnifies its problems.

  7. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Why is Legacy Data “Dirty” ? • Dummy Values, • Absence of Data, • Multipurpose Fields, • Cryptic Data, • Contradicting Data, • Inappropriate Use of Address Lines, • Violation of Business Rules, • Reused Primary Keys, • Non-Unique Identifiers, and • Data Integration Problems

  8. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION To Cleanse or Not to Cleanse • CAN the legacy data be cleansed? • Sometimes the answer is “NO” • Then, SHOULD it be cleansed? • Again, sometimes “NO” • Next, WHERE should it be cleansed? • Finally, HOW should it be cleansed?

  9. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Steps in Cleansing Data • Parsing • Correcting • Standardizing • Matching • Consolidating

  10. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Parsing Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.

  11. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Parsing

  12. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Correcting Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.

  13. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Correcting

  14. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Standardizing Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.

  15. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Standardizing

  16. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Parsing, Correcting, Standardizing TITLE FIRST CONC. LAST GENER. NAME LINE William Mr. Bill St. John III 101 S. Main Strete Sant. Louis, MO 63181 HSNO ST-DIR ST-NM ST-TYPE St. STREET LINE CITY STATE POST St. 63118 GEOG. LINE

  17. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Matching Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications.

  18. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Match Patterns Customer #/Tax ID Branch Type Vendor Code Pattern I.D. Business Name Street City Pattern Exact Exact Exact Exact Exact Exact AAAAAA P110 Exact VClose Exact VClose Exact Blanks ABAAA- P115 Exact VClose Exact Blanks Exact Exact ABA-AA P120 Exact VClose Close Close Exact Exact ABCCAA S300 VClose VClose Exact Close Exact Exact BBACAA S310

  19. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Matching

  20. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Consolidating Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.

  21. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Consolidating

  22. William Jones Janet Jones Karen Jones William Jones Jr. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Consolidating

  23. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Recommended Best Practices 1. Use metadata to document rules 2. Determine data cleansing schedule 3. Build quality into new and existing systems

  24. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Legacy Systems View (3 Clients) Account No.83451234 Policy No.ME309451-2 Transaction B498/97

  25. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION The Reality – ONE Client Account No.83451234 Policy No.ME309451-2 Transaction B498/97

  26. INTRODUCTION WHY “DIRTY” DATA CLEANSING STEPS CONCLUSION Demonstration • Valityhttp://www.vality.com • Trillium Software http://www.trilliumsoft.com • First Logichttp://www.firstlogic.com

More Related