Data Preprocessing - Dept. Of Computer Engineering

This presentation explains what data preprocessing means. It is presented by Prof. Sandeep Patil of the Department of Computer Engineering at Hope Foundation’s International Institute of Information Technology (I²IT). It covers the need for data preprocessing and its major steps, including data transformation and data discretization.


Presentation Transcript


  1. Data Preprocessing: An Overview • By Sandeep Patil, Department of Computer Engineering, I²IT

  2. Outline • What is Data Preprocessing? • Major Steps in Data Preprocessing • Data Cleaning • Data Integration • Data Reduction • Data Transformation and Data Discretization • Conclusion • International Institute of Information Technology, I²IT, P-14 Rajiv Gandhi Infotech Park, MIDC Phase 1, Hinjawadi, Pune - 411 057 | Tel +91 20 22933441 | Website - www.isquareit.edu.in | Email - info@isquareit.edu.in

  3. Why Data Preprocessing? • Need for data preprocessing: some of the data may have problems such as being • Incomplete (data are missing) • Inaccurate or noisy (values other than those expected) • Inconsistent (containing discrepancies) • Untimely (an old version of the data) • Not believable (users lack faith in the correctness of the data) • Hard to interpret (the data are not simple to understand)

  4. Major Steps in Data Preprocessing • Data Cleaning • Data Integration • Data Reduction • Data Transformation

  5. Data Cleaning • Filling in missing values • Smoothing • Removing noisy data • Identifying or removing outliers • Resolving inconsistencies
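The cleaning tasks on this slide can be sketched in a few lines of Python. The slide does not name specific techniques, so the choices here are assumptions made for illustration: missing values are filled with the mean of the observed values, smoothing uses equal-frequency bin means, and outliers are flagged with the 1.5 × IQR rule. The function names are hypothetical.

```python
from statistics import mean, quantiles


def fill_missing(values):
    """Replace None entries with the mean of the observed values (assumed policy)."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    every value with the mean of its bin."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < lower or v > upper]


prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(fill_missing([4, None, 8]))        # the gap is filled with the mean of 4 and 8
print(smooth_by_bin_means(prices, 3))    # three bins, each replaced by its mean
print(iqr_outliers(prices + [200]))      # 200 lies far above the upper fence
```

Whether a flagged outlier is removed or only reported is a separate decision that depends on the application; this sketch only identifies candidates.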

  6. Data Integration • Entity identification problem • Integrating multiple databases, data cubes, or files • Redundancy and correlation analysis • Tuple duplication: updating some but not all data occurrences • Data value conflict detection and resolution: for the same real-world entity, attribute values from different sources may differ
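A minimal sketch of two of these integration issues, entity identification and data value conflict detection, is shown below. The record layout, the field names (`cust_id`, `customer_number`, `city`, `price`), and the policy of keeping the first source's value on a conflict are all assumptions made for illustration, not something the slide prescribes.

```python
def integrate(source_a, source_b, key_a, key_b):
    """Merge records from two sources that describe the same entities.

    key_a and key_b name the identifier field in each source; the same
    identifier may be named differently (entity identification problem).
    Returns (merged, conflicts), where conflicts lists every attribute
    whose value differs between the two sources.
    """
    merged = {}
    conflicts = []
    for record in source_a:
        entity_id = record[key_a]
        merged[entity_id] = {k: v for k, v in record.items() if k != key_a}
    for record in source_b:
        entity_id = record[key_b]
        target = merged.setdefault(entity_id, {})
        for field, value in record.items():
            if field == key_b:
                continue
            if field in target and target[field] != value:
                # Assumed policy: keep the first source's value, report the clash.
                conflicts.append((entity_id, field, target[field], value))
            else:
                target[field] = value
    return merged, conflicts


crm = [{"cust_id": 1, "city": "Pune", "price": 100},
       {"cust_id": 2, "city": "Mumbai"}]
billing = [{"customer_number": 1, "city": "Pune", "price": 120},
           {"customer_number": 3, "city": "Delhi"}]

merged, conflicts = integrate(crm, billing, "cust_id", "customer_number")
print(sorted(merged))   # identifiers seen in either source
print(conflicts)        # attribute values that disagree between sources
```

In practice a conflict such as the differing `price` often comes from different units, currencies, or encodings, so resolution usually needs metadata about each source rather than a fixed rule.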

  7. Data Reduction • To obtain a reduced representation of the data set that is much smaller in volume • Numerosity reduction: parametric methods (e.g., regression and log-linear models); nonparametric methods (e.g., histograms, clustering, sampling) • Data compression: lossless or lossy
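The numerosity-reduction methods named on this slide can be illustrated with a short sketch: a parametric method (a least-squares line, which stores two parameters instead of the raw points) and two nonparametric ones (an equal-width histogram and a simple random sample without replacement). The function names, the equal-width binning, and the fixed seed are assumptions for the example.

```python
import random


def fit_line(xs, ys):
    """Parametric reduction: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def equal_width_histogram(values, num_bins):
    """Nonparametric reduction: count how many values fall in each of
    num_bins equal-width buckets between min(values) and max(values)."""
    low, high = min(values), max(values)
    counts = [0] * num_bins
    width = (high - low) / num_bins
    if width == 0:
        # All values are identical, so they share the first bucket.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value is placed in the last bucket.
        index = min(int((v - low) / width), num_bins - 1)
        counts[index] += 1
    return counts


def simple_random_sample(values, size, seed=0):
    """Nonparametric reduction: sample without replacement (seeded for repeatability)."""
    rng = random.Random(seed)
    return rng.sample(values, size)


print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))                           # slope 2, intercept 1
print(equal_width_histogram([4, 8, 15, 21, 21, 24, 25, 28, 34], 3))   # bucket counts
print(simple_random_sample(list(range(100)), 5))                      # 5 of the 100 values
```

The trade-off is the usual one: the fitted line is the most compact but only faithful if the data really are close to linear, while the histogram and the sample make no such assumption and keep more detail.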

  8. Data Transformation and Data Discretization • Data are transformed or consolidated into forms appropriate for mining: smoothing; attribute construction (feature construction); aggregation; normalization; discretization; concept hierarchy generation
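Two of the transformations listed here, normalization and discretization, lend themselves to a short sketch. The slide does not say which variants are meant, so min-max and z-score normalization and a boundary-based discretization into labels are assumptions, as are the function names and the age boundaries used in the example.

```python
from statistics import mean, pstdev


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so min(values) maps to new_min and
    max(values) maps to new_max. Assumes the values are not all equal."""
    low, high = min(values), max(values)
    scale = (new_max - new_min) / (high - low)
    return [(v - low) * scale + new_min for v in values]


def z_score_normalize(values):
    """Center values on their mean and divide by the population
    standard deviation. Assumes the values are not all equal."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]


def discretize(value, boundaries, labels):
    """Map a numeric value to a label. boundaries must be ascending and
    labels must have one more entry than boundaries; a value below
    boundaries[i] (and not below any earlier boundary) gets labels[i]."""
    for upper, label in zip(boundaries, labels):
        if value < upper:
            return label
    return labels[-1]


print(min_max_normalize([10, 20, 30]))
print(z_score_normalize([2, 4, 4, 4, 5, 5, 7, 9]))
print([discretize(age, [18, 60], ["youth", "adult", "senior"]) for age in (15, 30, 70)])
```

The labels produced by `discretize` are one level of a concept hierarchy: raw ages are replaced by coarser categories, which could in turn be grouped further.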

  9. Conclusion • Although numerous methods of data preprocessing have been developed, data preprocessing remains an active area of research because of the huge amount of inconsistent or dirty data and the complexity of the problem.

  10. Thank You • For further information, please contact Prof. Sandeep Patil, Department of Computer Engineering, Hope Foundation’s International Institute of Information Technology, I²IT, Hinjawadi, Pune – 411 057 | Phone +91 20 22933441 | www.isquareit.edu.in | sandeepp@isquareit.edu.in
