CSC 478 Programming Data Mining Applications Course Summary

Bamshad Mobasher, DePaul University

Presentation Transcript


  1. CSC 478 Programming Data Mining Applications: Course Summary. Bamshad Mobasher, DePaul University

  2. What we did
     • Data Mining Overview
     • The KDD Process
     • Data Preprocessing and Understanding
     • Using Python and NumPy
     • Using scikit-learn modules
     • Some emphasis on visualizing and understanding characteristics of the data
     • Supervised Knowledge Discovery
     • Regression Analysis
     • Classification
     • Techniques such as KNN, Ridge Regression, Decision Trees, and Bayesian classification
     • Lots of emphasis on model evaluation
     • Evaluation metrics
     • Train-test methodologies such as cross-validation
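
As a minimal sketch of the train-test methodology listed above, the snippet below compares a KNN classifier and a decision tree with 5-fold cross-validation in scikit-learn. The Iris dataset, the hyperparameters (`n_neighbors=5`, `max_depth=3`), and the name `cv_accuracy` are illustrative assumptions, not taken from the course materials.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Assumption: Iris stands in for whatever dataset the course used.
X, y = load_iris(return_X_y=True)

# KNN is distance-based, so features are standardized inside the pipeline.
# Scaling inside the pipeline keeps each fold's test data out of the fit.
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "tree": DecisionTreeClassifier(max_depth=3, random_state=0),
}

cv_accuracy = {}
for name, model in models.items():
    # 5-fold cross-validation: each fold serves once as the held-out test set.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_accuracy[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```

Reporting the mean and spread over folds, rather than a single train-test split, gives a less noisy basis for comparing models.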

  3. What we did
     • Unsupervised Knowledge Discovery
     • Cluster analysis
     • Using PCA and SVD for dimensionality reduction, data characterization, and noise reduction
     • Association rule discovery
     • Emphasis on using unsupervised approaches as components of larger knowledge discovery efforts
     • E.g., using PCA before clustering; using clustering as the basis for classification
     • Real application domains
     • Text mining and document analysis/filtering
     • Recommender systems
     • Predictive modeling for marketing/business applications
     • Image analysis
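
The "PCA before clustering" idea above can be sketched roughly as follows. This is an illustration only: the Wine dataset, the choice of 2 principal components, and k-means with 3 clusters are assumptions made for the example, not details from the course.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

# Assumption: the Wine dataset (178 samples, 13 features) is a stand-in.
X, y = load_wine(return_X_y=True)

# Standardize first: PCA is sensitive to the scale of each feature.
X_scaled = StandardScaler().fit_transform(X)

# Reduce 13 features to 2 principal components.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X_scaled)
print("variance explained by 2 components:", pca.explained_variance_ratio_.sum())

# Cluster in the reduced space.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_reduced)

# The class labels are used only to check the clusters, never to fit them.
ari = adjusted_rand_score(y, labels)
print("adjusted Rand index vs. true classes:", ari)
```

The cluster labels could then serve as an input feature or a segmentation step for a downstream classifier, which is the second pattern the slide mentions.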

  4. What we did not do (and you should learn later)
     • Approaches for mining sequential/temporal data
     • Markov models, time series analysis, sequential pattern mining
     • Ensemble and hybrid classifiers/predictors
     • Combining multiple classifiers
     • Random Forest classifiers
     • AdaBoost and meta-learners
     • Support Vector Machines and kernel-based classifiers
     • Topic modeling with latent factor models
     • LDA (Latent Dirichlet Allocation)
     • Non-Negative Matrix Factorization
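
As a starting point for the ensemble methods listed above, here is a hedged sketch that cross-validates a single decision tree next to a Random Forest and AdaBoost in scikit-learn. The breast cancer dataset, the estimator counts, and the name `ensemble_accuracy` are assumptions chosen for illustration; the course did not cover this material.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Assumption: a small built-in binary classification dataset as a stand-in.
X, y = load_breast_cancer(return_X_y=True)

models = {
    # Baseline: one decision tree.
    "single_tree": DecisionTreeClassifier(random_state=0),
    # Bagging-style ensemble of randomized trees.
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # Boosting: weak learners fitted sequentially on reweighted samples.
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

ensemble_accuracy = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    ensemble_accuracy[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}")
```

Ensembles often, though not always, outperform a single tree; the same cross-validation setup makes the comparison direct.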
