Instance Construction via Likelihood-Based Data Squashing

Instance Construction via Likelihood-Based Data Squashing. Madigan D., et al. (Ch. 12, Instance Selection and Construction for Data Mining (2001), Kluwer Academic Publishers). Summarized by: Jinsan Yang, SNU Biointelligence Lab. Abstract: Data compression method: squashing.


Presentation Transcript


  1. Instance Construction via Likelihood-Based Data Squashing Madigan D., et al. (Ch. 12, Instance Selection and Construction for Data Mining (2001), Kluwer Academic Publishers) Summarized by: Jinsan Yang, SNU Biointelligence Lab

  2. Abstract • Data compression method: squashing • LDS: Likelihood-based Data Squashing • Keywords: Instance Construction, Data Squashing

  3. Outline • Introduction • The LDS Algorithm • Evaluation: Logistic Regression • Evaluation: Neural Networks • Iterative LDS • Discussion

  4. Introduction • Massive data examples: large-scale retailing, telecommunications, astronomy, computational biology, Internet logging • Computational challenges: algorithms need multiple passes over the data, and out-of-memory data access is 10^5 to 10^6 times slower than main memory • Current solution: scaling up existing algorithms • Here: scaling down the data • Data squashing: 750000 → 8443 data points (DuMouchel et al., 1999) • Outperforms a random sample of size 7543 by a factor of 500 in MSE

  5. LDS Algorithm • Motivation: Bayes' rule • Given three data points d1, d2, d3, estimate the parameter θ (equation not shown) • Cluster the data points by likelihood profile (equation not shown)
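
The formulas on this slide did not survive in the transcript. One plausible reading of the Bayes' rule motivation, assuming independent data points and a prior P(θ), is sketched below. The pseudo data point d* and its exponent are illustrative assumptions, not taken from the slide.

```latex
% Assumption: d_1, d_2, d_3 are independent given \theta, with prior P(\theta).
P(\theta \mid d_1, d_2, d_3) \propto P(d_1 \mid \theta)\, P(d_2 \mid \theta)\, P(d_3 \mid \theta)\, P(\theta)

% If d_1 and d_2 have nearly the same likelihood profile,
% i.e. P(d_1 \mid \theta) \approx P(d_2 \mid \theta) over the \theta values of interest,
% they can be replaced by one hypothetical pseudo point d^{*} counted twice:
P(\theta \mid d_1, d_2, d_3) \approx P(d^{*} \mid \theta)^{2}\, P(d_3 \mid \theta)\, P(\theta)
```

Under this reading, points whose likelihoods move together as θ varies carry nearly the same information about θ, which is why they are clustered by likelihood profile.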

  6. LDS Algorithm • Details of LDS Algorithm • [Select] Values of θ by a central composite design (figure: central composite design for 3 factors)
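
As a rough illustration of the [Select] step, the sketch below generates the coded points of a central composite design: the 2^k factorial corners, 2k axial points, and one center point. The rotatable axial distance and the helper `theta_grid`, which centers the design on an initial estimate and scales it per coordinate, are assumptions made for illustration and are not taken from the slides.

```python
from itertools import product


def central_composite_design(k, alpha=None):
    """Coded CCD points for k factors: 2^k corners, 2k axial points, 1 center."""
    if alpha is None:
        # Rotatable choice of axial distance; an assumption, not from the slides.
        alpha = (2 ** k) ** 0.25
    corners = [tuple(float(v) for v in p) for p in product((-1, 1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            point = [0.0] * k
            point[i] = sign * alpha
            axial.append(tuple(point))
    center = [tuple([0.0] * k)]
    return corners + axial + center


def theta_grid(theta_hat, scale, alpha=None):
    """Hypothetical helper: place the design around theta_hat, scaled per coordinate."""
    design = central_composite_design(len(theta_hat), alpha)
    return [tuple(t + s * c for t, s, c in zip(theta_hat, scale, pt)) for pt in design]


points = central_composite_design(3)
print(len(points))  # 15 = 8 corners + 6 axial + 1 center
```

In practice the scale would presumably come from the uncertainty of the initial estimate, so that the selected θ values cover the region where the likelihood matters.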

  7. LDS Algorithm • [Profile] Evaluate the likelihood profiles • [Cluster] Cluster the mother data in a single pass: select n’ random samples as initial cluster centers, then assign each remaining data point to a cluster • [Construct] Construct the pseudo data: one pseudo data point per cluster, at the cluster center
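
The [Profile], [Cluster] and [Construct] steps can be sketched as follows for a logistic regression model. This is a minimal illustration under stated assumptions, not the authors' implementation: points are assigned to the nearest initial center by Euclidean distance between log-likelihood profiles, the pseudo point is the plain cluster mean (so a pseudo response can be fractional if a cluster mixes classes), and the cluster size is kept as a weight. The function names and the demo data are hypothetical.

```python
import numpy as np


def log_lik_profiles(X, y, thetas):
    """[Profile] Logistic log-likelihood of every point at every theta; shape (n, k)."""
    eta = X @ thetas.T
    # log p(y | x, theta) = y * eta - log(1 + exp(eta))
    return y[:, None] * eta - np.logaddexp(0.0, eta)


def squash(X, y, thetas, n_clusters, rng):
    profiles = log_lik_profiles(X, y, thetas)
    n = len(y)
    # [Cluster] n' random points become the initial (and final) cluster centers.
    centers = profiles[rng.choice(n, size=n_clusters, replace=False)]
    # Single pass: each point goes to the nearest center in profile space (assumption).
    d2 = ((profiles[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    # [Construct] Pseudo data = cluster means, weighted by cluster size (assumption).
    weights = np.bincount(labels, minlength=n_clusters).astype(float)
    sum_X = np.zeros((n_clusters, X.shape[1]))
    sum_y = np.zeros(n_clusters)
    np.add.at(sum_X, labels, X)
    np.add.at(sum_y, labels, y)
    keep = weights > 0  # drop clusters that ended up empty
    return sum_X[keep] / weights[keep, None], sum_y[keep] / weights[keep], weights[keep]


rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
theta_hat = np.array([-0.5, 1.0])                      # stand-in for an initial estimate
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ theta_hat)))).astype(float)

a = 2 ** 0.5  # axial distance of a two-factor central composite design
design = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1],
                   [a, 0], [-a, 0], [0, a], [0, -a], [0, 0]])
thetas = theta_hat + 0.5 * design

pseudo_X, pseudo_y, weights = squash(X, y, thetas, 50, rng)
print(pseudo_X.shape, weights.sum())
```

The distance matrix here is built in memory for brevity; a true single pass over a large mother data set would process the points in chunks.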

  8. Evaluation: Logistic Regression • Small-scale simulations • Initial estimate of θ • Plot: log(error ratio) • Three methods of initial parameter estimation • 100 data points / 48 squashed data points

  9. Evaluation: Logistic Regression • Medium scale: 100000 data points; baseline: 1% simple random sample

  10. Evaluation: Logistic Regression • Large scale: 744963 data points; baseline: 1% simple random sample
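
To use squashed data in these logistic regression evaluations, each pseudo data point has to count as many times as the cluster it stands for. The sketch below shows one way to do that, a weighted Newton-Raphson fit. The function name, the fixed iteration count and the synthetic data are assumptions for illustration; the slides do not say which fitting routine was used.

```python
import numpy as np


def fit_weighted_logistic(X, y, w, n_iter=25):
    """Maximize sum_i w_i * [y_i * eta_i - log(1 + exp(eta_i))] by Newton-Raphson."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (w * (y - p))                       # weighted score
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])  # weighted information
        theta = theta + np.linalg.solve(hess, grad)
    return theta


rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_theta = np.array([-0.5, 1.0])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-(X @ true_theta)))).astype(float)
w = rng.integers(1, 4, size=n)  # integer weights 1..3, standing in for cluster sizes

# A weight of m should behave exactly like m copies of the same row.
theta_weighted = fit_weighted_logistic(X, y, w.astype(float))
theta_expanded = fit_weighted_logistic(
    np.repeat(X, w, axis=0), np.repeat(y, w), np.ones(int(w.sum()))
)
print(theta_weighted, theta_expanded)
```

The comparison at the end checks the property squashing relies on: fitting with weights gives the same estimate as fitting on the expanded data.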

  11. Evaluation: Neural Networks • Feed-forward network: two input nodes, one hidden layer with 3 units, single binary output • Mother data: 10000; squashed data: 1000; repetitions: 30; test data: 1000 from the same network • Comparisons of P(whole) - P(reduced)

  12. Evaluation: Neural Networks

  13. Iterative LDS • For when the estimate of θ is not accurate: • 1. Set θ from a simple random sample • 2. Squash by LDS • 3. Estimate θ • 4. Go to 2.
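
The loop on this slide can be sketched as a small driver that alternates squashing and estimation. Everything here is a hedged illustration: `iterative_lds`, `toy_squash` and `toy_estimate` are hypothetical names, the fixed number of rounds is an assumption (the slide gives no stopping rule), and the toy stand-ins for a normal-mean model only exercise the control flow rather than reproduce LDS itself.

```python
import math
import random
import statistics
from collections import defaultdict


def iterative_lds(data, squash, estimate, theta0, n_rounds=3):
    """Steps 2-4: squash around the current theta, re-estimate, repeat."""
    theta = theta0
    for _ in range(n_rounds):  # fixed round count is an assumption
        pseudo, weights = squash(data, theta)  # step 2
        theta = estimate(pseudo, weights)      # step 3
    return theta


def toy_squash(data, theta, width=0.5):
    """Toy stand-in: group points by binned residual x - theta, keep bin means."""
    bins = defaultdict(list)
    for x in data:
        bins[math.floor((x - theta) / width)].append(x)
    pseudo = [statistics.fmean(v) for v in bins.values()]
    weights = [len(v) for v in bins.values()]
    return pseudo, weights


def toy_estimate(pseudo, weights):
    """Toy stand-in: weighted mean of the pseudo data."""
    return sum(w * p for p, w in zip(pseudo, weights)) / sum(weights)


rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(10_000)]
theta0 = statistics.fmean(rng.sample(data, 100))  # step 1: simple random sample
theta = iterative_lds(data, toy_squash, toy_estimate, theta0)
print(theta0, theta)
```

With a real model, `squash` would rebuild the likelihood profiles around the current θ each round, which is where a poor initial estimate gets corrected.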
