
Near-Optimal Scalable Feature Selection


Presentation Transcript


  1. Near-Optimal Scalable Feature Selection Siggi Olafsson and Jaekyung Yang Iowa State University INFORMS Annual Conference October 24, 2004

  2. Feature Selection • Eliminate redundant/irrelevant features • Reduced dimensionality • Potential benefits: • Simpler models • Faster induction • More accurate prediction/classification • Knowledge gained about which features are important

  3. Measuring Feature Quality • Find the subset F of features that maximizes some objective, e.g., • Correlation measures (filter) • Accuracy of a classification model (wrapper) • Information gain, gain ratio, etc. (filter) • No single measure always works best
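
To make the filter/wrapper distinction concrete, here is a minimal sketch of two subset-scoring functions. It assumes scikit-learn and NumPy; the function names, the use of mutual information as the filter measure, and the decision tree as the wrapped classifier are illustrative choices, not part of the original talk.

```python
# Hypothetical sketch of the two kinds of subset-quality measures on slide 3:
# a filter score (no classifier involved) and a wrapper score (classifier accuracy).
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def filter_score(X, y, mask):
    """Filter objective: sum of per-feature mutual information with the class."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    return mutual_info_classif(X[:, cols], y, random_state=0).sum()

def wrapper_score(X, y, mask, cv=5):
    """Wrapper objective: cross-validated accuracy of a classifier on the subset."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    model = DecisionTreeClassifier(random_state=0)
    return cross_val_score(model, X[:, cols], y, cv=cv).mean()
```

Either function can then serve as the objective f(F) maximized by the search methods on the following slides.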

  4. Optimization Approach • Combinatorial optimization problem • Feasible region is {0,1}^m, where m is the number of features • NP-hard • Previous optimization methods applied: • Branch-and-bound • Genetic algorithms & evolutionary search • Single-pass heuristics • Has also been formulated as a mathematical programming problem
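
As a simple baseline in the spirit of the heuristics listed above, a greedy sequential forward search over the {0,1}^m feasible region can be sketched as follows. This is not the NP method, and all names are illustrative; it accepts any subset-scoring function, such as the hypothetical wrapper_score above.

```python
# Greedy sequential forward search over the {0,1}^m space of feature masks.
# `score(X, y, mask)` is any subset objective, e.g. the wrapper sketch above.
import numpy as np

def greedy_forward_selection(X, y, score):
    m = X.shape[1]
    mask = np.zeros(m, dtype=int)              # start with the empty subset
    best = score(X, y, mask)
    improved = True
    while improved:
        improved = False
        best_j = None
        for j in np.flatnonzero(mask == 0):    # try adding each unselected feature
            trial = mask.copy()
            trial[j] = 1
            s = score(X, y, trial)
            if s > best:
                best, best_j, improved = s, j, True
        if improved:
            mask[best_j] = 1                   # keep the single best addition
    return mask, best
```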

  5. New Approach: NP Method • Nested Partitions (NP) method: • Developed for simulation optimization • Particularly effective for large-scale combinatorial type optimization problems • Accounts for noisy performance measures

  6. NP Method • Maintains a subset called the most promising region • Partitioning • Most promising region partitioned into subsets • Remaining feasible solutions aggregated • Random Sampling • Random sample of solutions from each subset • Used to select the next most promising region

  7. Partitioning Tree • [Tree diagram] The root node represents all subsets; each level splits on whether a feature is included (a1 included / a1 not included, then a2, then a3, ...); the current most promising region is one node of this tree, and the search either moves to its best subregion or backtracks to the previous region
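
The following is a simplified sketch of this search loop for feature selection, under one reading of slides 6-7: regions are partial assignments over a fixed feature ordering, masks are sampled uniformly within each subregion and within the surrounding region, and the search moves or backtracks based on the best sampled value. All names, the uniform sampling, and the one-level backtracking are illustrative assumptions, not the authors' implementation.

```python
# Simplified Nested Partitions loop for feature selection (illustrative sketch).
# A region is a partial assignment `fixed` over the first len(fixed) features
# of a chosen ordering.
import numpy as np

rng = np.random.default_rng(0)

def sample_in_region(fixed, m):
    """Draw a full 0/1 mask consistent with the partial assignment `fixed`."""
    mask = rng.integers(0, 2, size=m)
    mask[:len(fixed)] = fixed
    return mask

def sample_in_surrounding(fixed, m):
    """Draw a mask outside the current region by flipping one fixed decision."""
    mask = rng.integers(0, 2, size=m)
    mask[:len(fixed)] = fixed
    j = rng.integers(0, len(fixed))
    mask[j] = 1 - mask[j]
    return mask

def np_feature_selection(X, y, score, n_samples=10, max_iters=50):
    m = X.shape[1]
    fixed = []                                   # current most promising region
    best_val = -np.inf
    for _ in range(max_iters):
        if len(fixed) == m:                      # maximum depth: a single subset remains
            break
        regions = {0: fixed + [0], 1: fixed + [1]}        # two subregions
        region_val = {}
        for key, part in regions.items():
            vals = [score(X, y, sample_in_region(part, m)) for _ in range(n_samples)]
            region_val[key] = max(vals)
        if fixed:                                # aggregated surrounding region
            vals = [score(X, y, sample_in_surrounding(fixed, m)) for _ in range(n_samples)]
            region_val['surround'] = max(vals)
        winner = max(region_val, key=region_val.get)
        if winner == 'surround':
            fixed = fixed[:-1]                   # backtrack to the previous region
        else:
            fixed = regions[winner]              # move to the best subregion
        best_val = max(best_val, region_val[winner])
    return fixed, best_val
```

The partitioning order (which feature is fixed at each depth) is exactly what slide 8 addresses next.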

  8. Intelligent Partitioning • For NP in general • Partitioning imposes a structure on the search space • Done well, the algorithm converges quickly • For NP for feature selection • Partitioning is defined by the order of the features • Select the most important feature first, etc. • E.g., rank according to the information gain of the features (entropy partitioning)
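
A minimal sketch of this ranking step, computing information gain per feature on discrete data and returning the partitioning order; the names are illustrative, and continuous features would need to be discretized first.

```python
# Entropy-based partitioning order (slide 8): rank features by information gain
# and fix the highest-gain features first in the partitioning tree.
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def information_gain(x, y):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column x."""
    h_y = entropy(y)
    h_y_given_x = 0.0
    for v in np.unique(x):
        idx = (x == v)
        h_y_given_x += idx.mean() * entropy(y[idx])
    return h_y - h_y_given_x

def partitioning_order(X, y):
    """Feature indices sorted by decreasing information gain."""
    gains = np.array([information_gain(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(-gains)
```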

  9. Test Data Sets • Test data sets from UCI Repository

  10. How Well Does it Work? • Comparison between NP and another well-known heuristic, namely a genetic algorithm (GA)

  11. How Close to Optimal? • So far, this is a heuristic random search with no performance guarantee • However, the Two-Stage Nested Partitions (TSNP) method can be shown to obtain near-optimal solutions with high probability • Assure that the 'correct choice' is made with probability at least ψ each time • Correct choice means within an indifference zone δ of the optimal performance

  12. Two-Stage Sampling • Instead of taking a fixed number of samples from each subregion, use statistical selection, e.g. Rinott's procedure: N_j(k) = max{ n0, ⌈ h² · S_j²(k) / δ² ⌉ }, where N_j(k) is the number of samples needed from the j-th region in iteration k, n0 is the number of sample points in the 1st phase, S_j²(k) is the sample variance estimated from the 1st phase, δ is the indifference zone, and h is a constant determined by the desired probability ψ of selecting the correct region
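
A sketch of this second-stage sample-size rule in code. It assumes the standard Rinott form shown above; the constant h (which depends on ψ, n0, and the number of regions compared) would come from Rinott's tables or a numerical integral and is simply passed in here.

```python
# Rinott-style second-stage sample size for region j (illustrative sketch).
# n0: first-stage sample points; s2_j: first-stage sample variance of region j;
# delta: indifference zone; h: constant for the desired probability psi.
import math

def second_stage_size(n0, s2_j, delta, h):
    return max(n0, math.ceil(h * h * s2_j / (delta * delta)))

# Example: with n0 = 10 first-stage points, sample variance 0.04,
# indifference zone 0.02, and h = 3.0, roughly 900 samples are prescribed.
n_j = second_stage_size(n0=10, s2_j=0.04, delta=0.02, h=3.0)
```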

  13. Performance Guarantee • When the maximum depth of the partitioning tree is reached, the selected feature subset is within the indifference zone δ of the optimal performance with a probability that can be bounded below in terms of the per-iteration selection probability ψ

  14. Scalability • The NP and TSNP were originally conceived for simulation optimization • Can handle noisy performance • More samples prescribed in noisy regions • Incorrect moves are corrected through the backtracking element (both NP and TSNP) • Can we use a (small) subset of instances instead of all instances? • This is a common approach to increase scalability of data mining algorithms, but is it worthwhile here?
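
One way to read the instance-sampling idea: each candidate feature subset is evaluated on a random fraction R of the training instances rather than on all of them, trading extra noise for speed. A minimal sketch, again assuming scikit-learn; the fraction R, the row-sampling scheme, and the names are illustrative.

```python
# Evaluate a feature subset on a random fraction R of the instances (slide 14).
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def sampled_wrapper_score(X, y, mask, R=0.2, cv=5):
    """Cross-validated accuracy of the subset, estimated on a fraction R of instances."""
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0
    n = X.shape[0]
    n_rows = max(cv * 2, int(R * n))          # keep at least a couple of rows per fold
    rows = rng.choice(n, size=n_rows, replace=False)
    model = DecisionTreeClassifier(random_state=0)
    return cross_val_score(model, X[rows][:, cols], y[rows], cv=cv).mean()
```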

  15. Numerical Results: Original NP

  16. Observations • Using a random sample of instances can improve performance considerably • Evaluation of each sampled feature subset becomes faster • A very small sample degrades performance • There is now too much noise and the method backtracks excessively → more steps • The TSNP would prescribe more samples! • The expected number of steps is constant • What is the best fraction R of instances to use in the TSNP?

  17. Optimal Sample for TSNP • If we decrease the sample size, then the computation for each sample point decreases • However, the sample variance increases and more sample points will be needed • To find an approximate R*, we thus minimize the per-step effort: the number of sample points needed in each step times the computation time given that number of sample points
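
This trade-off can be illustrated numerically under assumed functional forms: the Rinott-style rule from slide 12 for the number of sample points, a sample variance that grows as R shrinks, and an evaluation time with a fixed overhead plus a part proportional to R. The forms and constants below are illustrative assumptions, not the derivation in the paper.

```python
# Illustrative numerical version of the trade-off on slide 17: choose the instance
# sampling ratio R that minimizes (samples needed per step) x (time per sample).
import math
import numpy as np

def samples_needed(R, h=3.0, delta=0.02, v0=0.01, v1=0.002, n0=10):
    s2 = v0 + v1 / R                            # assumed: variance grows as R shrinks
    return max(n0, math.ceil(h * h * s2 / (delta * delta)))   # Rinott-style rule

def cost_per_step(R, t0=0.2, t1=1.0):
    return samples_needed(R) * (t0 + t1 * R)    # assumed: per-sample time ~ t0 + t1*R

grid = np.linspace(0.05, 1.0, 96)
R_star = min(grid, key=cost_per_step)           # crude grid search for the best ratio
```

Slide 19 replaces this grid search with a closed-form expression for R* in terms of constants estimated from the data.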

  18. Approximating the Variance • Now it can be shown that the sample variance is well approximated by a simple function of the sampling ratio R, with constants that can be estimated from the data

  19. Optimal Sampling Ratio • The optimal sampling ratio R* now follows from this approximation • The constants c0, c1, c2 are estimated from the data, and h, l and δ are determined by user preferences

  20. Numerical Results * Statistically better than TSNP w/sampling

  21. Conclusions • Feature selection is integral to data mining • Inherently a combinatorial optimization problem • From a scalability standpoint it is desirable to be able to deal with noisy data • Nested partitions method: • Flexible performance guarantees • Allows for effective use of random sampling • Very good performance on test problems

  22. References • Full papers available: • S. Ólafsson and J. Yang (2004). “Intelligent Partitioning for Feature Selection,” INFORMS Journal on Computing, in print. • S. Ólafsson (2004). “Two-Stage Nested Partitions Method for Stochastic Optimization,” Methodology and Computing in Applied Probability, 6, 5-27. • J. Yang and S. Ólafsson (2004). “Optimization-Based Feature Selection with Adaptive Instance Sampling,” Computers and Operations Research, to appear.
