Machine Learning for Stock Selection: Robert J. Yan and Charles X. Ling, University of Western Ontario, Canada ({jyan, cling}@csd.uwo.ca)
Outline • Introduction • The stock selection task • The Prototype Ranking method • Experimental results • Conclusions
Introduction • Objective: • Use machine learning to select a small number of “good” stocks to form a portfolio • Research questions: • Learning from noisy data • Learning from imbalanced data • Our solution: Prototype Ranking • A specially designed machine learning method
Stock Selection Task • Given information prior to week t, predict the performance of stocks in week t • Learn a ranking function from the training set and use it to rank the test data • Select the n highest-ranked stocks to buy and the n lowest-ranked stocks to short-sell
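The buy/short-sell rule on this slide can be sketched in a few lines. This is a minimal sketch assuming the ranking function has already produced one predicted score per stock for week t; the function name `select_long_short`, the tickers, and the scores are hypothetical.

```python
from typing import Dict, List, Tuple


def select_long_short(scores: Dict[str, float], n: int) -> Tuple[List[str], List[str]]:
    """Buy the n highest-scored stocks and short-sell the n lowest-scored ones."""
    if 2 * n > len(scores):
        raise ValueError("need at least 2n stocks for disjoint long and short legs")
    # Rank tickers from highest to lowest predicted score (ties broken by ticker).
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    longs = ranked[:n]
    shorts = ranked[-n:][::-1]  # lowest-scored stock first
    return longs, shorts


# Hypothetical predicted week-t scores produced by some ranking function.
predicted = {"AAA": 0.031, "BBB": -0.012, "CCC": 0.008, "DDD": -0.027, "EEE": 0.019}
longs, shorts = select_long_short(predicted, n=2)
print(longs, shorts)  # ['AAA', 'EEE'] ['DDD', 'BBB']
```

Only the ordering of the scores matters here, which is why a ranking function (rather than an exact return forecast) is enough for this task.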
Prototype Ranking • Prototype Ranking (PR): a machine learning method specially designed for noisy and imbalanced stock data • The PR system: • Step 1: Find good “prototypes” in the training data • Step 2: Use k-NN on the prototypes to rank the test data
Step 1: Finding Prototypes • Prototypes: representative points • Goal: discover the underlying density/clusters of the training samples by distributing prototypes over the sample space • Reduce the data size [Figure: samples, prototypes, and a prototype neighborhood]
Finding Prototypes Using Competitive Learning • General competitive learning: • Step 1: Randomly initialize a set of prototypes • Step 2: Search for the nearest prototype(s) • Step 3: Adjust the prototypes • Step 4: Output the prototypes • The hidden density of the training data is reflected in the prototypes
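The four general competitive-learning steps above can be sketched as a winner-take-all loop. This is a minimal sketch of standard competitive learning, not the authors' modified PR version; initializing prototypes at randomly chosen samples, the linearly decaying learning rate, and the names `nearest` and `competitive_learning` are all assumptions for illustration.

```python
import random
from typing import List, Sequence

Vector = List[float]


def nearest(prototypes: Sequence[Vector], x: Sequence[float]) -> int:
    """Return the index of the prototype closest to x (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda i: sum((p - v) ** 2 for p, v in zip(prototypes[i], x)),
    )


def competitive_learning(samples: Sequence[Vector], k: int, epochs: int = 20,
                         lr: float = 0.1, seed: int = 0) -> List[Vector]:
    """Winner-take-all competitive learning; returns k prototype vectors."""
    rng = random.Random(seed)
    # Step 1: initialize k prototypes at randomly chosen training samples.
    prototypes = [list(s) for s in rng.sample(list(samples), k)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # learning rate decays linearly toward zero
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            # Step 2: find the winning (nearest) prototype.
            w = nearest(prototypes, x)
            # Step 3: move only the winner a fraction `rate` of the way toward the sample.
            prototypes[w] = [p + rate * (v - p) for p, v in zip(prototypes[w], x)]
    # Step 4: output the learned prototypes.
    return prototypes


# Usage on hypothetical 2-D data drawn around two centres.
data_rng = random.Random(1)
data = ([[data_rng.gauss(0, 0.3), data_rng.gauss(0, 0.3)] for _ in range(50)]
        + [[data_rng.gauss(5, 0.3), data_rng.gauss(5, 0.3)] for _ in range(50)])
protos = competitive_learning(data, k=4)
print(protos)
```

Because each winner moves toward the samples it wins, prototypes drift into dense regions, which is how the hidden density of the training data ends up reflected in the prototypes.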
Modifications for Stock Data • In step 1: initial prototypes are organized in a tree structure • Enables fast nearest-prototype search • In step 2: search for prototypes in the predictor space • Better learning for the prediction task • In step 3: adjust prototypes in the goal-attribute space • Better learning on the imbalanced stock data • In step 4: prune the prototype tree • Prune child prototypes that are similar to their parent • Combine the leaf prototypes to form the final prototypes
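The step-4 tree pruning can be illustrated with a small recursive sketch. The slide does not define "similar", so this assumes Euclidean distance below a hypothetical tolerance `tol`, and it assumes children are dropped only when all of them are leaves close to the parent; the `Node` class and the `prune`/`leaves` helpers are hypothetical, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    proto: List[float]  # prototype vector stored at this tree node
    children: List["Node"] = field(default_factory=list)


def dist(a: List[float], b: List[float]) -> float:
    """Euclidean distance, used here as a hypothetical similarity measure."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def prune(node: Node, tol: float) -> None:
    """Bottom-up: drop a node's children when all are leaves within `tol` of it."""
    for child in node.children:
        prune(child, tol)
    if node.children and all(
        not c.children and dist(c.proto, node.proto) <= tol for c in node.children
    ):
        node.children = []  # the parent now stands in for its near-duplicate children


def leaves(node: Node) -> List[List[float]]:
    """Collect the leaf prototypes, which form the final prototype set."""
    if not node.children:
        return [node.proto]
    out: List[List[float]] = []
    for c in node.children:
        out.extend(leaves(c))
    return out


tree = Node([0.0, 0.0], [
    Node([0.05, 0.0]),
    Node([3.0, 3.0], [Node([3.02, 3.0]), Node([2.98, 3.0])]),
])
prune(tree, tol=0.1)
print(leaves(tree))  # [[0.05, 0.0], [3.0, 3.0]]
```

Pruning bottom-up lets near-duplicate subtrees collapse into their parent first, so a parent is never removed while it still has distinct descendants.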
Step 2: Predicting Test Data • Predict each test sample as the weighted average of its k nearest prototypes • Update the model online with new data
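The prediction step can be sketched as weighted k-NN over the learned prototypes, followed by ranking. This sketch assumes each prototype stores a predictor vector plus a goal value (such as next-week return) and uses inverse-distance weights; the slide does not specify the weighting, and `predict`, `rank_stocks`, and the sample data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: each prototype is (predictor vector, goal value).
Prototype = Tuple[List[float], float]


def predict(prototypes: Sequence[Prototype], x: Sequence[float], k: int = 3,
            eps: float = 1e-9) -> float:
    """Inverse-distance-weighted average of the goal values of the k nearest prototypes."""
    by_distance = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, x)) ** 0.5, value)
        for vec, value in prototypes
    )
    top = by_distance[:k]
    weights = [1.0 / (d + eps) for d, _ in top]  # eps avoids division by zero on exact matches
    return sum(w * v for w, (_, v) in zip(weights, top)) / sum(weights)


def rank_stocks(prototypes: Sequence[Prototype], test: Dict[str, List[float]],
                k: int = 3) -> List[str]:
    """Order test stocks from highest to lowest predicted score."""
    scores = {ticker: predict(prototypes, x, k) for ticker, x in test.items()}
    return sorted(scores, key=scores.get, reverse=True)


protos = [([0.0, 0.0], 0.01), ([1.0, 0.0], 0.03), ([0.0, 1.0], -0.02), ([5.0, 5.0], 0.10)]
test_stocks = {"AAA": [0.5, 0.0], "BBB": [0.0, 0.9], "CCC": [4.8, 5.1]}
print(rank_stocks(protos, test_stocks, k=2))  # ['CCC', 'AAA', 'BBB']
```

Searching only the prototypes, rather than every training sample, is what makes this step cheap: the number of prototypes is much smaller than the training set.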
Data • CRSP daily stock database • 300 NYSE and AMEX stocks with the largest market capitalization • From 1962 to 2004
Testing PR • Experiment 1: Does a larger portfolio give a lower average return and lower risk (diversification)? • Experiment 2: Is PR better than Cooper’s method?
Results of Experiment 1 [Figures: Average Return (1978-2004); Risk (std) (1978-2004)]
Experiment 2: Comparison to Cooper’s method • Cooper’s method (CP): A traditional non-ML method for stock selection… • Compare PR and CP in 10-stock portfolios
Results of Experiment 2 • Measures: • Average Return (Ret.) • Sharpe Ratio (SR), a risk-adjusted return: SR = Ret. / Std.
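The two measures can be computed from a portfolio's per-period returns as sketched below. This follows the slide's definition SR = Ret. / Std. with no risk-free rate subtracted; using the sample (rather than population) standard deviation is an assumption, and the returns are made-up numbers, not results from the paper.

```python
import statistics
from typing import Sequence, Tuple


def performance(returns: Sequence[float]) -> Tuple[float, float, float]:
    """Return (Ret., Std., SR) for a series of per-period portfolio returns.

    SR = Ret. / Std. as defined on the slide, without subtracting a risk-free rate.
    """
    ret = statistics.mean(returns)
    std = statistics.stdev(returns)  # sample standard deviation; needs at least 2 returns
    return ret, std, ret / std


weekly = [0.02, -0.01, 0.03, 0.00]  # hypothetical weekly returns
ret, std, sr = performance(weekly)
print(f"Ret. = {ret:.4f}, Std. = {std:.4f}, SR = {sr:.4f}")
# Ret. = 0.0100, Std. = 0.0183, SR = 0.5477
```

Two portfolios with the same Ret. can have very different SR, which is why the Sharpe ratio is reported alongside the raw average return.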
Conclusions • PR: modified competitive learning and k-NN for noisy and imbalanced stock data • PR does well in stock selection • Larger portfolio, lower return, lower risk • PR outperforms the non-ML method CP • Future work: use it to invest and make money!