Outline

Machine Learning for Stock Selection
Robert J. Yan, Charles X. Ling
University of Western Ontario, Canada
{jyan, cling}@csd.uwo.ca

Outline
• Introduction
• The stock selection task
• The Prototype Ranking method
• Experimental results
• Conclusions

Introduction
• Objective:
  • Use machine learning to select a small number of “good” stocks to form a portfolio
• Research questions:
  • Learning from a noisy dataset
  • Learning from an imbalanced dataset
• Our solution: Prototype Ranking
  • A machine learning method designed specifically for this task

Stock Selection Task
Given information prior to week t, predict the performance of stocks in week t
• Training set: learn a ranking function to rank the testing data
• Select the n highest-ranked stocks to buy and the n lowest-ranked to short-sell
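The selection rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `select_portfolio`, the ticker symbols, and the scores are hypothetical, and the predicted scores are assumed to come from some already-trained ranking function.

```python
def select_portfolio(predicted, n):
    """Return (long, short) ticker lists from a dict of predicted scores.

    `predicted` maps ticker -> predicted performance for week t
    (hypothetical values; any ranking function could produce them).
    """
    if 2 * n > len(predicted):
        raise ValueError("need at least 2*n stocks to pick n long and n short")
    # Rank from highest to lowest score; ties are broken by ticker name
    # so the result is deterministic.
    ranked = sorted(predicted, key=lambda t: (-predicted[t], t))
    # Buy the n highest-ranked stocks, short-sell the n lowest-ranked.
    return ranked[:n], ranked[-n:]


scores = {"AAA": 0.021, "BBB": -0.013, "CCC": 0.004,
          "DDD": 0.035, "EEE": -0.027, "FFF": 0.000}
long_side, short_side = select_portfolio(scores, 2)
print(long_side, short_side)
```

Only the ordering of the scores matters here, which is why the task is framed as ranking rather than as predicting exact returns.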

Prototype Ranking
• Prototype Ranking (PR): a machine learning method designed for noisy and imbalanced stock data
• The PR system
  Step 1. Find good “prototypes” in the training data
  Step 2. Use k-NN on the prototypes to rank the test data

Step 1: Finding Prototypes
Prototypes: representative points
• Goal: discover the underlying density/clusters of the training samples by distributing prototypes in the sample space
• Reduce data size
[Figure labels: prototypes, samples, prototype neighborhood]

Finding prototypes using competitive learning
General competitive learning
• Step 1: Randomly initialize a set of prototypes
• Step 2: Search for the nearest prototypes
• Step 3: Adjust the prototypes
• Step 4: Output the prototypes
The hidden density of the training data is reflected in the prototypes
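The four steps of general competitive learning can be sketched as a plain winner-take-all update. This is a generic textbook version, not the modified PR procedure; the function names, the learning rate `lr`, the number of `epochs`, and the toy samples are all assumptions made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def competitive_learning(samples, n_prototypes, epochs=50, lr=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the prototypes at randomly chosen samples.
    prototypes = [list(p) for p in rng.sample(samples, n_prototypes)]
    for _ in range(epochs):
        order = list(samples)
        rng.shuffle(order)
        for x in order:
            # Step 2: search for the prototype nearest to this sample.
            winner = min(prototypes, key=lambda p: sq_dist(p, x))
            # Step 3: move the winning prototype a little toward the sample.
            for i, xi in enumerate(x):
                winner[i] += lr * (xi - winner[i])
    # Step 4: output the prototypes.
    return prototypes


# Two well-separated toy clusters (hypothetical data).
samples = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
prototypes = sorted(competitive_learning(samples, n_prototypes=2))
print(prototypes)
```

With two separated clusters, one prototype should end up near each cluster, which is the sense in which the prototypes reflect the density of the training data.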

Modifications for Stock data
• In step 1: Initial prototypes are organized in a tree structure
  • Fast nearest-prototype search
• In step 2: Search for prototypes in the predictor space
  • Better learning effect for prediction tasks
• In step 3: Adjust prototypes in the goal attribute space
  • Better learning effect on the imbalanced stock data
• In step 4: Prune the prototype tree
  • Prune child prototypes if they are similar to the parent
  • Combine leaf prototypes to form the final prototypes
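One possible reading of the step 2 and step 3 modifications is sketched below: the winner is found by distance on the predictors only, and the update is applied to the winner's goal attribute. The slides do not give the exact update rule, so the dict layout, the function name `update_prototype`, and the learning rate `lr` are assumptions, and the tree structure and pruning are omitted.

```python
def update_prototype(prototypes, x, y, lr=0.1):
    """Update one prototype from a sample with predictors x and goal value y.

    Each prototype is a dict {"x": [...predictors...], "y": goal_estimate}.
    This layout is an assumption for illustration, not the authors' design.
    """
    # Search in the predictor space: distance uses the predictors only.
    winner = min(
        prototypes,
        key=lambda p: sum((a - b) ** 2 for a, b in zip(p["x"], x)),
    )
    # Adjust in the goal attribute space: move the winner's goal
    # estimate toward the sample's observed goal value.
    winner["y"] += lr * (y - winner["y"])
    return winner


protos = [{"x": [0.0, 0.0], "y": 0.0}, {"x": [1.0, 1.0], "y": 0.0}]
update_prototype(protos, x=[0.9, 1.2], y=0.05, lr=0.5)
print(protos)
```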

Step 2: Predicting Test Data
• Prediction: the weighted average of the k nearest prototypes
• Update the model online with new data
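A weighted average over the k nearest prototypes might look like the sketch below. The slide does not say how the weights are chosen; inverse-distance weighting is assumed here, and the function name `predict`, the `eps` guard, and the toy prototypes are hypothetical.

```python
import math


def predict(prototypes, x, k=3, eps=1e-9):
    """Predict a score for x from (vector, value) prototypes.

    Uses inverse-distance weights, which is an assumption; the slides
    only say "weighted average of k nearest prototypes".
    """
    # Keep the k prototypes closest to x.
    nearest = sorted(prototypes, key=lambda p: math.dist(p[0], x))[:k]
    # Closer prototypes get larger weights; eps avoids division by zero.
    weights = [1.0 / (math.dist(vec, x) + eps) for vec, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)


toy = [((0.0, 0.0), 1.0), ((2.0, 0.0), 3.0), ((10.0, 10.0), 100.0)]
score = predict(toy, (1.0, 0.0), k=2)
print(score)
```

The predicted scores for all test stocks can then be sorted to produce the ranking used for selection.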

Data
CRSP daily stock database
• 300 NYSE and AMEX stocks with the largest market capitalization
• From 1962 to 2004

Testing PR
• Experiment 1: Larger portfolio, lower average return, lower risk – diversification
• Experiment 2: is PR better than Cooper’s method?

Results of Experiment 1
[Figures: Average Return (1978-2004); Risk (std) (1978-2004)]

Experiment 2: Comparison to Cooper’s method
• Cooper’s method (CP): A traditional non-ML method for stock selection…
• Compare PR and CP in 10-stock portfolios

Results of Experiment 2
Measures:
• Average Return (Ret.)
• Sharpe Ratio (SR): a risk-adjusted return, SR = Ret. / Std.
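The Sharpe ratio as defined on this slide (average return divided by its standard deviation, with no risk-free rate subtracted) can be computed as in the sketch below. The use of the sample standard deviation and the weekly return values are assumptions for illustration, not results from the experiments.

```python
import statistics


def sharpe_ratio(returns):
    """SR = Ret. / Std., following the slide's definition.

    Uses the sample standard deviation (an assumption) and needs at
    least two return observations.
    """
    avg_return = statistics.fmean(returns)
    std = statistics.stdev(returns)
    return avg_return / std


weekly_returns = [0.01, 0.03, 0.02]  # hypothetical portfolio returns
sr = sharpe_ratio(weekly_returns)
print(sr)
```

Two portfolios with the same average return are then separated by their risk: the one with the smaller standard deviation gets the higher SR.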

Conclusions
• PR: modified competitive learning and k-NN for noisy and imbalanced stock data
• PR does well in stock selection
  • Larger portfolio, lower return, lower risk
  • PR outperforms the non-ML method CP
• Future work: use it to invest and make money!
