Using a Feed-forward ANN to predict NBA Games
About my ANN
- Trained incrementally using backpropagation
- Currently uses only sigmoid activation
- Outputs 1 if the visiting team wins, 0 if the home team wins
Why Basketball?
- Very large sample space: each season contains over 1000 games
- A relatively small number of stats (4) account for a large portion of the final score
Sample box score from Basketball-Reference.com (where I got the data)
About my dataset
- Includes the HTML file for every box score web page from 1999 through 2013, organized by year
- I saved the whole HTML file in case I later want to extract different stats for my input
- From the HTML files I created text files, each containing every game played that season
- Each text file lists the games in the order they were played, which is very useful when estimating the inputs after training
- Here is an example entry:
WAS 24 15 23 22 84 87.9 0.4 10.8 33.3 0.133 95.5 CLE 31 19 24 20 94 87.9 0.5 18.4 46.2 0.19 106.9
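To make the entry format concrete, here is a minimal sketch of a parser for one game line. The field layout is an assumption inferred from the example entry and the stats described in this deck (team, four quarter scores, final score, then Pace, eFG%, TOV%, ORB%, FT/FGA, Ortg), as is the assumption that the first team listed is the visitor. The names `TeamLine`, `parse_team`, and `parse_game` are hypothetical, not taken from the project.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TeamLine:
    team: str
    quarters: List[int]  # points scored in each of the four quarters
    final: int           # final score
    stats: List[float]   # ASSUMED order: Pace, eFG%, TOV%, ORB%, FT/FGA, Ortg


def parse_team(tokens: List[str]) -> TeamLine:
    """Parse the 12 tokens that describe one team (assumed layout)."""
    return TeamLine(
        team=tokens[0],
        quarters=[int(t) for t in tokens[1:5]],
        final=int(tokens[5]),
        stats=[float(t) for t in tokens[6:12]],
    )


def parse_game(entry: str) -> Tuple[TeamLine, TeamLine, int]:
    """Split one game entry into visitor, home, and the training label."""
    tokens = entry.split()
    if len(tokens) != 24:
        raise ValueError(f"expected 24 fields, got {len(tokens)}")
    visitor = parse_team(tokens[:12])  # ASSUMPTION: visitor is listed first
    home = parse_team(tokens[12:])
    # Label convention from the deck: 1 if the visiting team wins, else 0.
    label = 1 if visitor.final > home.final else 0
    return visitor, home, label


ENTRY = ("WAS 24 15 23 22 84 87.9 0.4 10.8 33.3 0.133 95.5 "
         "CLE 31 19 24 20 94 87.9 0.5 18.4 46.2 0.19 106.9")

visitor, home, label = parse_game(ENTRY)
features = visitor.stats + home.stats  # 12 values, 6 per team
```

Under these assumptions the quarter scores in the example add up to the final scores (84 and 94), which suggests the layout guess is at least self-consistent.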
What the stats mean
- Pace: the average number of possessions a team uses per game
- eFG% = effective field goal percentage = (FGM + (0.5 x 3PTM)) / FGA. This accounts for the fact that 3-point field goals are worth 50% more than 2-point field goals.
- TOV% = turnover percentage = 100 * TOV / (FGA + 0.44 * FTA + TOV). This is an estimate of turnovers per 100 plays.
- ORB% = offensive rebounding percentage
- FT/FGA = free throws / FGA
- Ortg = offensive rating
- eFG%, TOV%, ORB%, and FT/FGA are referred to as the Four Factors because they account for 96% of point differential.
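The formulas above translate directly into code. The sketch below covers only the three stats whose formulas are spelled out on this slide. The function names and the sample totals are hypothetical, and treating "free throws" in FT/FGA as free throws made is an assumption.

```python
def effective_fg_pct(fgm: float, three_pm: float, fga: float) -> float:
    """eFG% = (FGM + 0.5 * 3PTM) / FGA."""
    return (fgm + 0.5 * three_pm) / fga


def turnover_pct(tov: float, fga: float, fta: float) -> float:
    """TOV% = 100 * TOV / (FGA + 0.44 * FTA + TOV)."""
    return 100.0 * tov / (fga + 0.44 * fta + tov)


def ft_per_fga(ft: float, fga: float) -> float:
    """FT/FGA, with ft ASSUMED to mean free throws made."""
    return ft / fga


# Hypothetical team totals for one game, for illustration only.
efg = effective_fg_pct(fgm=40, three_pm=10, fga=80)
tov = turnover_pct(tov=14, fga=80, fta=25)
ftr = ft_per_fga(ft=20, fga=80)
```

With these made-up totals, eFG% works out to 45 / 80 and TOV% to 1400 / 105, which are on the same scale as the values in the example entry (0.4 to 0.5 and 10 to 20).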
Initial Findings
- Can very accurately CHOOSE the winner based on the previously mentioned 6 stats
- Does not require an incredibly complicated ANN: 3 layers with 12 input neurons (the 6 stats for each of the two teams), 3-8 hidden neurons, and 1 output neuron were sufficient
- The sigmoid function worked well, but I want to try other activation functions
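A network of this shape is small enough to write out in full. The following is a minimal sketch, not the project's actual code: a 12-input, single-hidden-layer, 1-output sigmoid network trained one example at a time with backpropagation. The squared-error loss, the learning rate, the weight initialization range, and the class name `FeedForwardNet` are all assumptions, and real inputs such as Pace and Ortg would probably need rescaling first.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class FeedForwardNet:
    def __init__(self, n_in: int = 12, n_hidden: int = 5, seed: int = 0):
        rng = random.Random(seed)
        # Each row holds one neuron's input weights, with the bias last.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(ws[:-1], x)) + ws[-1])
                  for ws in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out[:-1], hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_one(self, x, target: float, lr: float = 0.5) -> float:
        """One incremental backprop step; returns the pre-update loss."""
        hidden, out = self.forward(x)
        # Gradient of 0.5 * (out - target)^2 through the output sigmoid.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas are computed before the output weights change.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, ws in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                ws[i] -= lr * delta_hidden[j] * v
            ws[-1] -= lr * delta_hidden[j]
        return 0.5 * (out - target) ** 2

    def predict(self, x) -> int:
        """1 means the visiting team is predicted to win, 0 the home team."""
        return 1 if self.forward(x)[1] >= 0.5 else 0
```

Training would call `train_one` once per game, in any order, with the 12 stats as `x` and the 0/1 outcome as `target`. Changing `n_hidden` between 3 and 8 covers the range of hidden layer sizes mentioned above.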
Where I am Now
- Because the Four Factors account for so much of the point differential, a neural network trained on them (plus Pace and Ortg) can correctly choose who won an NBA game from the end-of-game box score 100% of the time, with various network configurations. This is based on my 14-season dataset, which amounts to 15k games.
- This sounds good until you realize that it is not possible to know exactly what these parameters will be BEFORE the game occurs. This means we must resort to estimating them, and the estimates will never be totally accurate.
- Initial work on estimating the inputs from the average of each team's stats over its previous games has predicted winners with 60-67% accuracy.
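One way to picture the estimation step is a running average per team, updated as the season's games are read in chronological order. This is only a sketch under assumptions: the deck does not say how the averages are computed or how a team's first game is handled, so skipping games where a team has no history is a guess, and the names `RunningAverages` and `pregame_inputs` are hypothetical.

```python
class RunningAverages:
    """Per-team mean of each stat over the games seen so far."""

    def __init__(self):
        self.sums = {}
        self.counts = {}

    def estimate(self, team):
        """Return the team's average stats, or None with no prior games."""
        n = self.counts.get(team, 0)
        if n == 0:
            return None
        return [s / n for s in self.sums[team]]

    def update(self, team, stats):
        if team not in self.sums:
            self.sums[team] = [0.0] * len(stats)
            self.counts[team] = 0
        self.sums[team] = [s + v for s, v in zip(self.sums[team], stats)]
        self.counts[team] += 1


def pregame_inputs(games):
    """Build (estimated_input, label) pairs from games in played order.

    Each game is (visitor, visitor_stats, home, home_stats, label).
    Estimates use only games played BEFORE the one being predicted.
    """
    averages = RunningAverages()
    samples = []
    for visitor, v_stats, home, h_stats, label in games:
        v_est = averages.estimate(visitor)
        h_est = averages.estimate(home)
        if v_est is not None and h_est is not None:  # ASSUMPTION: skip otherwise
            samples.append((v_est + h_est, label))
        # Only now fold in the actual box score of this game.
        averages.update(visitor, v_stats)
        averages.update(home, h_stats)
    return samples
```

The important detail is the ordering: the estimate is read before the game's own box score is added, so the network never sees stats from the game it is asked to predict.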
Finishing Up This Project
- Try to find a better, more accurate way to predict a game's stats
- Use a collection of networks and see how accurate they can be
- Try different combinations of input features