
Predicting the winner of the Cy Young (C.Y.) Award


Presentation Transcript


  1. Predicting the winner of the Cy Young (C.Y.) Award • Advisor: Dr. 黃三益 • Team members: 尹川, 陳隆賢, 陳偉聖

  2. Introduction • Baseball in Taiwan: CPBL (Chinese Professional Baseball League) • Baseball in the USA: MLB (Major League Baseball) • Cy Young Award, given since 1956 • Voted on by the Baseball Writers' Association of America • Decided by weighted voting scores • Each league has one winner per year (since 1967; from 1956 to 1966 a single award covered both leagues).

  3. Measurements • There are no fixed rules for judging the award. • Nevertheless, many measurements can be used to judge whether a pitcher is good: • Wins • ERA (earned run average) • WHIP (walks plus hits per inning pitched) • G/F (ground-ball to fly-ball ratio), etc.
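The rate statistics listed above have simple formulas. The minimal Python sketch below computes them from outs recorded rather than from the box-score innings notation (where 200.1 means 200⅓ innings). It assumes G/F means ground balls divided by fly balls, since the slides do not define it, and the season line used in the example is hypothetical.

```python
def innings_pitched(outs: int) -> float:
    """Convert outs recorded to innings pitched (3 outs = 1 inning)."""
    return outs / 3


def era(earned_runs: int, outs: int) -> float:
    """Earned run average: earned runs allowed per 9 innings."""
    return 9 * earned_runs / innings_pitched(outs)


def whip(walks: int, hits: int, outs: int) -> float:
    """Walks plus hits allowed per inning pitched."""
    return (walks + hits) / innings_pitched(outs)


def ground_fly_ratio(ground_balls: int, fly_balls: int) -> float:
    """G/F, assumed here to be ground balls divided by fly balls."""
    return ground_balls / fly_balls


# Hypothetical season line: 200 innings (600 outs), 50 earned runs,
# 40 walks, 160 hits, 300 ground balls, 150 fly balls.
print(era(50, 600))                # 2.25
print(whip(40, 160, 600))          # 1.0
print(ground_fly_ratio(300, 150))  # 2.0
```

Working from outs keeps the arithmetic exact in thirds and avoids misreading 200.1 as a decimal.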

  4. Aim of the study • To analyze the historical statistics of pitchers. • To build a predictive model. • To predict future Cy Young Award winners.

  5. Data mining procedure • The ten steps of the data mining methodology

  6. Step 1: Translate the Problem • A directed data mining problem • Target variable: Cy Young Award • Classification • Decision tree • Purposes • Gambling • Predictive activities

  7. Step 2: Select Appropriate Data • MLB statistics only (1871 ~ 2006) • Cy Young Award: 1956 ~ 2006 • 21456 records in total • List of Cy Young Award winners • The “time” factor • 1999 is used as the dividing year, • because some statistical items only appear from then on. • Variables: remove items that are not representative of a pitcher.

  8. Step 3: Get to Know the Data • All the data we used come from the official MLB site • These data have been public for many years • The data quality is very good • Some attributes only have values from 1999 onward

  9. Step 4: Create a Model Set • We divide the data into training data and testing data • We do not create a balanced sample • MLB records are not seasonal data • We select the data from 1999 onward
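The slides do not say how the training and testing data were separated. One plausible approach, sketched below under that assumption, is to hold out the most recent seasons as test data. The record layout (dicts with Year and CyYoungAward_Winner keys), the cutoff year, and the toy data are hypothetical. Because no balancing is done, each set keeps whatever share of award winners the raw data has.

```python
def split_by_year(records: list[dict], test_from_year: int) -> tuple[list[dict], list[dict]]:
    """Put seasons before test_from_year in training, the rest in testing.

    No over- or under-sampling is applied, so the rare "yes" class keeps
    its natural proportion in both sets.
    """
    train = [r for r in records if r["Year"] < test_from_year]
    test = [r for r in records if r["Year"] >= test_from_year]
    return train, test


def positive_share(records: list[dict]) -> float:
    """Fraction of records labelled as award winners."""
    if not records:
        return 0.0
    return sum(r["CyYoungAward_Winner"] == "yes" for r in records) / len(records)


# Hypothetical toy data: 3 pitchers per season, one labelled winner per season.
data = [
    {"Year": y, "player_id": f"p{y}_{i}",
     "CyYoungAward_Winner": "yes" if i == 0 else "no"}
    for y in range(1999, 2007) for i in range(3)
]
train, test = split_by_year(data, test_from_year=2005)
print(len(train), len(test))             # 18 6
print(round(positive_share(train), 3))   # 0.333
```

Splitting by season rather than at random keeps the test set closer to the real use case: predicting a year the model has not seen.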

  10. Step 5: Fix Problems with the Data • The data are taken from the official MLB site • No missing values • A single source

  11. Step 6: Transform Data to Bring Information to the Surface • No combined attributes are created • We delete some attributes • We add an attribute: Year • We add an attribute (CyYoungAward_Winner) as the class for classification
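The two added attributes can be sketched as follows. This assumes the raw rows arrive grouped by season (so Year has to be attached) and that the list of winners is available as a set of (player_id, year) pairs; both input shapes and the player_id key are hypothetical. The class is written as the nominal values yes/no, because Weka's decision-tree learners expect a nominal class attribute.

```python
def add_year_and_label(rows_by_year: dict[int, list[dict]],
                       winners: set[tuple[str, int]]) -> list[dict]:
    """Return new rows with Year and CyYoungAward_Winner attributes added."""
    labeled = []
    for year, rows in rows_by_year.items():
        for row in rows:
            new_row = dict(row)  # copy so the caller's data is not mutated
            new_row["Year"] = year
            is_winner = (row["player_id"], year) in winners
            new_row["CyYoungAward_Winner"] = "yes" if is_winner else "no"
            labeled.append(new_row)
    return labeled


# Hypothetical input: two seasons with made-up player IDs and win totals.
raw = {
    2005: [{"player_id": "pitcherA", "W": 18}, {"player_id": "pitcherB", "W": 12}],
    2006: [{"player_id": "pitcherA", "W": 15}, {"player_id": "pitcherB", "W": 19}],
}
winners = {("pitcherA", 2005), ("pitcherB", 2006)}
for r in add_year_and_label(raw, winners):
    print(r)
```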

  12. Step 7: Build Models • Tools Used • Weka Crash Problem • Blank Attributes • Build Model • Handling Blank Attributes

  13. Tools Used

  14. Weka Crash Problem • Raw data • 21456 data instances • 42 attributes • Weka crashed during model construction • Fix: give Weka more memory (for example, by raising the Java heap limit with the JVM's -Xmx option)

  15. Blank Attributes

  16. Build Model • MLB 1956~2006, with blank attributes: ADTree • MLB 1956~2006, without blank attributes: ADTree • MLB 1999~2006: ADTree
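The three model sets above differ only in which rows and attributes they keep. The sketch below reproduces that selection under stated assumptions: rows are dicts that already carry a Year key, and an attribute counts as blank if its value is None or an empty string in at least one row. The attribute name NewStat is hypothetical and stands in for any statistic recorded only from 1999 onward.

```python
def attributes(rows: list[dict]) -> list[str]:
    """All attribute names, in first-seen order."""
    names: dict[str, None] = {}
    for row in rows:
        for name in row:
            names.setdefault(name, None)
    return list(names)


def is_blank(value) -> bool:
    return value is None or value == ""


def drop_blank_attributes(rows: list[dict]) -> list[dict]:
    """Keep only attributes that have a value in every row."""
    keep = [a for a in attributes(rows)
            if not any(is_blank(row.get(a)) for row in rows)]
    return [{a: row[a] for a in keep} for row in rows]


def build_model_sets(rows: list[dict]) -> dict[str, list[dict]]:
    """The three data sets used for the ADTree models."""
    seasons = [r for r in rows if 1956 <= r["Year"] <= 2006]
    return {
        "1956~2006, with blank attributes": seasons,
        "1956~2006, without blank attributes": drop_blank_attributes(seasons),
        "1999~2006": [r for r in seasons if r["Year"] >= 1999],
    }


# Hypothetical rows: NewStat is blank before 1999.
rows = [
    {"Year": 1990, "ERA": 2.9, "NewStat": None},
    {"Year": 2001, "ERA": 3.1, "NewStat": 0.7},
]
for name, subset in build_model_sets(rows).items():
    print(name, len(subset), attributes(subset))
```

The trade-off is visible in the output: dropping blank attributes keeps every season but loses the newer statistics, while the 1999~2006 set keeps them at the cost of far fewer seasons.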

  17. Handling Blank Attributes

  18. 1956~2006, with blank attributes, ADTree

  19. 1956~2006, with blank attributes, ADTree

  20. 1956~2006, without blank attributes, ADTree

  21. 1956~2006, without blank attributes, ADTree

  22. 1999~2006, ADTree

  23. 1999~2006, ADTree

  24. Step 8: Assess Models (1/2) • Not good enough for gambling

  25. Step 8: Assess Models (2/2) • Some attributes are more important than others
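The slides do not state which measure was used to assess the models. Because the sample was deliberately left unbalanced, with only a few winners among many pitcher seasons, plain accuracy can look high even for a useless model. The sketch below assumes yes/no labels as in the CyYoungAward_Winner attribute and reports accuracy alongside precision and recall for the winner class; the test set in the example is hypothetical.

```python
def confusion_counts(actual: list[str], predicted: list[str], positive: str = "yes"):
    """Return (tp, fp, fn, tn) for the given positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(actual: list[str], predicted: list[str], positive: str = "yes") -> dict:
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    return {
        "accuracy": (tp + tn) / len(actual),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical test set: 2 winners among 100 pitcher seasons.
actual = ["yes"] * 2 + ["no"] * 98
always_no = ["no"] * 100            # a model that never predicts a winner
print(evaluate(actual, always_no))  # accuracy 0.98, but recall 0.0
```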

  26. Step 9: Deploy Models • Implement a computer program with the built model. • Make it easier to predict the Cy Young Award winner.
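One way such a program could turn per-pitcher model output into a single prediction per league and year is sketched below. The score function is a hypothetical stand-in for the trained model (any model that gives a numeric score or winner probability per record would fit), and the League and player_id keys are assumed. Taking the top-scoring pitcher in each group matches the rule that each league has one winner per year, whereas classifying records one at a time may flag zero or several winners.

```python
from typing import Callable


def predict_winners(rows: list[dict],
                    score: Callable[[dict], float]) -> dict[tuple[int, str], str]:
    """Map each (Year, League) to the player_id with the highest score."""
    best: dict[tuple[int, str], tuple[float, str]] = {}
    for row in rows:
        key = (row["Year"], row["League"])
        s = score(row)
        if key not in best or s > best[key][0]:
            best[key] = (s, row["player_id"])
    return {key: player for key, (_, player) in best.items()}


# Toy stand-in for the model (hypothetical): lower ERA gives a higher score.
def toy_score(row: dict) -> float:
    return -row["ERA"]


rows = [
    {"Year": 2006, "League": "AL", "player_id": "p1", "ERA": 3.0},
    {"Year": 2006, "League": "AL", "player_id": "p2", "ERA": 2.5},
    {"Year": 2006, "League": "NL", "player_id": "p3", "ERA": 2.8},
]
print(predict_winners(rows, toy_score))
# {(2006, 'AL'): 'p2', (2006, 'NL'): 'p3'}
```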

  27. Step 10: Assess Results • Compare the predicted winner directly with the actual Cy Young Award winner. • The motivation is personal interest, not business. • The assessment relies on personal judgment.
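The direct comparison described above reduces to a hit rate over league-seasons. This sketch assumes that both the predictions and the actual winners are stored as {(year, league): player_id} mappings, a hypothetical layout, and it also returns the misses, since the final assessment relies on a person looking at them.

```python
def compare_with_actual(predicted: dict[tuple[int, str], str],
                        actual: dict[tuple[int, str], str]):
    """Return (hit rate, list of misses) over all league-seasons in actual.

    A league-season with no prediction counts as a miss.
    """
    misses = [(key, predicted.get(key), winner)
              for key, winner in actual.items()
              if predicted.get(key) != winner]
    hit_rate = 1 - len(misses) / len(actual) if actual else 0.0
    return hit_rate, misses


# Hypothetical predictions and results with made-up player IDs.
actual = {(2005, "AL"): "a", (2005, "NL"): "b"}
predicted = {(2005, "AL"): "a", (2005, "NL"): "c"}
rate, misses = compare_with_actual(predicted, actual)
print(rate)    # 0.5
print(misses)  # [((2005, 'NL'), 'c', 'b')]
```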

  28. Conclusions • We used classification to build a model for predicting the Cy Young Award winner • The accuracy of the built model is not high • There are some factors we did not consider • The model should not be used where significant money or interests are at stake • Just for fun
