Data Mining - Baseball Data

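Data Mining - Baseball Data. Clustering methods. Students: 鄭嘉仁, 蘇信嘉. Advisor: 于昌永. Objective: use three clustering methods (K-means, PAM, and hierarchical clustering) to analyze the relationship between player performance and salary level in the Baseball Data, then use the results of these three methods to predict the missing salaries of some players in the Baseball Data. Steps: Standardize, K-means, PAM, Hierarchical.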

Presentation Transcript


  1. Data Mining - Baseball Data: Clustering Methods. Students: 鄭嘉仁, 蘇信嘉. Advisor: 于昌永

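  2. Objective: Use three clustering methods (K-means, PAM, and hierarchical clustering) to analyze the relationship between player performance and salary level in the Baseball Data, then use the results of these three methods to predict the missing salaries of some players in the Baseball Data.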
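PAM (Partitioning Around Medoids) differs from k-means in that every cluster center is an actual observation (a medoid) and the objective is the sum of distances rather than the sum of squared distances. The following is a minimal sketch of its two phases (a greedy BUILD, then SWAP until no exchange lowers the cost), assuming Euclidean distance on already-standardized features. `pam` and `total_cost` are illustrative names written for this sketch, not the authors' code or a library API.

```python
import math

def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k):
    """PAM sketch: greedy BUILD, then SWAP until no medoid swap lowers the cost.

    Returns (list of medoid indices, cluster label for each point).
    """
    n = len(points)
    # BUILD: repeatedly add the point whose inclusion lowers the total cost most.
    medoids = []
    for _ in range(k):
        candidates = [i for i in range(n) if i not in medoids]
        medoids.append(min(candidates, key=lambda i: total_cost(points, medoids + [i])))
    cost = total_cost(points, medoids)
    # SWAP: try every (medoid, non-medoid) exchange and apply the best improving one.
    while True:
        best = None
        for slot in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:slot] + [o] + medoids[slot + 1:]
                c = total_cost(points, trial)
                if c < cost - 1e-12:
                    cost, best = c, trial
        if best is None:
            break  # no swap improves the cost: local optimum reached
        medoids = best
    # Each point belongs to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: math.dist(p, points[medoids[j]])) for p in points]
    return medoids, labels
```

Because medoids must be real rows, PAM is less sensitive to outlying players than k-means, at the price of trying many swaps per iteration.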

  3. Steps: • Standardize • K-means • PAM • Hierarchical
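The standardization step puts every statistic on a common scale, so that columns with large raw values do not dominate the Euclidean distances the clustering methods rely on. The following is a minimal sketch of column-wise z-scores, assuming that "Standardize" means z-scoring and that the sample standard deviation (n - 1 denominator) is used, which is the convention of R's `scale()`. The toy rows are hypothetical, not taken from the Baseball Data.

```python
from statistics import mean, stdev

def standardize(rows):
    """Z-score every column: (x - column mean) / column sample standard deviation.

    Constant columns (standard deviation 0) are mapped to 0.0.
    """
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in rows
    ]

# Hypothetical toy data: [hits, home runs] for three players.
players = [[100.0, 10.0], [150.0, 20.0], [200.0, 30.0]]
print(standardize(players))  # [[-1.0, -1.0], [0.0, 0.0], [1.0, 1.0]]
```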

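  4. Summary: Because both k-means and PAM require k to be chosen in advance, we used hierarchical clustering to select the best k and then ran the k-means and PAM analyses with it. Based on the results of all three methods, we chose k = 5 as the best value of k. We then compared k-means and PAM; the results are as follows: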
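The slides do not state which linkage or which rule was used to read k off the hierarchical clustering. The following is a minimal sketch under two assumptions: average linkage on Euclidean distance, and the common dendrogram heuristic of cutting where the jump between consecutive merge heights is largest. `average_linkage`, `suggest_k`, and `cut` are hypothetical helpers; this is a naive implementation meant for small inputs, not the authors' procedure.

```python
import math
from itertools import combinations

def average_linkage(points):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns one (height, partition) pair per merge; `partition` lists the
    clusters (sorted point indices) that remain after that merge.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the two clusters with the smallest mean pairwise distance.
        best_d, best_i, best_j = math.inf, -1, -1
        for i, j in combinations(range(len(clusters)), 2):
            a, b = clusters[i], clusters[j]
            d = sum(math.dist(points[p], points[q]) for p in a for q in b) / (len(a) * len(b))
            if d < best_d:
                best_d, best_i, best_j = d, i, j
        merged = sorted(clusters[best_i] + clusters[best_j])
        clusters = [c for t, c in enumerate(clusters) if t not in (best_i, best_j)] + [merged]
        merges.append((best_d, [list(c) for c in clusters]))
    return merges

def suggest_k(merges, n, max_k=10):
    """Choose k at the largest jump between consecutive merge heights.

    Stopping after merge m (1-based) leaves n - m clusters; a large jump to
    the next height means that merge would join well-separated groups.
    """
    heights = [h for h, _ in merges]
    best_k, best_gap = 2, -math.inf
    for m in range(1, len(heights)):
        k = n - m
        gap = heights[m] - heights[m - 1]
        if 2 <= k <= max_k and gap > best_gap:
            best_k, best_gap = k, gap
    return best_k

def cut(merges, n, k):
    """Return the partition with k clusters (1 <= k <= n) from the merge history."""
    if k >= n:
        return [[i] for i in range(n)]
    return merges[n - k - 1][1]
```

The chosen k can then be passed to k-means and PAM, which is the workflow slide 4 describes.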

  5. Predicting the missing salary data with k-means (k = 5)
     Cluster | Player IDs | Predicted salary
     1 | 102, 104, 107, 200, 247 | 500,000
     2 | 16, 37, 43, 58, 67, 72, 78, 84, 95, 106, 139, 151, 158, 161, 170, 172, 174, 204, 209, 271, 299, 317 | 950,000
     3 | 33, 49, 81, 105, 236, 284, 309 | 150,000
     4 | 19, 23, 31, 39, 42, 45, 53, 65, 70, 98, 115, 126, 145, 159, 211, 226, 229, 251, 255, 293, 303 | 550,000
     5 | 1, 40, 198, 254 | 450,000
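Slide 5 gives one predicted salary per cluster but does not say how that figure was computed. The following is a minimal sketch, assuming two things: the prediction is the mean salary of the players in the same k-means cluster whose salary is known, and clustering uses only the (standardized) performance columns, since the players being predicted have no salary value. `kmeans` and `impute_by_cluster_mean` are hypothetical helpers written for illustration, not the authors' code.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean updates."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k randomly chosen rows
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: assignments did not change
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def impute_by_cluster_mean(labels, salaries):
    """Replace each missing salary (None) with its cluster's mean known salary.

    ASSUMPTION: cluster mean is used; the slides do not state the rule.
    """
    totals, counts = {}, {}
    for lab, s in zip(labels, salaries):
        if s is not None:
            totals[lab] = totals.get(lab, 0.0) + s
            counts[lab] = counts.get(lab, 0) + 1
    return [
        s if s is not None else (totals[lab] / counts[lab] if lab in counts else None)
        for lab, s in zip(labels, salaries)
    ]
```

With real data, `points` would be the standardized performance rows for all players and `salaries` the salary column with `None` for the missing entries; the cluster median would be an equally plausible choice if outlying salaries are a concern.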
