Basic Data Mining Techniques
Contents • Query Tools • Statistical Techniques • Visualization Techniques • Case-Based Learning (K-Nearest Neighbor)
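Query Tools and Statistical Techniques • Customers are a telecom company's greatest asset • Customer behavior is recorded in the call records of the switch • Understanding customer behavior has become a trend among telecom companies • Case: selling telephone lines • Offering more telephone lines to companies whose current lines are saturated is a recurring business opportunity • When will a customer need additional lines?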
Selling Telephone Lines • Call records from the switch • Duration • Convert to categories • Sort by time • Count the busy lines
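The steps on this slide (take each call's duration, turn it into events, sort by time, count busy lines) can be sketched as a sweep over start and end events. This is only an illustration under assumptions: the slide does not give the record layout, so the `(start_time, duration)` tuples, the function name `peak_busy_lines`, and the sample data are all hypothetical.

```python
from typing import List, Tuple


def peak_busy_lines(calls: List[Tuple[float, float]]) -> int:
    """Return the largest number of calls in progress at the same time.

    `calls` holds hypothetical (start_time, duration) pairs taken from
    the switch's call records.
    """
    events = []
    for start, duration in calls:
        events.append((start, +1))             # a line becomes busy
        events.append((start + duration, -1))  # a line is released

    # Sort by time; at equal times, releases (-1) sort before new calls (+1),
    # so a call ending at t does not overlap a call starting at t.
    events.sort()

    busy = peak = 0
    for _, delta in events:
        busy += delta
        peak = max(peak, busy)
    return peak


# Made-up call records: (start_time, duration)
calls = [(0, 10), (5, 10), (8, 2), (10, 5), (20, 1)]
print(peak_busy_lines(calls))
```

A customer whose peak regularly reaches the number of lines it owns would be a candidate for the sales call described above.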
Query Tools and Statistical Techniques Naive Predictions
Visualization Techniques (Scatter Diagram) Music Magazine
K-Nearest Neighbor • Records that are close to each other live in each other's neighborhood • Customers of the same type (cluster) will show the same behavior • Do as your neighbors do • Not really a learning technique • Disadvantages: • Inefficiency • It is difficult to understand why the performance of k-nearest neighbor is better than naïve prediction
Result of the K-Nearest Neighbor Process 67.1% 70.2% 55.3% 85.4% 91.9%
K-Nearest Neighbors for 0*3*6 • C1: 1 0 0 1 0 0 1 • M1: 0 1 1 1 0 0 1 • Distance = 3 or Similarity = 4 • C1: 1 0 0 1 0 0 1 • M2: 0 1 1 1 0 1 1 • Distance = 4 or Similarity = 3
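The distance and similarity figures on this slide are consistent with a Hamming distance over the 7-bit profiles, with similarity taken as the number of matching positions. A minimal sketch under that assumption (the function names are illustrative, not from the slides):

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length binary profiles differ."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """Number of positions where the profiles agree."""
    return len(a) - hamming_distance(a, b)


C1 = [1, 0, 0, 1, 0, 0, 1]
M1 = [0, 1, 1, 1, 0, 0, 1]
M2 = [0, 1, 1, 1, 0, 1, 1]

print(hamming_distance(C1, M1), similarity(C1, M1))  # 3 4
print(hamming_distance(C1, M2), similarity(C1, M2))  # 4 3
```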
K-Nearest Neighbors for 0*3*6 • If Similarity_Threshold is 6, then 7 neighbors (M3, M13, M14, M16, M19, M20, M25) are selected.
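Neighbor selection by threshold can be sketched as a filter over candidate profiles. The slides do not list the profiles of M3, M13, and the others, so the candidates below are made-up placeholders, and treating the threshold as "similarity greater than or equal to 6" is an assumption.

```python
def similarity(a, b):
    """Number of positions where two binary profiles agree."""
    return sum(x == y for x, y in zip(a, b))


def select_neighbors(target, candidates, threshold):
    """Names of candidates whose similarity to `target` is at least `threshold`."""
    return [name for name, profile in candidates.items()
            if similarity(target, profile) >= threshold]


target = [1, 0, 0, 1, 0, 0, 1]

# Hypothetical candidate profiles (not the M1..M25 data from the slides)
candidates = {
    "A": [1, 0, 0, 1, 0, 0, 1],  # similarity 7
    "B": [1, 0, 0, 1, 0, 1, 1],  # similarity 6
    "C": [0, 1, 1, 1, 0, 0, 1],  # similarity 4
    "D": [1, 1, 0, 1, 0, 0, 1],  # similarity 6
}

print(select_neighbors(target, candidates, threshold=6))  # ['A', 'B', 'D']
```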
Summarize these 7 Neighbors (liked movies) • Neighbor 1: 111 134 388 262 261 266 268 012 260 184 238 091 104 142 038 • Neighbor 2: 240 256 290 441 442 442 510 518 518 520 522 001 005 016 184 • Neighbor 3: none • Neighbor 4: 402 193 228 179 227 111 204 364 • Neighbor 5: 280 • Neighbor 6: 193 • Neighbor 7: 186 189 193 214 239 179 227 263 240
Liked Movies for 0*3*6 • Count = 03 Movie = Crouching Tiger, Hidden Dragon (193) • Count = 02 Movie = Rush Hour (184) • Count = 02 Movie = Snake Eyes (240) • Count = 02 Movie = Life Is Beautiful (442) • Count = 02 Movie = The Blair Witch Project (518) • Count = 02 Movie = The Truman Show (111) • Count = 02 Movie = Enemy of the State (179) • Count = 02 Movie = The Mummy (227)
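The counts above can be reproduced by tallying every movie ID across the seven neighbors' lists. The sketch below counts raw occurrences, which appears to be what the slide does: IDs 442 and 518 reach a count of 2 through Neighbor 2 alone. IDs are kept as strings to preserve leading zeros; the variable names are illustrative.

```python
from collections import Counter

# Liked-movie IDs per neighbor, copied from the summary slide
neighbor_movies = [
    "111 134 388 262 261 266 268 012 260 184 238 091 104 142 038".split(),
    "240 256 290 441 442 442 510 518 518 520 522 001 005 016 184".split(),
    [],  # Neighbor 3: none
    "402 193 228 179 227 111 204 364".split(),
    ["280"],
    ["193"],
    "186 189 193 214 239 179 227 263 240".split(),
]

# Tally each ID over all neighbors (repeats within one list count too)
counts = Counter(movie for movies in neighbor_movies for movie in movies)

# Keep movies that occur at least twice, most frequent first
recommended = [(movie, n) for movie, n in counts.most_common() if n >= 2]

for movie, n in recommended:
    print(f"Count = {n:02d} Movie = {movie}")
```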
Data Mining Tool & Query Tool • Suppose a large database contains millions of records that describe customers' purchases • Who bought which product on what date? • What is the average turnover in July? • What is an optimal segmentation of clients? • What are the most important trends in customer behavior? • If you know exactly what you are looking for, use a query tool • If you know only vaguely what you are looking for, use a data mining tool
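The first two questions are the kind a query tool answers directly. As a rough illustration, here they are as SQL run through Python's built-in `sqlite3` module. The `purchases` table, its columns, and the sample rows are hypothetical, and "average turnover in July" is read here as the average purchase amount in July, which is only one possible interpretation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "customer TEXT, product TEXT, purchase_date TEXT, amount REAL)"
)
# Made-up sample rows; dates are stored as ISO 'YYYY-MM-DD' strings
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        ("alice", "phone", "2001-07-03", 120.0),
        ("bob", "modem", "2001-07-15", 80.0),
        ("alice", "cable", "2001-08-01", 10.0),
    ],
)

# Who bought which product on what date?
who_bought = conn.execute(
    "SELECT customer, product, purchase_date "
    "FROM purchases ORDER BY purchase_date"
).fetchall()

# What is the average turnover in July?
avg_july = conn.execute(
    "SELECT AVG(amount) FROM purchases "
    "WHERE strftime('%m', purchase_date) = '07'"
).fetchone()[0]

print(who_bought)
print(avg_july)
```

Segmentation and trend questions have no such ready-made query, which is where the slide suggests a data mining tool instead.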