Feature Engineering Studio

Feature Engineering Studio. September 23, 2013. Welcome to Mucking Around Day. Sort into pairs, partnering with the person next to you (one group of 3 is allowed), and go over your reports together.

Presentation Transcript


  1. Feature Engineering Studio September 23, 2013

  2. Welcome to Mucking Around Day

  3. Sort into pairs • Partner with the person next to you • One group of 3 is allowed

  4. Sort into pairs • Do we have a group of 3? • One of the 3 will work with me

  5. Sort into pairs • Go over your reports together • A maximum of 5 minutes apiece

  6. 5 minutes for first person

  7. 5 minutes for second person

  8. Re-assemble into one big group

  9. Who here found something really cool while mucking around? • Show us, tell us

  10. Who here found a histogram with a normal distribution? • Show us, tell us

  11. Who here found a histogram with a hypermode? • Show us, tell us

  12. Who here found a histogram with a flat distribution? • Show us, tell us

  13. Who here found a histogram with a skewed distribution? • Show us, tell us

  14. Who here found a histogram with a bimodal distribution? • Show us, tell us

  15. Who here found a histogram with something else interesting? • Show us, tell us

  16. Who here found something surprising with their min, max, average, stdev?

  17. Categorical variables • Who here found something curious, weird, or interesting in the distribution of their categorical variables?

  18. Who here hasn’t spoken yet (and analyzed data)? • Tell us something interesting you found in your data

  19. Who here played with pivot tables? • What did you learn?

  20. My turn to play with pivot tables • Who wants to volunteer their data? • (I might request a 2nd or 3rd data set, depending on how the 1st one goes)
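For readers following along outside Excel, a pivot table that counts rows can be approximated in a few lines of Python. This is only a sketch: the log rows, the column meanings, and the function name `pivot_counts` are hypothetical and do not come from the class data.

```python
from collections import Counter

# Hypothetical action log: (student, action_type) rows, as in a spreadsheet.
log = [
    ("s1", "attempt"), ("s1", "help"), ("s1", "attempt"),
    ("s2", "attempt"), ("s2", "attempt"), ("s2", "attempt"),
]

def pivot_counts(rows):
    """Count rows per (student, action) cell, like a pivot table with
    students as rows, action types as columns, and COUNT as the value."""
    cells = Counter(rows)
    students = sorted({s for s, _ in rows})
    actions = sorted({a for _, a in rows})
    # A Counter returns 0 for cells that never occur in the log.
    return {s: {a: cells[(s, a)] for a in actions} for s in students}

table = pivot_counts(log)
```

Here `table["s2"]["help"]` is 0, which is the kind of empty cell a pivot table makes easy to spot.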

  21. Who here played with vlookup? • What did you learn?

  22. My turn to play with vlookup • Using the same volunteered data set(s)
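A rough Python analogue of a vlookup, for comparison: each log row is extended with a value looked up from a second table keyed on student. The tables, values, and the name `attach_lookup` are made up for illustration.

```python
# Hypothetical lookup table: one entry per student (e.g. a post-test score).
post_test = {"s1": 0.80, "s2": 0.55}

# Hypothetical action log rows: (student, action_type).
log = [("s1", "attempt"), ("s2", "help"), ("s3", "attempt")]

def attach_lookup(rows, table, default=None):
    """Append the looked-up value to each row, keyed on the first column.
    Missing keys get `default`, where Excel's VLOOKUP would show #N/A."""
    return [row + (table.get(row[0], default),) for row in rows]

joined = attach_lookup(log, post_test)
```

Student `s3` has no post-test entry, so that row ends with `None`.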

  23. Other cool things you can create with a few simple formulas (plus demos!)

  24. Identifying specific cases of interest

  25. Did event of interest ever occur for student?
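One possible reading of this feature, sketched in Python: a per-student yes/no flag for whether the event appears anywhere in that student's rows. The log and the name `ever_occurred` are hypothetical.

```python
# Hypothetical action log: (student, action_type).
log = [("s1", "attempt"), ("s1", "help"), ("s2", "attempt"), ("s2", "attempt")]

def ever_occurred(rows, event):
    """Map each student to True if `event` appears in any of their rows."""
    students = {s for s, _ in rows}
    hit = {s for s, a in rows if a == event}
    return {s: s in hit for s in students}

ever_help = ever_occurred(log, "help")
```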

  26. Counts-so-far (and total value for student)
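A minimal sketch of a counts-so-far column, assuming the log is already in time order and that "so far" includes the current row (the slide does not specify either). The data and the name `counts_so_far` are hypothetical.

```python
from collections import defaultdict

# Hypothetical action log in time order: (student, action_type).
log = [("s1", "help"), ("s2", "attempt"), ("s1", "attempt"),
       ("s1", "help"), ("s2", "help")]

def counts_so_far(rows, event):
    """Return (per-row running count, per-student total) for `event`.
    The running count includes the current row (an assumption)."""
    running = defaultdict(int)
    per_row = []
    for student, action in rows:
        if action == event:
            running[student] += 1
        per_row.append(running[student])
    return per_row, dict(running)

so_far, totals = counts_so_far(log, "help")
```

For this log, `so_far` is `[1, 0, 1, 2, 1]` and `totals` is `{"s1": 2, "s2": 1}`.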

  27. Counts-last-N-actions
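The same idea restricted to a sliding window can be sketched with a bounded deque per student. The window here includes the current action, which is an assumption; the log and the name `counts_last_n` are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical action log in time order: (student, action_type).
log = [("s1", "help"), ("s1", "help"), ("s1", "attempt"),
       ("s1", "help"), ("s2", "help")]

def counts_last_n(rows, event, n):
    """For each row, count occurrences of `event` among that student's
    last `n` actions, including the current one."""
    windows = defaultdict(lambda: deque(maxlen=n))
    per_row = []
    for student, action in rows:
        window = windows[student]
        window.append(action == event)  # oldest entry drops out automatically
        per_row.append(sum(window))
    return per_row

help_last_2 = counts_last_n(log, "help", n=2)
```

With `n=2` the result for this log is `[1, 2, 1, 1, 1]`.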

  28. First attempts
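A sketch of flagging first attempts, assuming a first attempt means the first row for a given student on a given step. The step names, the log, and `first_attempt_flags` are hypothetical.

```python
# Hypothetical action log in time order: (student, step).
log = [("s1", "A"), ("s1", "A"), ("s2", "A"), ("s1", "B"), ("s2", "A")]

def first_attempt_flags(rows):
    """Flag each row that is the first one seen for its (student, step) pair."""
    seen = set()
    flags = []
    for student, step in rows:
        key = (student, step)
        flags.append(key not in seen)
        seen.add(key)
    return flags

is_first = first_attempt_flags(log)
```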

  29. Ratios between events of interest

  30. How many students had 3 (or 4, 5, 2,…) of an event
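This tally can be sketched as a count of counts: first count the event per student, then count how many students share each total. Students who never show the event are kept with a total of 0. The log and `students_per_count` are hypothetical.

```python
from collections import Counter

# Hypothetical action log: (student, action_type).
log = [("s1", "help")] * 3 + [("s2", "help")] * 3 + [("s3", "help"), ("s4", "attempt")]

def students_per_count(rows, event):
    """Map each per-student event total to the number of students with that total."""
    per_student = Counter({s: 0 for s, _ in rows})  # include students with zero events
    for student, action in rows:
        if action == event:
            per_student[student] += 1
    return Counter(per_student.values())

dist = students_per_count(log, "help")
```

Here two students had 3 help requests, one had 1, and one had none.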

  31. Times-so-far

  32. Cutoff-based features

  33. Unitized actions (such as unitized time)
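The slide does not define "unitized". The sketch below assumes one common reading: each time is expressed in standard deviations above or below the mean time for the same step across all students. That interpretation, the log, and the name `unitize` are assumptions, not the course's stated method.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical action log: (student, step, seconds).
log = [("s1", "stepA", 10.0), ("s2", "stepA", 20.0), ("s3", "stepA", 30.0),
       ("s1", "stepB", 5.0), ("s2", "stepB", 5.0)]

def unitize(rows):
    """Express each time as a z-score relative to other times on the same step.
    Steps with zero spread get 0.0 to avoid dividing by zero."""
    by_step = defaultdict(list)
    for _, step, secs in rows:
        by_step[step].append(secs)
    stats = {step: (mean(v), pstdev(v)) for step, v in by_step.items()}
    unitized = []
    for _, step, secs in rows:
        mu, sd = stats[step]
        unitized.append((secs - mu) / sd if sd > 0 else 0.0)
    return unitized

z = unitize(log)
```

A rolling sum or mean over these values would give a "last 3 or 5 unitized" style feature.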

  34. Last 3 or 5 unitized

  35. Comparing earlier behaviors to later behaviors through caching

  36. Counts-if

  37. Percentages of action type

  38. Percentages of time spent per action/location/KC/etc.
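Both percentage features (share of actions and share of time) can be sketched with one helper that either counts rows or sums seconds. The grouping key here is action type; location or KC would work the same way. The log and `percent_by_action` are hypothetical.

```python
from collections import defaultdict

# Hypothetical action log: (student, action_type, seconds).
log = [("s1", "attempt", 10.0), ("s1", "help", 30.0),
       ("s1", "attempt", 20.0), ("s2", "attempt", 5.0)]

def percent_by_action(rows, weight_by_time=False):
    """Per student, the percentage of actions (or of time, if
    weight_by_time is True) that falls under each action type."""
    totals = defaultdict(lambda: defaultdict(float))
    for student, action, secs in rows:
        totals[student][action] += secs if weight_by_time else 1.0
    result = {}
    for student, per_action in totals.items():
        whole = sum(per_action.values())
        # Guard against a student whose recorded times are all zero.
        result[student] = {a: (100.0 * v / whole if whole else 0.0)
                           for a, v in per_action.items()}
    return result

pct_actions = percent_by_action(log)
pct_time = percent_by_action(log, weight_by_time=True)
```

Student `s1` spends two thirds of their actions on attempts but only half of their time, which is the kind of contrast these two features can surface.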

  39. Questions? Comments?

  40. Other cool ideas?

  41. Assignment 3 • Feature Engineering 1: “Bring Me a Rock” • Get your data set • Open it in Excel • Create as many features as you feel inspired to create • Features should be created with the goal of predicting your ground truth variable • At least 12 separate features that are not just variations on a theme (e.g. “time for last 3 actions” and “time for last 4 actions” are variations on a theme; but “time for last 3 actions” and “total time between help requests and next action” are two separate features) • For each feature, write a 1-3 sentence “just so story” for why it might work • Test how good each feature is

  42. Testing Feature Goodness • For this assignment, there are a bunch of ways to test feature goodness • Single-feature prediction models in a data mining or stats package, giving correlation or kappa (special session this Wednesday) • Compute correlation in Excel (want to see?) • You can do this with binary variables too, although it’s not really optimal • Compute t-test in Excel (want to see?) • Compute kappa in Excel (if you don’t know how, easier to do in RapidMiner)
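For reference, two of the goodness measures named above can be written out directly. This is a sketch of the standard Pearson correlation and Cohen's kappa formulas, not of what Excel or RapidMiner do internally; the function names and sample values are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between a feature and the ground truth.
    Raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def cohen_kappa(pred, truth):
    """Cohen's kappa: agreement between two label lists, corrected for the
    agreement expected by chance. Raises ZeroDivisionError if chance
    agreement is already 1 (both lists use a single identical label)."""
    n = len(pred)
    labels = set(pred) | set(truth)
    observed = sum(p == t for p, t in zip(pred, truth)) / n
    expected = sum((pred.count(l) / n) * (truth.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical values: a numeric feature against a numeric outcome,
# and a thresholded (binary) feature against a binary ground truth.
r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
kappa = cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

In the kappa example, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.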

  43. Were you right? • Which of your “just so stories” seem to be correct? • Did any of your features correlate in the opposite direction from what you expected?

  44. Assignment 3 • Write a brief report for me • Email me an Excel sheet with your features • You don’t need to prepare a presentation • But be ready to discuss your features in class

  45. Next Classes • 9/25 Special Session • Using RapidMiner to Produce Prediction Models • Come to this if you’ve never built a classifier or regressor in RapidMiner (or a similar tool) • Statistical significance tests using linear regression don’t count… • 9/30 Advanced Feature Distillation in Excel • Assignment 3 due • Online Equation Solver Tutorials should be in your INBOX

  46. Upcoming Classes • 10/2 Special session on prediction models • Come to this if you don’t know why student-level cross-validation is important, or if you don’t know what J48 is • 10/7 Advanced Feature Distillation in Google Refine • 10/9 Special session? TBD.
