Feature Engineering Studio October 7, 2013
But first… • Excel Equation Solver
What it requires • Parameters • Goodness metric (typically SSR)
Linear Regression Example • Look at prior variables • And how model prediction is created from predictor • Create SSR variable
Linear Regression Example • Hand-iterate on variables
Linear Regression Example • Excel equation solver
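The linear regression walkthrough above (parameters, a prediction column, an SSR cell, then letting a solver adjust the parameters) can be sketched outside Excel. This is only an illustration under stated assumptions: the data are made up, the function names are hypothetical, and plain gradient descent stands in for whatever algorithm the Excel solver actually uses.

```python
# Hypothetical data: one predictor and one outcome per row.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]


def predict(slope, intercept, x):
    """Model prediction created from the predictor."""
    return slope * x + intercept


def ssr(slope, intercept, xs, ys):
    """Goodness metric: sum of squared residuals (lower is better)."""
    return sum((y - predict(slope, intercept, x)) ** 2 for x, y in zip(xs, ys))


def fit_by_gradient_descent(xs, ys, learning_rate=0.01, steps=5000):
    """Adjust the two parameters step by step to reduce SSR.

    Assumption: a fixed learning rate small enough for this data set;
    a different data scale would need a different rate.
    """
    slope, intercept = 0.0, 0.0
    for _ in range(steps):
        residuals = [y - predict(slope, intercept, x) for x, y in zip(xs, ys)]
        # Partial derivatives of SSR with respect to each parameter.
        grad_slope = -2 * sum(r * x for r, x in zip(residuals, xs))
        grad_intercept = -2 * sum(residuals)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


slope, intercept = fit_by_gradient_descent(xs, ys)
print(f"slope={slope:.3f} intercept={intercept:.3f} "
      f"SSR={ssr(slope, intercept, xs, ys):.4f}")
```

Hand-iterating corresponds to changing `slope` and `intercept` yourself and watching `ssr` move; the solver automates that loop.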
BKT Example • Go through functions
BKT Example • Excel equation solver
BKT Example • Excel equation solver • Constrain P(G) to under 0.3
BKT Example • Excel equation solver • Try different solver algorithms
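The BKT slides above fit four parameters against a goodness metric while keeping P(G) under 0.3. A rough sketch of that setup follows, assuming the standard BKT update equations, SSR as the goodness metric, made-up response sequences, and a brute-force grid search in place of the Excel solver. All function names here are hypothetical.

```python
from itertools import product


def bkt_predictions(responses, p_init, p_transit, p_guess, p_slip):
    """Return P(correct) before each response, updating P(known) as we go."""
    p_known = p_init
    predictions = []
    for correct in responses:
        predictions.append(p_known * (1 - p_slip) + (1 - p_known) * p_guess)
        if correct:
            evidence = p_known * (1 - p_slip)
            posterior = evidence / (evidence + (1 - p_known) * p_guess)
        else:
            evidence = p_known * p_slip
            posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
        # Chance of learning the skill after this opportunity.
        p_known = posterior + (1 - posterior) * p_transit
    return predictions


def bkt_ssr(sequences, p_init, p_transit, p_guess, p_slip):
    """Sum of squared residuals between responses and predictions."""
    total = 0.0
    for responses in sequences:
        preds = bkt_predictions(responses, p_init, p_transit, p_guess, p_slip)
        total += sum((obs - pred) ** 2 for obs, pred in zip(responses, preds))
    return total


def fit_bkt_grid(sequences, max_guess=0.3, grid_steps=20):
    """Try every grid combination, keeping P(G) strictly below max_guess."""
    grid = [i / grid_steps for i in range(1, grid_steps)]
    guess_grid = [g for g in grid if g < max_guess]  # the constraint
    best_params, best_ssr = None, float("inf")
    for p_init, p_transit, p_guess, p_slip in product(grid, grid, guess_grid, grid):
        score = bkt_ssr(sequences, p_init, p_transit, p_guess, p_slip)
        if score < best_ssr:
            best_ssr = score
            best_params = {"init": p_init, "transit": p_transit,
                           "guess": p_guess, "slip": p_slip}
    return best_params, best_ssr


# Hypothetical data: one list of 0/1 responses per student on one skill.
sequences = [[0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 1, 1], [1, 0, 1, 1, 0, 1]]
best_params, best_ssr = fit_bkt_grid(sequences)
print(best_params, round(best_ssr, 4))
```

Changing `max_guess` or the grid resolution is the rough analogue of changing the constraint or trying a different solver algorithm, and it can move the fitted values in the same way.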
Google Refine (now OpenRefine) • Functionality to make it easy to regroup and transform data • Find similar names • Connect names • Bin numerical data • Mathematical transforms showing resultant graphs • Text transforms and column creation
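To make "find similar names" and "connect names" concrete, here is a small sketch of the key-collision idea: normalize each name to a key and group names that share one. It is loosely modeled on the fingerprint keying method that OpenRefine documents for clustering, but it is not OpenRefine's code; the function names and sample names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Normalize a name so that trivially different spellings collide."""
    text = name.strip().lower()
    # Split accented characters into base letter + accent, then drop accents.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort and de-duplicate tokens so word order does not matter.
    return " ".join(sorted(set(text.split())))


def cluster_names(names):
    """Group names by fingerprint; keep groups with more than one spelling."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(set(group)) > 1]


names = ["Smith, Jane", "jane smith", "Jane  Smith.", "John Doe"]
for group in cluster_names(names):
    print(group)
```

Each printed group is a candidate set of spellings to merge into one canonical name; a human should still review the groups before merging.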
Google Refine (now OpenRefine) • Functionality for finding anomalies/outliers
Google Refine (now OpenRefine) • Functionality for automatically repeating the same process on a new data set • *Really* nice for cases where you complete a complex process and want to repeat it
Google Refine (now OpenRefine) • Functionality for connecting your data set to web services to get additional relevant info
Google Refine (now OpenRefine) • Can load in and export common but hard-to-work-with data types • JSON and XML
Google Refine (now OpenRefine) • Some videos you should watch later • http://www.youtube.com/watch?v=B70J_H_zAWM • http://www.youtube.com/watch?v=cO8NVCs_Ba0 • http://www.youtube.com/watch?v=5tsyz3ibYzk
In birthdate order • Each person should tell us about their favorite feature they created for Bring Me a Rock Day 2 • Tell us what it was • How you created it • Your just-so story • And whether your just-so story was correct
Next • Tell us about anything cool you did in Excel or another program to create a feature
Too Hard? • Were there any features that anyone kind of wanted to create, but it was too difficult? (or too much work?)
Better? • Who here got better features (in terms of goodness metric) for Bring Me a Rock Day 2, than Bring Me a Rock Day 1?
Assignment 5 • Iterative Feature Refinement • Select three of the features you have created in previous assignments • These features should be “among the best” of the features you have previously created • For each of these three features, create at least five “close variants” • “time for last 3 actions” and “time for last 4 actions” are close variants • “time for last 3 actions” and “total time between help requests and next action” are two separate features • Using the Excel Equation Solver is a substitute for creating five “close variants” • If you don’t use the Excel Equation Solver: as you create the close variants for each feature, don’t make them all at once • Make a variant • Test whether it’s better than the previous variant (by goodness metric) • If it is, keep going in the same direction • If it isn’t, try doing the opposite or something else
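The make-a-variant, test, keep-going-or-reverse loop described above can be sketched as a small hill climb over one feature parameter. Everything here is an assumption for illustration: the feature is "time for last N actions", the goodness metric is the absolute Pearson correlation with an outcome (higher is better), and the data and function names are made up.

```python
def time_for_last_n(action_times, n):
    """Feature: total time spent on the last n actions."""
    return sum(action_times[-n:])


def pearson(xs, ys):
    """Plain Pearson correlation; assumes neither list is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def refine(start, goodness, lower, upper, step=1):
    """Move one variant at a time; reverse once if the first move fails."""
    current, best = start, goodness(start)
    history = [(current, best)]
    direction, can_flip = step, True
    while True:
        candidate = current + direction
        improved = False
        if lower <= candidate <= upper:
            score = goodness(candidate)
            history.append((candidate, score))
            if score > best:
                current, best, improved = candidate, score, True
        if improved:
            can_flip = False  # going back would revisit a worse variant
        elif can_flip:
            direction, can_flip = -direction, False  # try the opposite
        else:
            break
    return current, best, history


# Hypothetical data: (seconds per action, outcome) for each student.
students = [
    ([5, 12, 30, 8, 41, 3], 0),
    ([4, 6, 5, 9, 7, 6], 1),
    ([20, 15, 33, 2, 50, 44], 0),
    ([3, 8, 6, 5, 4, 7], 1),
    ([10, 9, 25, 30, 6, 5], 1),
]
outcomes = [outcome for _, outcome in students]


def goodness(n):
    feature = [time_for_last_n(times, n) for times, _ in students]
    return abs(pearson(feature, outcomes))


best_n, best_score, history = refine(start=3, goodness=goodness, lower=1, upper=6)
for n, score in history:
    print(f"time for last {n} actions: goodness {score:.3f}")
```

The `history` list doubles as raw material for the report: each entry is one "I changed it from N to N*, and the goodness changed from G to G*" step.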
Assignment 5 • Write a report that discusses your process • I took feature N • I changed it from N to N* • The goodness changed from G to G* • Then I did…
Assignment 5 • You don’t need to prepare a presentation • But be ready to discuss your features in class
Next Classes • 10/9 RapidMiner Practice Session • Bring your RapidMiner process to class with questions, on a laptop • We’ll learn together • 10/14 Iterative Feature Refinement • Assignment 5 due
Upcoming Classes • 10/16 No special session • 10/21 Feature Adaptation • 10/23 Special Session on Building Prediction Models