Advanced Excel for Data Analysis
E N D
Presentation Transcript
Advanced Excel for Data Analysis Yale University StatLab
What will be covered • Following up on Intro to Excel Workshop • Importing and exporting data • Data management • cleaning and manipulation • Statistics and data analysis • Pivot tables • Graphing and charts • Writing custom Macros/VBA • More time for specific questions at the end
Quick Review of Some Basics • We’re using Excel 2007 • Excel 2010 looks different again (improvement) • Recent Excel improvements/changes • More rows (not 65536 anymore) • Note new file suffixes: .xlsx and .xlsm • Not backwards compatible -save as .xls to re-open in older versions-free download allows older versions to open • More levels of sorting and filtering • Better charting
When and Why Excel • It’s a spreadsheet • Great for calculations and formulas • Okay for Data Management • But that’s what most people use it for • Should probably be using Access • Everybody has Excel (including Macs) • No VBA or SharePoint access for Macs • That’s how we got the Data • Ubiquitous file format widely imported and exported • We’ve been using it our whole lives, right? • Most of us actually only scratch the surface.
Importing And Exporting • Important for Importing and Exporting: Define Variable Types (string, date, int) • Open • “All Files” (*.*) • Text to Column, Delimited (basic) • Let’s see an example http://statlab.stat.yale.edu/help/workshops/introData/dataintro • StatTransfer • Supports Excel from/to anything • Does batch file conversion • Embedding in Word/Powerpoint • Keeps data linked so if you change data you don’t have to re-import to document
Working with Data • Treat Data Like Data, not a bunch of cells • Defining Data Ranges • Name your Sheets • Name your data ranges • Conditional formatting and Data Validation • Quick way to determine whether your data looks good • Set ranges so that if a value falls outside the allowed range, the cell formatting changes • Create a unique ID row if it doesn’t exist • Allows you to restore order and merge
Filtering, Sorting and Finding Data • Most common question: I need to find unique values or a subset of my data. • Sorting and Filtering – make sure you highlight the entire dataset! • Adding a conditional dummy variable is easy • Remember to copy and paste as values • Lookups • VLOOKUP and HLOOKUP • Used with named ranges
Text Manipulation • If cell A1 = “abcde ” • TRIM(A7)=abcde • gets rid of extra spaces • RIGHT (A7,2)=de • and LEFT, MID • SUBSTITUTE (A7, “c”, “o”)=abode • Comparing Cells (especially in Macros): • If cell A1 = “12345” (a string, not a #) • VALUE(A1)=12345 • Useful for building syntax, repetitive text for other programs
Statistics and Data Analysis • Many basic summary functions • AVERAGE, STDEV, CORREL, etc. • CONFIDENCE for confidence intervals • Probability distributions • NORMDIST, NORMINV, etc. • Array functions • LINEST, FORECAST • Data Analysis Tools • Data tab -> Data Analysis • If not installed, upper left circle -> Add-Ins • Linear regression, Chi-square
Pivot Tables and Pivot Charts • Easy snapshot views of your data. • Pivot charts for Large Data • Great for transforming time series data into tables • Be aware: Default calculation is “COUNT” and you probably want “SUM” • Click-and-drag interface makes this easy to use, but requires manual refresh anytime your data changes
Graphing and Charts • Dependent on column/row as x,y • You may need to copy, paste special/transpose to get the chart type you want • Charts for publishing • Black and white usually preferred • Patterns not gradients • Dots Per Inch (DPI) • Charts should depend on what is being conveyed • Bar graph (measurements) • Scatter plots (correlation, multivariate) • No Pie Charts
More Advanced Charts • Trend lines • Changing the data to series • Removing/Hiding lines • Error Bars • Stock (High-Low-Close) • Excel wants defined variable names, but it can be tricked • Excellent tips for advanced graphing online • http://peltiertech.com/
Writing Your Own Macros • Be careful: You can’t UNDO running a macro. • Use the record function to get the idea and then customize for your needs. • Basic structure is Object Oriented • Object.property() • The built-in Editor will show options • Supports If, then, else, for loops , while loops • Be careful: You can’t UNDO running a macro.
More advanced • Financial Calculation functions • Present value, IRR, amortization • Database connections • What-if analysis • Forms (with Data Validation) • Password protection
Resources • Lots of Yale resources available • StatLab Consultants • Sherlock Campbell • Themba Flowers • Orbis.yale.edu • We have access to several online resources • Excel Hacks (O’Reilly) • YouTube. (Oh really?) • Internet Searches • Bing might actually be better for Office
Thank you Questions?