Survey Design and Analysis

Survey Design and Analysis. Torben Schubert, December 12th, 2012, CIRCLE, Lund. NORSI course on ‘Survey of Quantitative Research’. Outline: Survey Design; Cluster analysis; Latent factors; Hypothesis testing using Community Innovation Survey data; Limited dependent variables.

Presentation Transcript


  1. Survey Design and Analysis Torben Schubert, December 12th, 2012, CIRCLE, Lund NORSI course on ‘Survey of Quantitative Research’

  2. Outline • Survey Design • Cluster analysis • Latent factors • Hypothesis testing using Community Innovation Survey data • Limited dependent variables • Application using STATA

  3. Survey Design

  4. Introduction • Yesterday, you had an introduction to linear regression analysis • OLS is one of the most powerful tools to test hypotheses • But hypothesis testing is not the only task in quantitative empirical research • Sometimes we might not even have a clear idea about structures in the data set. We may find it difficult to develop sensible hypotheses.

  5. Introduction • Sometimes we encounter measurement problems that make it difficult to discern what the theoretical meaning of a variable or a set of variables actually is. • What can we do then?

  6. The ideal way to good results • Good empirical research should follow these steps: • Build a theory about a certain phenomenon (e.g. by literature review, or by squeezing your brain) • Delineate expectations about empirical relationships (often called hypotheses) • Collect the data that is necessary to measure your relationships • Use a sensible technique to determine whether your hypotheses hold.

  7. Problems • This ideal process is often obstructed: • We might get access to a rich data set that we have not compiled ourselves and which we therefore do not fully understand. • We might have a complex measurement construct in mind, but we are not sure whether our variables really measure it.

  8. Some suggestions • If you are unsure about the information contained in your data set, do not underestimate the power of descriptive statistics. • Means by groups or correlations can greatly improve your understanding of the data. • Take your time to investigate an unknown data set.
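As a rough illustration of the two descriptive tools named above, the following plain-Python sketch computes means by group and a Pearson correlation. The firm records and variable names are invented for the example and do not come from the course data.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical firm records: (sector, employees, R&D spending)
firms = [
    ("manufacturing", 120, 4.0),
    ("manufacturing", 300, 9.5),
    ("services", 40, 0.5),
    ("services", 80, 1.5),
]

def mean(values):
    return sum(values) / len(values)

def group_means(rows):
    """Mean number of employees within each sector."""
    groups = defaultdict(list)
    for sector, employees, _ in rows:
        groups[sector].append(employees)
    return {sector: mean(values) for sector, values in groups.items()}

def pearson(x, y):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

size_by_sector = group_means(firms)
r_size_rd = pearson([f[1] for f in firms], [f[2] for f in firms])
print(size_by_sector, round(r_size_rd, 3))
```

Even on a toy data set like this, the group means and the correlation already hint at which variables move together and which groups differ.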

  9. Cluster analysis

  10. Cluster analysis • What is a cluster? • Loosely defined: data can be considered clustered if • observations belonging to the same cluster are alike. • observations belonging to different clusters differ.

  11. Cluster analysis • Cluster analysis assumes that observations (e.g. firms) belong to a given number of different clusters that are inherently different from each other. • Technically, you search for multivariate similarity between observations given a set of characteristics. • E.g. you could think firms differ by age, size, and innovativeness

  12. Cluster analysis • A clustering method then sorts the firms that are most similar to each other into a given number of clusters. • A multitude of techniques exist, but most of the common ones are rather descriptive and leave many arbitrary choices to the researcher: • Which variables to include? • How many clusters to go for? • Which method to use?
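To make the idea of "sorting similar firms together" concrete, here is a minimal sketch of agglomerative clustering with Ward's criterion on squared Euclidean distances. It is a plain-Python illustration under simplifying assumptions (tiny invented data, no standardization of variables, naive search over all pairs), not STATA's implementation.

```python
def centroid(points):
    """Component-wise mean of a list of equally long tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    dist2 = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2

def ward_clusters(points, k):
    """Merge the cheapest pair of clusters until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected by the deletion
    return clusters

# Hypothetical firms described by (age in years, employees in hundreds)
firms = [(2, 0.3), (3, 0.5), (4, 0.4), (40, 9.0), (45, 8.5), (50, 9.5)]
groups = ward_clusters(firms, 2)
print(groups)
```

The three questions on the slide show up directly in the code: which columns go into each tuple, which value of `k` is requested, and which merge criterion `ward_cost` implements are all choices of the researcher.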

  13. Cluster analysis • … and not all data are clustered.

  14. Cluster analysis • An example in STATA based on the auto data set • The command structure is: cluster subcommand varlist, options • Type the following:
      sysuse auto
      cluster wardslinkage rep78 length price if !missing(rep78) & !missing(length) & !missing(price), measure(correlation)
      cluster dendrogram

  15. Cluster analysis • The dendrogram looks like this and tells at which tolerance we start to cluster together observations and subgroups • The number of clusters is arbitrary, but 3 is maybe not a bad choice.

  16. Cluster analysis • To generate a grouping variable, type:
      cluster generate cutvar = groups(3)
      • To generate summary statistics by group, type:
      bysort cutvar: sum rep78 length price if !missing(cutvar)

  17. Cluster analysis

  18. Cluster analysis • Cluster analysis is a nice data mining tool, useful when you have no idea of what is going on. • I would not recommend using it in a scientific paper, because of its exploratory character. • It might assist you in earlier stages of research. • Note that there are statistically more advanced methods in other packages such as R (keyword: model-based clustering)

  19. Latent factors

  20. Latent factors • Often theory is phrased in terms of unmeasurable concepts. • This happens often in management research, sociology, psychology • Suppose you hypothesize that teacher quality increases student performance. • How to measure teacher quality? • You might consider asking a battery of questions about a set of quality dimensions (Is he well prepared? Does he react to students’ questions? ...)

  21. Factor analysis • The first question you ask is whether there really is a unidimensional thing called teacher quality. • You can use factor analysis for this. • Factor analysis determines the underlying (latent) constructs for any given set of variables. • Type in the following:
      use http://www.ats.ucla.edu/stat/stata/output/m255, clear
      factor item13-item24, ipf factor(3)

  22. Factor analysis • General rule: use as many factors as there are eigenvalues greater than one. • In this case 1: good news!

  23. Cronbach’s Alpha • Another commonly used measure is Cronbach’s alpha, which increases with the average correlation between a given set of variables (and with the number of variables). • This should be large (at least 0.65). • Type in:
      alpha item13-item24
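For readers who want to see what such a reliability coefficient computes, the sketch below implements the usual variance-based formula, alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score). The answers are invented, and the function name `cronbach_alpha` is only an illustrative choice; it is not meant to reproduce STATA's output options.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # sum score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical answers of five students to three teacher-quality items (1-5 scale)
items = [
    [4, 5, 2, 3, 1],
    [4, 4, 2, 3, 2],
    [5, 5, 1, 3, 1],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3))
```

Because the three invented items move closely together across respondents, the resulting alpha is well above the 0.65 threshold mentioned on the slide.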

  24. Hypothesis testing using Community Innovation Survey data

  25. Introduction • Community Innovation Survey: harmonized survey of innovation behavior in the European Union + Norway • Moving cross-section data with a lot of information about innovation inputs, outputs, firm characteristics, markets, … • We can analyse this data with the tools we were equipped with yesterday: • t-tests about differences in means • OLS to test more complicated hypotheses • But many variables do not easily lend themselves to OLS because of their nature…

  26. Limited Dependent Variables

  27. Overview • Limited dependent variables (LDV) • Types of LDV • Implications for OLS • Estimation methods • Maximum Likelihood estimation • The need for marginal effects • Probit and logit models • Multinomial models • Count data • Tobit models

  28. Introductory reminder • What do we estimate by regression? • Suppose we have the regression equation: [equation not preserved in the transcript] • We are typically interested in the coefficients/parameters. • But what is their meaning? • A commonly heard suggestion: • They measure how the explained variable changes when the explaining variables change by one unit…

  29. Introductory reminder • This is imprecise. But why? • Look at the formula again: [equation not preserved in the transcript] • The error term obstructs this direct relationship between the explained variable and the coefficients as well as the explaining variables.

  30. Introductory reminder • We solve that by focusing on expectations • The coefficient now has the following meaning: • A coefficient measures how the expected value of the explained variable changes when the corresponding explaining variable changes by one unit.
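The formulas shown on these reminder slides are not part of the transcript. A standard way of writing the argument, with notation that is assumed here rather than taken from the slides, is:

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon \\
E[y \mid x_1, \dots, x_k] &= \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k
  \quad \text{(assuming } E[\varepsilon \mid x_1, \dots, x_k] = 0\text{)} \\
\frac{\partial E[y \mid x_1, \dots, x_k]}{\partial x_j} &= \beta_j
\end{align*}
```

The first line contains the error term, so y itself does not change by exactly the coefficient when a regressor changes by one unit; only the conditional expectation in the second line does, which is what the third line states.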

  31. SomeTheory

  32. LDV - Types • Basic definition: An LDV is any dependent (also: explained, left-hand-side) variable in a regression that cannot take arbitrary values on the real axis. • Examples • Indicator variables: e.g. employed (y/n) • Count variables: # patents • Strictly positive variables: amount of alcohol consumed per week • Multinomial response variables: preferred leisure time activities (bowling, reading, meeting friends)

  33. LDV – Implications for OLS • Suppose we intended to explain the employment status of persons. • A convenient way of coding is 1: employed and 0: unemployed • Technically we could run a linear regression of the following form: [formula not preserved in the transcript] yielding estimates [formula not preserved in the transcript]

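  34. LDV – Implications for OLS • But consider the estimated expectation of [formula not preserved in the transcript] • Since [symbol not preserved in the transcript] is fixed and there are no restrictions, the predicted values may well lie outside the theoretical boundaries of 0 and 1. • This is an implication of the linearity of OLS.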

  35. LDV – Implications for OLS • We impose a linear model with no restrictions on an expected value that should be bounded between 0 and 1. • We need to find a non-linear model for the expectation value.
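The boundary problem can be seen in a few lines of code. The sketch below fits a simple least-squares line to a 0/1 employment indicator and evaluates it at a low and a high value of the regressor; the data (years of education and employment status) are invented purely for illustration.

```python
def ols_line(x, y):
    """Intercept and slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

# Hypothetical data: years of education and employment status (1 = employed)
educ = [8, 9, 10, 11, 12, 13, 14, 16, 18, 20]
employed = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

intercept, slope = ols_line(educ, employed)
fitted_low = intercept + slope * 6    # fitted "probability" at 6 years
fitted_high = intercept + slope * 22  # fitted "probability" at 22 years
print(round(fitted_low, 3), round(fitted_high, 3))
```

With these numbers the fitted value is negative at 6 years and larger than one at 22 years, which cannot be a probability. Nothing in the linear model prevents this.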

  36. LDV – Implications for OLS • Suppose you want to explain income, and the data is censored at an upper threshold (e.g. 100,000€ p.m. and above) • What happens if you use OLS and either drop the highest category (truncation) or replace the censored values with 100,000 (censoring)?

  37. LDV – Implications for OLS • Obviously, downward bias in this case. • Inconsistent results from OLS.
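A small simulation can make the downward bias visible. The sketch below generates data from a known linear relationship, then applies censoring and truncation at an upper cap and re-estimates the slope by OLS. All numbers (true slope 2, cap 12, sample size, seed) are arbitrary assumptions chosen for the illustration.

```python
import random

def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

random.seed(1)
TRUE_SLOPE, CAP = 2.0, 12.0
x = [random.uniform(0, 10) for _ in range(2000)]
y = [1.0 + TRUE_SLOPE * xi + random.gauss(0, 1) for xi in x]

# Censoring: values above the cap are recorded as the cap itself
y_censored = [min(yi, CAP) for yi in y]
# Truncation: observations at or above the cap are dropped altogether
kept = [(xi, yi) for xi, yi in zip(x, y) if yi < CAP]

slope_full = ols_slope(x, y)
slope_censored = ols_slope(x, y_censored)
slope_truncated = ols_slope([k[0] for k in kept], [k[1] for k in kept])
print(slope_full, slope_censored, slope_truncated)
```

On the uncensored data OLS recovers a slope close to the true value, while both the censored and the truncated samples give smaller slopes. Collecting more observations does not remove this gap, which is what inconsistency means here.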

  38. Estimation methods: ML • OLS doesn’t work in these situations. • Common practice therefore: • Confirm that the explained variable is not an LDV (profits), or at least roughly not an LDV (height of a person) • If the variable is an LDV in some sense, use other methods implementing appropriate non-linear models for the expectation value. • What are these methods?

  39. Estimation methods: ML • Fortunately, the Maximum Likelihood approach offers a flexible solution to a large class of such problems (developed by Fisher at the beginning of the 20th century) • It follows several steps: • Choose an appropriate statistical model for your data. • Based on this model, express the likelihood of observing your sample as a function of the parameters • Maximize this likelihood over the parameters. The solutions to this problem are the ML estimates.
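The three steps can be sketched for the simplest possible case, a Bernoulli model for a 0/1 variable with a single parameter p. The sample is invented, and the grid search stands in for the numerical optimizers that statistical packages actually use.

```python
from math import log

def log_likelihood(p, sample):
    """Step 2: log-likelihood of a Bernoulli(p) model for a 0/1 sample."""
    return sum(log(p) if y == 1 else log(1 - p) for y in sample)

# Step 1: assume each observation is an independent Bernoulli(p) draw.
# Hypothetical sample: 1 = firm holds a patent, 0 = it does not
sample = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1]

# Step 3: maximize the log-likelihood over a grid of candidate values of p
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, sample))
print(p_hat)
```

In this simple model the ML estimate coincides with the sample share of ones. For probit or logit models the same logic applies, only with more parameters and no closed-form solution.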

  40. Marginal effects and meaning of coefficients • What about the size of the effects? • We are always interested in how the dependent variable changes when one of the independent variables changes. • Unfortunately, because the expectation value is now non-linear, the coefficients are not identical to the marginal effects anymore.

  41. Marginal effects and meaning of coefficients • In the probit model, for example, we can show that the marginal effect is: [formula not preserved in the transcript]
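The slide's formula is not part of the transcript. In the standard probit model, with the standard normal distribution function written as Φ and its density as φ (notation assumed here), the marginal effect of a continuous regressor is:

```latex
\begin{align*}
P(y = 1 \mid x) &= \Phi(x'\beta) \\
\frac{\partial P(y = 1 \mid x)}{\partial x_j} &= \phi(x'\beta)\,\beta_j
\end{align*}
```

Since the density is strictly positive, the marginal effect has the same sign as the coefficient, but its size depends on the point x at which it is evaluated.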

  42. Marginal effects and meaning of coefficients • Implications: • In the probit model the coefficient does not coincide with the marginal effect • Nonetheless, it gives the correct direction. This holds for many ML methods but not for all. • Always, and I seriously mean always, report marginal effects instead of raw coefficients when using ML. (STATA can do that easily.)
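To illustrate the gap between coefficient and marginal effect, the following sketch evaluates the standard probit marginal effect (normal density at the linear index times the coefficient) at one point. The coefficient values and the evaluation point are made up, and the helper names are illustrative only.

```python
from math import erf, exp, pi, sqrt

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def norm_pdf(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def probit_marginal_effect(beta, x, j):
    """Derivative of P(y=1|x) = Phi(x'beta) with respect to x[j]."""
    index = sum(b * v for b, v in zip(beta, x))
    return norm_pdf(index) * beta[j]

# Hypothetical coefficients: constant, innovation expenditure, firm size
beta = [-1.0, 0.8, 0.3]
x = [1.0, 1.5, 2.0]  # evaluation point; the leading 1.0 multiplies the constant

me = probit_marginal_effect(beta, x, 1)
print(round(me, 4))
```

The marginal effect is positive like the coefficient 0.8 but considerably smaller, and it would change if evaluated at a different point x.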

  43. Practice in STATA

  44. The Probit and the Logit Model • Whenever we encounter an indicator variable (0/1) as the dependent variable, we should think of a correct probability model • Examples: • Unemployed vs. employed • Non-patenting company vs. patenting company • … • Several usable models, but most common: • Logit model and probit model • Practically, no large difference between the two when we focus on marginal effects

  45. The Probit and the Logit Model • Easy to invoke them in STATA using the probit or logit command:
      probit depvar indepvars, options
      logit depvar indepvars, options
      • For example, if you have a patent indicator pat, the innovation expenditures innoexp and the size of the company empl, the command looks like this:
      probit pat innoexp empl

  46. The Probit and the Logit Model • The marginal effects are computed using the following command directly after a probit/logit regression:
      mfx, predict(p)
      • Observe that this command always refers to the last regression.

  47. Multinomial models • Suppose there are many buying alternatives for a product (e.g. Android smartphone, iPhone, Windows smartphone) and you would like to know how customers’ characteristics impact on their buying decision • In this case, 4 categories: no SP, Android SP, iPhone, Windows SP

  48. Multinomial models • Differs from probit/logit because there are more than two categories. • Two widely used models: • Multinomial logit • Multinomial probit • Here there is a difference: multinomial probit is more flexible, but its calculation is usually not computationally feasible with more than four or five categories.

  49. Multinomial models • The STATA commands are mprobit and mlogit:
      mprobit depvar indepvars, options
      mlogit depvar indepvars, options
      • For example, if you have a variable sp giving consumer-level data on SP choice, inc being the income, and age the age, the command would be:
      mprobit sp inc age

  50. Multinomial models • Note: coefficients and marginal effects do not even necessarily have the same direction • You must calculate marginal effects using (we have four categories, each has its own marginal effects):
      mfx, predict(p outcome(1))
      mfx, predict(p outcome(2))
      mfx, predict(p outcome(3))
      mfx, predict(p outcome(4))
      • Note: If the data is ordered (e.g. Likert scale) you can use ordered probit (oprobit with the same syntax)
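That a coefficient and its marginal effect can point in opposite directions is easiest to see in a multinomial logit. The sketch below uses three categories instead of four, a single regressor and no constants, all of which are simplifying assumptions, and the coefficient values are invented.

```python
from math import exp

def mlogit_probs(betas, x):
    """Choice probabilities of a multinomial logit with one regressor x and no constants."""
    weights = [exp(b * x) for b in betas]
    total = sum(weights)
    return [w / total for w in weights]

def marginal_effects(betas, x):
    """dP_k/dx = P_k * (beta_k - sum_m P_m * beta_m) for every category k."""
    p = mlogit_probs(betas, x)
    avg_beta = sum(pm * bm for pm, bm in zip(p, betas))
    return [pk * (bk - avg_beta) for pk, bk in zip(p, betas)]

# Hypothetical coefficients on income; category 0 is the base with coefficient 0
betas = [0.0, 1.0, 3.0]
effects = marginal_effects(betas, x=1.0)
print([round(e, 3) for e in effects])
```

Category 1 has a positive coefficient, yet its marginal effect at x = 1 is negative: higher income shifts so much probability towards category 2 that the probability of category 1 falls. This is why the marginal effects have to be computed for each outcome separately.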
