
A Predictive Model of Inquiry to Enrollment



Presentation Transcript


  1. A Predictive Model of Inquiry to Enrollment Cullen F. Goenner, PhD Department of Economics University of North Dakota cullen.goenner@und.nodak.edu www.business.und.edu/goenner Kenton Pauls Director of Enrollment Services University of North Dakota kenton.pauls@mail.und.nodak.edu

  2. Issues Facing Enrollment Managers • Finding new “markets” • Increasing Tuition • Declining population (ND) • Increasing competition • Need to attract a particular type of student • Diversity/Quality • Data driven analysis • Accountability

  3. Questions we will answer today • What is predictive modeling? • How does one build a predictive model? • How can predictive modeling be used by institutions of higher education to improve enrollment?

  4. What is Predictive Modeling? • Predictive modeling uses statistical/econometric methods to quantitatively predict the future behavior of individuals. • Steps include • Data collection on the subject of interest • Build the model based on data analysis • Predictions made out of sample • Model validation/testing

  5. College Choice: A Three-Stage Process (Hossler and Gallagher, 1987) • Predisposition/aspiration for higher education: Encouragement, coursework, and interest. • Search for potential schools: Counselors, campus contacts, program availability • Selection: SES, ability, fit, geography

  6. Factors Influencing Choice Economic perspective: • Education as an investment in human capital • Cost vs. benefit calculus Psychological perspective: • The individual's need to find a sense of belonging and to have personal needs fulfilled. Sociological perspective: • Social interaction dictated by societal/family norms.

  7. Existing Empirical Work: Search Choice • Applications: • DesJardins, Dundar, and Hendel (1999) • Weiler (1994) • Interest: SAT scores sent • Toutkoushian (2001)

  8. Existing Models of Enrollment Choice • Model a student's binary choice to enroll at a particular college while controlling for the student's characteristics. • Logistic models used • Conditional on students having: • Applied • Bruggink and Gambhir (1996) • Thomas, Dawes, and Reznik (2001) • Been admitted • DesJardins (2002) • Leppel (1993)

  9. Our Predictive Model • Builds on the models of DesJardins (2002) and Thomas, Dawes, and Reznik (2001) • The focus here is on predicting enrollment among students who inquired at our institution. • An "inquiry model" is relevant because: • Inquiry is the time of information exchange and opinion formation • It allows for early intervention in a student's decision-making process (target marketing)

  10. Inquiry Model Challenges • Data collection • Data are already collected on those who apply or are admitted, but typically not on inquiries. • Quality of data • Applicants provide detailed data describing themselves (demographic data, test scores, HSGPA, etc.), which are not available for most student inquiries.

  11. Types of Inquiries We Recorded • Return of information card • Attendance of college fair • Campus visit • Contact via e-mail • Contact via phone • Referral from faculty, coach, or alumni • ACT automatically submitted

  12. How these data were captured • Enrollment Services Prospective Student Network relational database (ESPSN) • Customized system • Microsoft SQL Server 2000/Visual Basic

  13. Information Collected From Information Request Card • Name • High school attended • Major of interest (if any) • Address. The card lacks the demographic data typical of application records and used in most predictive models.

  14. Geodemography • The process of attaching demographic characteristics to geographic areas. • The notion is that "birds of a feather flock together," i.e., individuals living in the same neighborhood tend to have similar behavior patterns. • Ex: Neighborhoods are homogeneous in terms of household income, occupations, family size, and purchases.

  15. Implementation • US Census data aggregated to zip code level • “Geodemographic” variables considered for our model specification: • College age demographic • Population • Average Income • White demographic • Median age
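Attaching zip-level census variables to an inquiry record amounts to a join on zip code. The sketch below is illustrative only: it assumes inquiries and census rows are both keyed by five-digit zip code, and the field names (`college_age_share`, `avg_income`) and all values are hypothetical placeholders, not the presenters' data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Inquiry:
    name: str
    zip_code: str


# Hypothetical zip-level census table (zip code -> geodemographic attributes).
# The values are placeholders, not actual Census figures.
CENSUS_BY_ZIP = {
    "58201": {"college_age_share": 0.18, "avg_income": 41000.0},
    "98101": {"college_age_share": 0.09, "avg_income": 62000.0},
}


def attach_geodemographics(inquiry: Inquiry, census: dict) -> Optional[dict]:
    """Return the inquiry's fields merged with its zip code's census
    attributes, or None if the zip code is not in the census table."""
    zip5 = inquiry.zip_code.strip()[:5]  # reduce ZIP+4 to the 5-digit zip
    attrs = census.get(zip5)
    if attrs is None:
        return None
    return {"name": inquiry.name, "zip": zip5, **attrs}


record = attach_geodemographics(Inquiry("A. Student", "58201-1234"), CENSUS_BY_ZIP)
print(record)
```

Inquiries whose zip code has no census match come back as `None`, so they can be counted and handled separately rather than silently receiving default values.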

  16. Building the model • Binary choice model: model whether students who inquire at UND enroll or do not enroll. • 15,827 students made inquiries for Fall 2003 enrollment; of these, 2,067 (about 13%) actually enrolled. • Logistic regression model used.
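A minimal sketch of the binary-choice setup: a logistic regression maps an inquiry's characteristics to an enrollment probability. This assumes a plain batch-gradient-ascent fit in pure Python rather than the statistical package the presenters used, and the single feature (a campus-visit indicator) and the eight-row data set are made up for illustration.

```python
import math

# Share of Fall 2003 inquiries that enrolled, from the figures on this slide.
BASE_RATE = 2067 / 15827  # about 0.13


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Predicted enrollment probability; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logit(X, y, lr=0.1, epochs=5000):
    """Fit a logistic regression by batch gradient ascent on the mean
    log-likelihood. X: list of feature rows (no intercept column),
    y: list of 0/1 outcomes (1 = enrolled)."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - predict_proba(w, xi)  # d logL / d(linear index) = y - p
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


# Made-up data: one hypothetical indicator (e.g. campus visit) and the outcome.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
w = fit_logit(X, y)
print(round(predict_proba(w, [0]), 2), round(predict_proba(w, [1]), 2))
```

With one binary regressor the fitted probabilities should approach the group enrollment rates (1 of 4 without a visit, 3 of 4 with one), which is a quick sanity check on the fit.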

  17. Candidate Control Variables • Type and Frequency of Contact • Geographic • Academic • Geodemographic • Interaction Effects

  18. Contact Variables

  19. Geographic Variables

  20. Academic/Geodemographic

  21. Interaction Terms

  22. Model Specification • Researchers typically assume their model specification is the true model that generates the data. • It is difficult to justify a priori which variables to include in the model, since each candidate is, by design, theoretically relevant. • With $k$ candidate variables there are $2^k$ different linear models one could consider.
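The $2^k$ count follows because each candidate variable is either in or out of a model. The sketch below enumerates every subset of a small, hypothetical list of regressor names to make the count concrete.

```python
from itertools import combinations


def all_models(candidates):
    """Yield every subset of the candidate regressors, including the empty
    (intercept-only) model: 2**k subsets for k candidates."""
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            yield subset


candidates = ["distance", "campus_visit", "avg_income"]  # hypothetical names
models = list(all_models(candidates))
print(len(models))  # 8 == 2**3
```

The same count explains why full enumeration breaks down quickly: 30 candidate regressors give 2**30 = 1,073,741,824 models.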

  23. Consider the case in which several models $\{M_1, \ldots, M_K\}$ are theoretically possible. • Basing inference on the results of a single model is risky. • Bayesian model averaging (BMA) allows us to account for this type of uncertainty.

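  24. BMA: The posterior distribution of the parameters $\beta$ given the data $D$, in the presence of model uncertainty, is the average of the posterior distributions under each of the K models, weighted by the posterior model probabilities $P(M_k \mid D)$:

$$P(\beta \mid D) = \sum_{k=1}^{K} P(\beta \mid M_k, D)\, P(M_k \mid D) \qquad (1)$$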

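  25. Posterior Model Probability: The posterior probability of model $M_k$ is

$$P(M_k \mid D) = \frac{P(D \mid M_k)\, P(M_k)}{\sum_{l=1}^{K} P(D \mid M_l)\, P(M_l)} \qquad (2)$$

where $P(D \mid M_k)$ is the likelihood and $P(M_k)$ is the prior probability that model $M_k$ is the true model, given that one of the K models is the true model.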

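  26. Posterior Model Probability: Assuming a non-informative prior, $P(M_1) = \cdots = P(M_K) = 1/K$, the prior terms cancel in (2):

$$P(M_k \mid D) = \frac{P(D \mid M_k)}{\sum_{l=1}^{K} P(D \mid M_l)} \qquad (3)$$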
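Posterior model probabilities are usually computed from log likelihoods, since the raw likelihoods underflow. The sketch below assumes each model's log marginal likelihood is already available; the example plugs in the BIC approximation $\log P(D \mid M_k) \approx -\mathrm{BIC}_k / 2$, and the BIC values themselves are hypothetical.

```python
import math


def posterior_model_probs(log_marglik, log_prior=None):
    """Posterior model probabilities P(M_k | D) computed in log space.
    P(M_k | D) is proportional to P(D | M_k) * P(M_k); with log_prior=None
    a uniform prior 1/K is used, which cancels in the normalization."""
    K = len(log_marglik)
    if log_prior is None:
        log_prior = [-math.log(K)] * K
    log_post = [a + b for a, b in zip(log_marglik, log_prior)]
    m = max(log_post)  # log-sum-exp trick: subtract the max before exp()
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [wt / total for wt in weights]


# Hypothetical BIC values for three models (lower BIC = better supported).
bic = [100.0, 102.0, 110.0]
pmp = posterior_model_probs([-b / 2 for b in bic])
print([round(p, 3) for p in pmp])
```

Under the uniform prior, a BIC difference of 2 between two models translates into a posterior odds ratio of $e^{-1} \approx 0.37$.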

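  27. The posterior mean and variance summarize the effects of the parameters on the dependent variable. Raftery (1995) reports

$$E[\beta \mid D] \approx \sum_{k} \hat{\beta}^{(k)}\, P(M_k \mid D)$$

$$\operatorname{Var}[\beta \mid D] \approx \sum_{k} \left[ \operatorname{Var}\left(\hat{\beta}^{(k)}\right) + \left(\hat{\beta}^{(k)}\right)^2 \right] P(M_k \mid D) - E[\beta \mid D]^2 \qquad (9)$$

where $\hat{\beta}^{(k)}$ and $\operatorname{Var}(\hat{\beta}^{(k)})$ are the MLE and its variance under model $k$, and the summations are over the models that include $\beta$.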
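The posterior mean and variance for a single coefficient can be sketched directly from the per-model estimates and the posterior model probabilities. The three-model setup and all numbers below are hypothetical; models that exclude the coefficient simply contribute nothing to the sums.

```python
def bma_mean_var(estimates, pmp):
    """Model-averaged posterior mean and variance of one coefficient.

    estimates: dict mapping model index k -> (beta_hat_k, var_hat_k), listing
               only the models that include the coefficient.
    pmp:       list of posterior model probabilities P(M_k | D)."""
    mean = sum(b * pmp[k] for k, (b, _) in estimates.items())
    second = sum((v + b * b) * pmp[k] for k, (b, v) in estimates.items())
    return mean, second - mean ** 2


# Hypothetical: three models; the coefficient appears only in models 0 and 1.
pmp = [0.6, 0.3, 0.1]
estimates = {0: (0.5, 0.01), 1: (0.7, 0.02)}
mean, var = bma_mean_var(estimates, pmp)
print(round(mean, 4), round(var, 4))
```

The variance includes both the within-model sampling variance and the spread of the estimates across models, which is how BMA carries model uncertainty into the reported precision.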

  28. BMA Implementation • S-PLUS function bic.logit: performs BMA on logistic regression models. • 30 regressors imply a summation in equation (1) over $2^{30}$, i.e. more than 1 billion, models. • To manage the summation we use Occam's window.

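  29. Occam's Window: Exclude models that predict the data far less well than the best model, with predictive support measured by each model's posterior model probability (PMP). Only models in the set

$$A' = \left\{ M_k : \frac{\max_{l} P(M_l \mid D)}{P(M_k \mid D)} \le C \right\}$$

are included, for a chosen cutoff $C$.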
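The Occam's window rule can be sketched as a filter on posterior model probabilities followed by renormalization over the retained models. The cutoff C = 20 and the PMP values are assumptions for illustration; the sketch implements only the ratio-to-best rule shown on this slide, not any further pruning a given BMA implementation may apply.

```python
def occams_window(pmp, C=20.0):
    """Indices of models kept by the rule max_l P(M_l|D) / P(M_k|D) <= C.
    C = 20 is an assumed cutoff for this sketch."""
    best = max(pmp)
    return [k for k, p in enumerate(pmp) if p > 0 and best / p <= C]


def renormalize(pmp, kept):
    """Rescale the retained models' probabilities so they sum to 1."""
    total = sum(pmp[k] for k in kept)
    return {k: pmp[k] / total for k in kept}


pmp = [0.50, 0.30, 0.02, 0.18]  # hypothetical PMPs
kept = occams_window(pmp)        # model 2 is dropped: 0.50 / 0.02 = 25 > 20
weights = renormalize(pmp, kept)
print(kept, {k: round(v, 3) for k, v in weights.items()})
```

The averaging in equation (1) is then carried out over the retained models only, using the renormalized weights.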

  30. Results • 26 models supported by the data • The model with the highest PMP receives 21% of the total posterior probability. • Variables that receive strong support for inclusion include: • Geographic: Distance, HY State, HY School, Competitor distance • Geodemographic: College Age, Average Income • Contacts: Number, Campus visit, Referral

  31. Out of Sample Predictive Performance • Split the data into two equal parts: • First part used to build/estimate the model • Second part used to test the model’s predictions. • Outcome (enrollment) is binary, while our model generates a probability estimate.
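A random half split like the one described can be sketched with a seeded shuffle so the partition is reproducible. The seed and the use of record indices are assumptions for illustration; 15,827 matches the number of inquiries on slide 16.

```python
import random


def split_half(records, seed=2003):
    """Shuffle a copy of the records and split it into an estimation half
    and a validation half (an odd record out goes to validation)."""
    rng = random.Random(seed)  # seeded so the split can be reproduced
    shuffled = list(records)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


estimation, validation = split_half(range(15827))
print(len(estimation), len(validation))
```

The model is fit only on `estimation`; its predicted probabilities on `validation` are what the out-of-sample performance measures are computed from.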

  32. What is a successful prediction? • Greene (2001): there is no "correct" choice of probability cutoff; a typical value is 0.5. • Tradeoff in cutoff choice: • A lower cutoff increases the share of actual enrollees who are predicted to enroll (sensitivity), at the expense of more non-enrollees being predicted to enroll (a higher false positive rate).
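The cutoff tradeoff can be made concrete by classifying predicted probabilities at two cutoffs and comparing sensitivity and the false positive rate. The six probabilities and outcomes below are made up; zero-division guards are omitted for brevity.

```python
def classify_metrics(probs, outcomes, cutoff=0.5):
    """Predict 'enroll' when the probability is >= cutoff and compare the
    predictions with the actual 0/1 outcomes."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, outcomes):
        predicted = p >= cutoff
        if predicted and y == 1:
            tp += 1
        elif predicted and y == 0:
            fp += 1
        elif not predicted and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(outcomes),
        "sensitivity": tp / (tp + fn),  # enrollees predicted to enroll
        "specificity": tn / (tn + fp),  # non-enrollees predicted not to enroll
        "false_positive_rate": fp / (tn + fp),
    }


probs = [0.9, 0.6, 0.3, 0.28, 0.7, 0.1]  # hypothetical model scores
outcomes = [1, 1, 1, 0, 0, 0]            # 1 = enrolled
high = classify_metrics(probs, outcomes, cutoff=0.5)
low = classify_metrics(probs, outcomes, cutoff=0.25)
print(high)
print(low)
```

Lowering the cutoff from 0.5 to 0.25 catches the third enrollee (sensitivity rises from 2/3 to 1) but also flags another non-enrollee (the false positive rate rises from 1/3 to 2/3).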

  33. Predictive Performance: Classification

  34. Predictive performance • 89% of observations correctly classified • Specificity: 97% • Sensitivity: 36% • The ROC curve describes the relation between sensitivity and 1 − specificity (the false positive rate) • Area under the ROC curve = 0.87
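The area under the ROC curve equals the probability that a randomly chosen enrollee receives a higher score than a randomly chosen non-enrollee. The sketch below computes it by direct pairwise comparison (the Mann-Whitney form) on made-up scores; for thousands of records a rank-based version is faster, but the pairwise form shows what the number means.

```python
def roc_auc(probs, outcomes):
    """Area under the ROC curve as the share of (enrollee, non-enrollee)
    pairs in which the enrollee scores higher; ties count one half."""
    pos = [p for p, y in zip(probs, outcomes) if y == 1]
    neg = [p for p, y in zip(probs, outcomes) if y == 0]
    wins = 0.0
    for a in pos:
        for b in neg:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(pos) * len(neg))


probs = [0.9, 0.6, 0.3, 0.28, 0.7, 0.1]  # hypothetical model scores
outcomes = [1, 1, 1, 0, 0, 0]            # 1 = enrolled
auc = roc_auc(probs, outcomes)
print(round(auc, 3))
```

The reported rates are also mutually consistent: if roughly 13% of validation inquiries enrolled, as in the full sample, accuracy is the base-rate-weighted average 0.13 × 0.36 + 0.87 × 0.97 ≈ 0.89.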

  35. Another Predictive Performance Method

  36. 79% of enrolled students were found within the top-scoring 22% of the entire inquiry population (scores ≥ 0.2) • Focused efforts without compromising enrollment numbers • Efficiency implications
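A capture (cumulative gains) figure like this one can be computed by flagging all inquiries at or above a score threshold and asking what share of the population and what share of enrollees fall in the flagged group. The ten scores and outcomes below are made up.

```python
def capture_at_threshold(scores, outcomes, threshold):
    """Return (share of all inquiries with score >= threshold,
    share of all enrollees found among those inquiries)."""
    flagged = [y for s, y in zip(scores, outcomes) if s >= threshold]
    share_population = len(flagged) / len(scores)
    share_enrolled = sum(flagged) / sum(outcomes)
    return share_population, share_enrolled


scores = [0.9, 0.6, 0.3, 0.28, 0.7, 0.1, 0.05, 0.15, 0.02, 0.12]  # hypothetical
outcomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]                          # 1 = enrolled
pop, enr = capture_at_threshold(scores, outcomes, threshold=0.25)
print(pop, enr, enr / pop)  # last value is the lift over untargeted contact
```

Applied to the slide's figures, the lift is 0.79 / 0.22 ≈ 3.6: the flagged 22% of inquiries contain about 3.6 times as many enrollees as an untargeted 22% would be expected to.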

  37. Practical Applications • Effective regional market segmentation • Targeted tele-counseling efforts • Special projects

  38. Regional Market Segmenting • Target Marketing and Segmentation • Prospect names purchased based on zip code. • Establish a predictive “score” for all zip codes in US based on census-level data

  39. What the data indicated (WA)

  40. Where enrolled students came from (WA)

  41. 83% of enrolled WA students fell within top-scoring zip codes over three years • Direct mail name purchases • Prior years: very open search criteria • MN, CO, SD, MT • This year: much more restrictive, to get deeper into broader markets • Only key zip codes • CO, WA, OR, AZ, IL, MN, etc.

  42. WA Search Names - 2003

  43. WA Search Names - 2004

  44. Targeted Tele-Counseling Efforts • Student calling program • Top 20% of all model scores identified • Fluid number; applicants are excluded • Prompt students to take action
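One reading of this calling-list rule is: rank all prospects by model score, take the top 20%, then drop anyone who has already applied, so the number of calls changes as applications arrive. The sketch below follows that reading; the `(name, score, has_applied)` record layout and the prospect data are hypothetical.

```python
def calling_list(prospects, top_percent=20):
    """Names of top-scoring prospects to call.

    prospects: list of (name, score, has_applied) tuples (hypothetical layout).
    The top `top_percent` percent of all scores is identified first; anyone who
    has already applied is then excluded, so the list size varies over time."""
    ranked = sorted(prospects, key=lambda p: p[1], reverse=True)
    n_top = (len(ranked) * top_percent + 99) // 100  # integer ceiling
    return [name for name, score, applied in ranked[:n_top] if not applied]


prospects = [
    ("A", 0.91, True), ("B", 0.84, False), ("C", 0.40, False),
    ("D", 0.35, False), ("E", 0.22, True), ("F", 0.18, False),
    ("G", 0.12, False), ("H", 0.09, False), ("I", 0.05, False),
    ("J", 0.02, False),
]
print(calling_list(prospects))
```

Using an integer ceiling avoids floating-point surprises when computing how many prospects make up the top share.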

  45. Special Projects • Limited funds but targeted initiatives • Focus on as many of the top-scoring students as possible • Postcards, brochures, etc.

  46. Possible Future Research • Cluster analysis for better market segmentation • Study of marginal effects

  47. Thank You! Questions?
