
The Moneyball Approach



  1. The Moneyball Approach: How best to make predictions

  2. How to pick baseball players • “Expert” opinion: the collected wisdom of baseball insiders (players, managers, coaches, scouts, and the front office) • Runs batted in, stolen bases, and batting average are not predictive of team success • Problem: expert opinion is subjective and flawed • Sabermetrics: indicators that better predict success, namely on-base percentage and slugging percentage • The A’s made it to the playoffs in 2002 and 2003
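
For concreteness, here is a minimal Python sketch of the two sabermetric indicators named above. The formulas are the standard definitions; the example numbers are made up.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP: how often a batter reaches base per plate appearance."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)

def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG: total bases per at-bat, so extra-base hits count for more."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

# A hypothetical season line
print(round(on_base_percentage(hits=160, walks=70, hit_by_pitch=5, at_bats=520, sac_flies=5), 3))
print(round(slugging_percentage(singles=100, doubles=35, triples=3, home_runs=22, at_bats=520), 3))
```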

  3. How will incoming freshmen do at year’s end?

  4. Trained Counselor Approach • 45-minute student interview • High school grades • Aptitude tests • 4-page personal statement

  5. Statistical Algorithm • HS grades • One aptitude test • Results: more accurate than 11 of 14 counselors • Similar results for other outcomes: violations of parole, criminal recidivism, pilot training, grad school success, career satisfaction, medical variables, success of new businesses
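
The slide does not give the formula itself, so the following is only a sketch of the kind of two-predictor rule it describes: standardize each predictor and combine them with equal weights. All numbers are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize raw scores to mean 0, standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def predicted_standing(hs_grades, aptitude):
    """Equal-weight combination of the two predictors; higher = better predicted GPA."""
    return [(g + a) / 2 for g, a in zip(z_scores(hs_grades), z_scores(aptitude))]

# Hypothetical incoming freshmen
grades   = [3.9, 3.2, 3.6, 2.8]
aptitude = [700, 650, 520, 600]
print(predicted_standing(grades, aptitude))
```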

  6. Predictive Validity for GPA

  7. Chap. 21: Intuitions vs. Formulas • In “low-validity environments”, the accuracy of “experts” is matched or exceeded by a simple algorithm • Ashenfelter’s formula for predicting Bordeaux wine quality uses only: • Average temperature over the summer growing season • Amount of rain at harvest • Total rain over the previous winter
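
Ashenfelter’s rule is a plain linear formula in those three variables. The coefficients below are the commonly reported ones and should be treated as illustrative rather than authoritative:

```python
def ashenfelter_quality(growing_season_temp_c, harvest_rain_mm, winter_rain_mm):
    """Linear vintage-quality formula; commonly reported coefficients (illustrative)."""
    return (12.145
            + 0.0614 * growing_season_temp_c   # warm growing seasons help
            - 0.00386 * harvest_rain_mm        # rain at harvest hurts
            + 0.00117 * winter_rain_mm)        # a wet previous winter helps

print(ashenfelter_quality(growing_season_temp_c=17.1, harvest_rain_mm=130, winter_rain_mm=600))
```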

  8. Why are experts inferior to algorithms?

  9. Experts are “too clever” and feel they can overrule the formula because they have more information • Humans are inconsistent when evaluating the same information twice • Influenced by transient factors: food breaks, mood, priming, etc.

  10. Formulas are better • School admission interviews: • Likely to decrease the accuracy of selection • Overconfidence in intuitions • Too much weight assigned to personal impressions • Wine tasting: • Adding subjective information lowers accuracy

  11. Background • Employers and admissions officers believe in the unstructured interview, despite evidence to the contrary • Academics rely on graduate interviews rather than statistical data • Many colleges and universities use “holistic” measures

  12. Why do people persist in this illusion? • Belief that even if they don’t help, interviews can’t harm accuracy • Interviews can identify “broken leg” information (exceptions that statistical rules miss)

  13. Problems with Interviews • Dilution: exposure to non-diagnostic information leads to ignoring good information • Decreases accuracy • False confidence: people have a need for sense-making and see patterns in random sequences • Confirmation bias • Example: the candidate who showed up 25 minutes late

  14. Current Study • Participants predicted the GPAs of other students based on biographical information, including prior GPA, and sometimes an unstructured interview • The interviewee provided random answers to questions • Results: • Dilution: GPA predictions were more accurate without the unstructured interview, and predictions with it were less accurate than just using past GPA • Sense-making: those who got random interview responses were just as likely to say they got good information
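
The dilution result is easy to reproduce in a toy simulation (not the study’s data): blending a pure-noise “interview signal” into an otherwise valid predictor raises prediction error.

```python
import random

random.seed(0)
n = 10_000
true_gpa  = [random.gauss(3.0, 0.5) for _ in range(n)]
past_gpa  = [g + random.gauss(0, 0.2) for g in true_gpa]   # valid but noisy predictor
interview = [random.gauss(0, 1) for _ in range(n)]         # pure noise: random answers

def mse(preds):
    """Mean squared error against the true GPAs."""
    return sum((p - t) ** 2 for p, t in zip(preds, true_gpa)) / n

print("past GPA alone:        ", round(mse(past_gpa), 3))
print("past GPA + 'interview':", round(mse([g + 0.3 * x for g, x in zip(past_gpa, interview)]), 3))
```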

  15. What if interview was conducted by someone else?

  16. How do participants rank the methods? • They chose the natural interview first • They chose no interview last, even though it was the most accurate; they preferred even a random interview over none • “Apparently, they believed that random interviews contained some useful information that all of the useless information would not drown out. Thus, while interviews do not help predict one’s GPA, and may be harmful, our participants believe that any interview is better than no interview, even in the presence of excellent biographical information like prior GPA.”

  17. Complex algorithms are not necessary • Dominant practice: multiple regression, the “optimal formula” that assigns a weighted combination to all predictors • Equal-weighting schemes: formulas that assign equal weight to all predictors are often superior to multiple regression • Example: the Apgar score vs. clinical observation • 5 variables: heart rate, respiration, reflex, muscle tone, color • each scored 0, 1, or 2
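
The Apgar score is exactly such an equal-weighting scheme, as this short sketch shows (the dictionary keys are my own labels for the five variables):

```python
APGAR_VARIABLES = ("heart_rate", "respiration", "reflex", "muscle_tone", "color")

def apgar_score(ratings):
    """Equal-weight sum of five variables, each rated 0, 1, or 2 (total 0-10)."""
    assert set(ratings) == set(APGAR_VARIABLES)
    assert all(r in (0, 1, 2) for r in ratings.values())
    return sum(ratings.values())

print(apgar_score({"heart_rate": 2, "respiration": 2, "reflex": 1,
                   "muscle_tone": 2, "color": 1}))  # -> 8
```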

  18. The Checklist Manifesto

  19. Here, then, is the puzzle of I.C.U. care: you have a desperately sick patient, and in order to have a chance of saving him you have to make sure that a hundred and seventy-eight daily tasks are done right—despite some monitor’s alarm going off for God knows what reason, despite the patient in the next bed crashing, despite a nurse poking his head around the curtain to ask whether someone could help “get this lady’s chest open.” So how do you actually manage all this complexity?

  20. Expertise is the mantra of modern medicine. • There are degrees of complexity, though, and intensive-care medicine has grown so far beyond ordinary complexity that avoiding daily mistakes is proving impossible even for our super-specialists. The I.C.U., with its spectacular successes and frequent failures, therefore poses a distinctive challenge: what do you do when expertise is not enough?

  21. B-17 Checklist: for when a task is too complex for memory alone • Checklist for line infections: • (1) wash their hands with soap • (2) clean the patient’s skin with chlorhexidine antiseptic • (3) put sterile drapes over the entire patient • (4) wear a sterile mask, hat, gown, and gloves • (5) put a sterile dressing over the catheter site once the line is in • Check, check, check, check, check.
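
A checklist is simple enough to state directly in code. The sketch below is only a toy rendering of the five-step protocol above; the confirm hook stands in for whoever observes each step.

```python
LINE_INFECTION_CHECKLIST = [
    "wash hands with soap",
    "clean the patient's skin with chlorhexidine antiseptic",
    "put sterile drapes over the entire patient",
    "wear a sterile mask, hat, gown, and gloves",
    "put a sterile dressing over the catheter site once the line is in",
]

def run_checklist(steps, confirm):
    """Walk the steps in order; stop at the first step not confirmed done."""
    for i, step in enumerate(steps, start=1):
        if not confirm(step):
            return f"STOP: step {i} not confirmed ({step})"
    return "All steps confirmed"

# Hypothetical confirmation hook; in practice an observer checks each step.
print(run_checklist(LINE_INFECTION_CHECKLIST, confirm=lambda step: True))
```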

  22. Resistance from Doctors • Dramatic decreases in infections • Yet doctors resisted • Offended • Doubts about generalization to the real world • Keystone Initiative

  23. The Right Stuff Mentality • We have the means to make some of the most complex and dangerous work we do—in surgery, emergency care, and I.C.U. medicine—more effective than we ever thought possible. But the prospect pushes against the traditional culture of medicine, with its central belief that in situations of high risk and complexity what you want is a kind of expert audacity—the right stuff, again. Checklists and standard operating procedures feel like exactly the opposite, and that’s what rankles many people. • USA vs. Spain/Austria??

  24. Hostility to Algorithms • Illusion of skill • Statistical evidence contradicts clinicians’ everyday experience, which includes • Confirmation bias • Short-term predictions • Unawareness of the limitations of their knowledge or skill (the Dunning–Kruger effect)

  25. Dunning–Kruger Effect

  26. Moral debate • Statistical: mechanical, cut and dried, rigid, sterile, arbitrary, artificial • Clinical: dynamic, holistic, subtle, deep, rich, genuine, sensitive, sophisticated, real, living, natural, true to life, understanding

  27. Resistance to demystification of expertise • Reaction of wine community • Reaction of doctors • Reaction of school admission committees

  28. Kahneman’s Experience • Goal: assign each recruit to the best-matching branch of the army • Battery of tests • Interview: shown to be useless, with a focus on irrelevant questions • New procedure: a standard formula for 6 traits, plus a “close your eyes” global rating • Results: both did OK. Why?

  29. How to hire • 1. Select traits that are important for success (technical proficiency, reliability, etc.) • 2. Make a list of questions for each trait and score each answer, say from 1 to 5 • 3. Hire the candidate with the highest total score, even if there is one you like better
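
A minimal sketch of this procedure, with hypothetical traits and an equal-weight total on the 1-5 scale described above:

```python
# Hypothetical trait list; the slide names technical proficiency and reliability.
TRAITS = ("technical proficiency", "reliability", "communication")

def total_score(ratings):
    """Sum of per-trait scores, each on the fixed 1-5 scale (equal weights)."""
    assert all(1 <= ratings[t] <= 5 for t in TRAITS)
    return sum(ratings[t] for t in TRAITS)

candidates = {
    "A": {"technical proficiency": 4, "reliability": 5, "communication": 3},
    "B": {"technical proficiency": 5, "reliability": 3, "communication": 3},
}
# Hire the highest total score, even if you "like" someone else better.
print(max(candidates, key=lambda name: total_score(candidates[name])))  # -> A
```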

  30. The problem with “gut” feelings • “I looked the man in the eye. I found him to be very straightforward and trustworthy and we had a very good dialogue. I was able to get a sense of his soul.”

  31. Q. How is Big Data being used more in the leadership and management field? • A. I think there’s been a fairly recent confluence of the ability to crunch lots of data at fairly low cost, venture capital investments that support new businesses in this field, and changes in what people expect. Leadership is a perennially difficult, immeasurable problem, so suddenly people are saying, “Maybe I can measure some piece of it.” Part of the challenge with leadership is that it’s very driven by gut instinct in most cases — and even worse, everyone thinks they’re really good at it. The reality is that very few people are.

  32. Years ago, we did a study to determine whether anyone at Google is particularly good at hiring. We looked at tens of thousands of interviews, and everyone who had done the interviews and what they scored the candidate, and how that person ultimately performed in their job. We found zero relationship. It’s a complete random mess, except for one guy who was highly predictive because he only interviewed people for a very specialized area, where he happened to be the world’s leading expert.

  33. Q. Other insights from the studies you’ve already done? • A. On the hiring side, we found that brainteasers are a complete waste of time. How many golf balls can you fit into an airplane? How many gas stations in Manhattan? A complete waste of time. They don’t predict anything. They serve primarily to make the interviewer feel smart. Instead, what works well are structured behavioral interviews, where you have a consistent rubric for how you assess people, rather than having each interviewer just make stuff up. Behavioral interviewing also works — where you’re not giving someone a hypothetical, but you’re starting with a question like, “Give me an example of a time when you solved an analytically difficult problem.” The interesting thing about the behavioral interview is that when you ask somebody to speak to their own experience, and you drill into that, you get two kinds of information. One is you get to see how they actually interacted in a real-world situation, and the valuable “meta” information you get about the candidate is a sense of what they consider to be difficult. • “more of a checklist and actionable”

  34. Q. Other insights from the data you’ve gathered about Google employees? • A. One of the things we’ve seen from all our data crunching is that G.P.A.’s are worthless as a criteria for hiring, and test scores are worthless — no correlation at all except for brand-new college grads, where there’s a slight correlation. Google famously used to ask everyone for a transcript and G.P.A.’s and test scores, but we don’t anymore, unless you’re just a few years out of school. We found that they don’t predict anything.

  35. Q. Any crystal-ball thoughts about how Big Data will be used in the future? • A. When you start doing studies in these areas, Big Data — when applied to leadership — has tremendous potential to uncover the 10 universal things we should all be doing. But there are also things that are specifically true only about your organization, and the people you have and the unique situation you’re in at that point in time. I think this will be a constraint to how big the data can get because it will always require an element of human insight.
