“Fully rational man is a mythical hero,” wrote the late German economist Reinhard Selten, not long after he snared the 1994 Nobel Prize in Economics.
Selten was referring to the idea that when people make decisions they often forgo complex calculations in favour of “fast and frugal” mental shortcuts, sometimes called heuristics. And that can be a good thing.
Novices who predict Wimbledon winners by following the rule “choose the player whose name you recognise”, for instance, do as well as experts who weigh up all the stats on players’ form, seeding and so on.
Relying on heuristics, however, can also get us into trouble.
If media reports of a tennis player’s grand slam win were drowned out by, say, a terrorism incident, we might not recall his or her name. And the same reliance on whatever springs most readily to mind can leave fear notoriously misplaced – think fear of flying.
Scientists, of course, are supposed to be immune to such cognitive shenanigans, operating in a space of mental clarity that eschews the allure of “quick and dirty” shortcuts in favour of objective statistics.
Sadly, this is not the case for a surprising number of scientists, according to a new study published in the journal eNeuro and authored by Ray Dingledine, from the Department of Pharmacology at Emory University School of Medicine in Atlanta, US.
Dingledine decided to repeat some experiments that were run more than 40 years ago by legendary psychologists Amos Tversky and Daniel Kahneman. The experiments pose questions that ingeniously pit our tendency to draw instinctive conclusions against the cold steel of statistical reasoning.
Consider the following possible gender sequences of babies born at a hospital, where “B” stands for boy, and “G” for girl: BBBBGGGG; GGGGGGGG; BGBBGBGB. Do these sequences seem equally likely?
Eight girls in a row seems, at first glance, a bit of a stretch. Statistics, however, tell us that the probability of each gender at any single birth is 0.5 and, assuming births are independent, that the probability of each specific sequence is 0.5 to the power of 8. Do the math and every one of the sequences has the same tiny probability of about 0.004, that is, 0.4% or 4 in 1000.
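For the numerically inclined, a minimal Python sketch makes the arithmetic concrete (the million-trial simulation is just a sanity check, not part of the original study):

```python
import random

# Probability of any specific 8-birth sequence, assuming each birth
# is independent and boys and girls are equally likely (p = 0.5).
p_sequence = 0.5 ** 8
print(f"P(any specific sequence) = {p_sequence:.6f}")  # 0.003906, about 0.4%

# Sanity check by brute force: count how often each of the three example
# sequences turns up in a million simulated 8-baby runs.
random.seed(1)
counts = {"BBBBGGGG": 0, "GGGGGGGG": 0, "BGBBGBGB": 0}
trials = 1_000_000
for _ in range(trials):
    seq = "".join(random.choice("BG") for _ in range(8))
    if seq in counts:
        counts[seq] += 1
for seq, count in counts.items():
    print(seq, count / trials)  # each hovers around 0.0039
```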
The original participants were undergraduates who were relative statistical novices. Dingledine, instead, had a range of experienced researchers take the tests, including faculty members and postdocs.
The results were not encouraging.
Dingledine found more than half of respondents thought the sequences were not equally likely. Many backed that up with written comments that eight girls in a row was “extremely unlikely”.
Another experiment introduced “Chris”, who is described as “of high intelligence, although lacking in true creativity”. Chris also has “a need for order and clarity and for neat and tidy systems” and shows “little feel and little sympathy for other people”. Subjects were asked to rank what kind of job Chris was likely to be in.
Two-thirds thought Chris was more likely to work in library science than business administration. This is despite the fact that, in the US, business admin employs 64 times as many people, making it by far the more likely job. The “librarian” stereotype trumped the statistics as a predictor of Chris’s likely occupation.
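A back-of-the-envelope Bayesian calculation shows why the base rate should dominate. The 64-to-1 employment ratio comes from the study; the assumption that the personality sketch is ten times more typical of librarians is purely illustrative:

```python
# Prior odds of librarian vs business administrator, taken from the
# roughly 64:1 employment ratio quoted in the study.
prior_odds = 1 / 64

# Hypothetical likelihood ratio: suppose the personality sketch really is
# 10 times more typical of librarians. This number is an illustrative
# assumption, not a figure from the study.
likelihood_ratio = 10

# Bayes' rule in odds form: posterior odds = prior odds x likelihood ratio.
posterior_odds = prior_odds * likelihood_ratio
print(f"posterior odds, librarian:business = {posterior_odds:.2f}")  # ~0.16

# Even granting a strongly "librarian-ish" description, business
# administration remains about six times more likely.
```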
The results in these and two other scenarios, reports Dingledine, were “essentially the same” as those found in the undergrads studied by Tversky and Kahneman.
Dingledine is forthright in his conclusions.
“The findings reinforce the roles that two inherent intuitions play in scientific decision-making: our drive to create a coherent narrative from new data regardless of its quality or relevance, and our inclination to seek patterns in data whether they exist or not,” he says.
Dingledine also says the results speak to a bigger problem, something Kahneman famously described in an open letter to colleagues in 2012 as a “train wreck looming”: the widespread failure to replicate the findings of many important studies in the social sciences.
That wreck may well be upon us.
A recent article in the journal Nature Human Behaviour reported an attempt to replicate 21 social science experiments published in the journals Nature and Science between 2010 and 2015.
One study, for example, found that viewing images of Rodin’s sculpture The Thinker led people to think more analytically and discouraged belief in God. It fell at the replication hurdle. In all, only 13 of the 21 studies replicated, and even then effect sizes were, on average, just half of those seen in the original experiments.
The crisis has led to a range of suggested remedies.
In an editorial accompanying the Nature Human Behaviour article, Malcolm MacLeod, a professor of neurology at the University of Edinburgh in Scotland, called for reproducibility initiatives to target studies with less robust findings – those flagged by larger “p-values”, which indicate weaker evidence, or by smaller effect sizes.
Writing in Nature in May, University of California statistics professor Philip Stark proposed that “preproducibility” be built into study design – researchers should spell out their precise “scientific recipe” to ensure replications are faithful. Science, writes Stark, “should be ‘show me’, not ‘trust me’”.
Dingledine, however, suggests the reproducibility crisis must expand its focus from experimental design to include the influence of “human nature” on scientific judgment, which he says has not received due attention.
His own prescription sounds morbid – he proposes a “premortem”, in which scientists convene before an experiment, assume it has failed, and list the possible reasons why. This, he says, could add a pre-emptive check that might help to avoid “common biases that unhelpfully support our preconceived notions”.
Dingledine is presumably referring to confirmation bias, the ubiquitous tendency to favour evidence that supports one’s existing view. But he makes surprisingly little of the modern-day pressures to publish that plausibly incite p-hacking (massaging data post hoc until a significant result emerges) or “HARKing” (hypothesising after the results are known).
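To see why p-hacking is so corrosive, consider a minimal simulation, sketched here in Python: even when there is no real effect, an analyst who measures many outcomes and reports only the best-looking one will “find” significance most of the time. The specific numbers below are illustrative, not drawn from any of the studies discussed:

```python
import numpy as np
from scipy.stats import ttest_ind

# Two groups drawn from the SAME distribution, so there is no real effect.
# The "p-hacker" measures 20 outcomes per experiment and reports only the
# smallest p-value.
rng = np.random.default_rng(0)
n_experiments, n_outcomes, n_subjects = 1000, 20, 30

false_positives = 0
for _ in range(n_experiments):
    control = rng.normal(size=(n_outcomes, n_subjects))
    treatment = rng.normal(size=(n_outcomes, n_subjects))
    p_values = [
        ttest_ind(control[i], treatment[i]).pvalue for i in range(n_outcomes)
    ]
    if min(p_values) < 0.05:  # cherry-pick the best-looking outcome
        false_positives += 1

# A single pre-registered outcome would cross p < 0.05 about 5% of the
# time under the null; cherry-picking across 20 outcomes pushes the
# false-positive rate towards 1 - 0.95**20, roughly 64%.
print(f"false-positive rate: {false_positives / n_experiments:.0%}")
```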
The researcher also concedes his study does not show that the reported cognitive biases transfer to the scientists’ planning and evaluation of their own projects.
Nonetheless, the problem is being increasingly recognised. A “manifesto for reproducible science” published last year in Nature Human Behaviour called for a number of measures, including blinding researchers to experimental conditions when analysing results, and more pre-registration of trials to reduce publication bias from negative results being “shelved”.
Those authors conclude with a quote from American physicist Richard Feynman, which is surely salutary for any scientist embarking on a research project:
“The first principle is that you must not fool yourself – and you are the easiest person to fool.”