During the investigation
into the scientific conduct of Dirk Smeesters, I expressed my incredulity
about some of his results to a priming expert. His response was: “You don’t understand these experiments. You just have to run them a number of times before they work.” I am convinced he was completely sincere.
What underlies this comment is what I’ll call the shy-animal mental model of
experimentation. The effect is there; you just need to create the right
circumstances to coax it out of its hiding place. But there is a more
appropriate model: the 20-sided-die model
(I admit, that’s pretty spherical for a die, but bear with me).
A social-behavioral priming experiment is like rolling a
20-sided die, an icosahedron. If you roll the die a number of times, 20 will
turn up at some point. Bingo! You have a significant effect. In fact, given
what we now know about questionable
and not
so questionable research practices, it is fair to assume that the researchers are actually rolling a 20-sided die on which maybe as many as six sides have a 20. So the chances of rolling a 20 are quite high (the sketch below makes this concrete).
[Image caption: I didn't know they existed, but a student who read this post brought this specimen to class; she uses it for Gatherer games.]
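To put numbers on the die model, here is a minimal Python sketch. The six loaded faces and the five attempts are illustrative assumptions, not estimates of any actual lab's practices:

```python
import random

# A minimal sketch of the 20-sided-die model. Each "experiment" is one
# roll; rolling a 20 counts as a significant result. The six loaded
# faces and the five attempts below are illustrative assumptions.

def chance_of_a_20(hit_faces, rolls, trials=100_000):
    """Estimate the probability of rolling at least one 20 in `rolls` rolls."""
    p_hit = hit_faces / 20
    successes = sum(
        any(random.random() < p_hit for _ in range(rolls))
        for _ in range(trials)
    )
    return successes / trials

print(chance_of_a_20(hit_faces=1, rolls=5))  # fair die: ~0.23
print(chance_of_a_20(hit_faces=6, rolls=5))  # six faces show 20: ~0.83
```

With an honest die, five attempts already give roughly a one-in-four chance of a “significant” roll; with six loaded faces, rolling a 20 becomes the most likely outcome.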
Now suppose that someone else tries to faithfully recreate
the circumstances that co-occurred with the rolling of the 20, using the information provided by the original rollers. He recruits a 23-year-old male roller from Michigan, waits until the outside temperature is exactly 17 degrees Celsius, makes the experimenter wear a green sweater, has him drink the same IPA the night before, and so on.
Then comes the big moment. The roller rolls the die. Unfortunately, a different number comes up: a disappointing 11. Sadly, he did not replicate the original roll. He tells this to the first roller, who replies: “Yes, you got a different number than we did, but that’s because of all kinds of extraneous factors that we didn’t tell you about, because we don’t know what they are. So it doesn’t make sense for you to try to replicate our roll, because we don’t know why we got the 20 in the first place! Nevertheless, our 20 stands and counts as an important scientific finding.”
That is pretty much the tenor of some contributions in a
recent issue of Perspectives on
Psychological Science that downplay the replication crisis in
social-behavioral priming. This kind of reasoning seems to motivate recent
attempts by social-behavioral priming researchers to explain away an increasing
number of non-replications of their experiments.
Joe Cesario, for example, claims that replications
of social-behavioral priming experiments by other researchers are uninformative
because any failed replication could result from moderation, although a theory of the
moderators is lacking. Cesario argues that initially only the
originating lab should try to replicate its findings. Self-replication is in and of itself a good idea (we have started doing it regularly in our own lab), but as Dan Simons rightly remarks in his contribution to the special section: “The idea that only the originating lab can meaningfully replicate an effect limits the scope of our findings to the point of being uninteresting and unfalsifiable.”
[Image caption: Show-off! You're still a "false positive."]
Ap Dijksterhuis also mounts a defense of priming
research, downplaying the number of non-replicated findings. He talks about “the odd false positive,” which sounds a little like saying that a penguin colony contains the odd flightless bird (I know, I know, I'm exaggerating here). Dijksterhuis claims that it is not
surprising that social priming experiments yield larger effects than semantic
priming experiments because the manipulations are bolder.
But if this were true, wouldn’t we expect social priming effects to replicate more often? After all, semantic priming effects replicate reliably; they are weatherproof,
whereas the supposedly bold social-behavioral effects appear sensitive to such
things as weather conditions (which Dijksterhuis lists as a moderator).
Andrew Gelman made an excellent point in response to my previous post: “false positive” is not really appropriate terminology here. He suggests an alternative phrasing: overestimating the effect size. This seems a constructive perspective on social-behavioral priming, one without negative connotations: earlier studies provided inflated estimates of the size of social-behavioral priming effects.
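Gelman's point is easy to demonstrate with a simulation: when only the studies that happen to cross the significance threshold get reported, the reported effects systematically overshoot the true one. The numbers below (a true effect of d = 0.2 and 20 participants per group) are hypothetical, chosen only to illustrate the mechanism:

```python
import random
import statistics

# Hypothetical illustration of effect-size inflation: only studies that
# happen to reach significance are kept, and their observed effects are
# compared with the true effect. All numbers are illustrative assumptions.

def significant_effects(true_d=0.2, n=20, studies=20_000):
    """Run simulated two-group studies; keep the significant mean differences."""
    kept = []
    for _ in range(studies):
        control = [random.gauss(0, 1) for _ in range(n)]
        primed = [random.gauss(true_d, 1) for _ in range(n)]
        diff = statistics.mean(primed) - statistics.mean(control)
        se = (2 / n) ** 0.5            # population SD is 1 by construction
        if diff / se > 1.96:           # crude one-sided z test
            kept.append(diff)
    return kept

kept = significant_effects()
print(f"true effect: 0.2, mean significant effect: {statistics.mean(kept):.2f}")
# typically prints a mean around 0.75: an overestimate of nearly 4x
```

The one-sided z test is a simplification (the population SD is known by construction); a t test would show the same inflation.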
A less defensive and more constructive response by priming
researchers might therefore be: “Yes, the critics have a point. Our earlier
studies may have indeed overestimated the effect sizes. Nevertheless, the notion of social-behavioral priming is
theoretically plausible, so we need to develop better experiments, pre-register
our experiments, and perform cross-lab replications to convince ourselves and
our critics of the viability of social-behavioral priming as a theoretical
construct.”
In his description of Cargo Cult Science, Richard Feynman stresses the need for researchers to be self-critical: “We've learned from experience that the truth will come out. Other experimenters will repeat your experiment and find out whether you were wrong or right. Nature's phenomena will agree or they'll disagree with your theory. And, although you may gain some temporary fame and excitement, you will not gain a good reputation as a scientist if you haven't tried to be very careful in this kind of work. And it's this type of integrity, this kind of care not to fool yourself, that is missing to a large extent in much of the research in Cargo Cult Science.”
It is in the interest of the next generation of priming researchers
(just to mention one important group) to be concerned about the many
non-replications (coupled with the large effect sizes and small samples that are
characteristic of social-behavioral priming experiments). The lesson is that
the existing paradigms are not going to yield further insight and ought to be
abandoned. After all, they may have led to overestimated priming effects.
I’m reminded of the Smeesters case again. Smeesters had
published a paper in which he had performed variations on the professor-prime
effect, reporting large effects (the effects that prompted my incredulity).
This paper has now been retracted. One of his graduate students had performed
yet another variation on the professor-prime experiment; she found complete
noise. When we examined her raw data, the pattern was nothing like the pattern Uri
Simonsohn had uncovered
in Smeesters’ own data. When confronted with the discrepancy between the two
data sets, Smeesters gave the defense we now see echoed by social-behavioral priming researchers: that experiment was completely different from my experiments (he did not specify how), so of course no effect was found.
There is reason to worry that defensive responses about
replication failures will harm the next generation of social-behavioral priming
researchers because these young researchers will be misled into placing much
more confidence in a research paradigm than is warranted. Along the way, they will probably waste a lot of valuable time, suffer many disappointments, and might even face the temptation of questionable research practices. They deserve better.