Friday 29 August 2014

Replication and reputation: Whose career matters?



[Cartoon ©CartoonStock.com]
Some people are really uncomfortable with the idea that psychology studies should be replicated. The most striking example is Jason Mitchell, Professor at Harvard University, who famously remarked in an essay that "unsuccessful experiments have no meaningful scientific value".

Hard on his heels now comes UCLA's Matthew Lieberman, who has published a piece in Edge on the replication crisis. Lieberman is careful to point out that he thinks we need replication. Indeed, he thinks no initial study should be taken at face value - it is, according to him, just a scientific anecdote, and we'll always need more data. He emphasises: "Anyone who says that replication isn't absolutely essential to the success of science is pretty crazy on that issue, as far as I'm concerned."

It seems that what he doesn't like, though, is how people are reporting their replication attempts, especially when they fail to confirm the initial finding. "There's a lot of stuff going on", he complains, "where there's now people making their careers out of trying to take down other people's careers". He goes on to say that replications aren't unbiased, that people often go into them trying to shoot down the original findings, and that this can lead to bad science:
"Making a public process of replication, and a group deciding who replicates what they replicate, only replicating the most counterintuitive findings, only replicating things that tend to be cheap and easy to replicate, tends to put a target on certain people's heads and not others. I don't think that's very good science that we, as a group, should sanction."
It's perhaps not surprising that a social neuroscientist should be interested in the social consequences of replication, but I would take issue with Lieberman's analysis. His depiction of the power of the non-replicators seems misguided. You do a replication to move up in your career? Seriously? Has Lieberman ever come across anyone who was offered a job because they failed to replicate someone else's findings? Has he ever tried to publish a replication in a high-impact outlet? Give it a try and you'll soon be told it is not novel enough. Many of the most famous journals are notorious for turning down failures to replicate studies that they themselves published. Lieberman is correct in noting that failures to replicate can get a lot of attention on Twitter, but a strong Twitter following is not going to recommend you to a hiring committee (and, by the way, that Kardashian index paper was a parody).

Lieberman makes much of the career penalty for those whose work is not replicated. But anyone who has been following the literature on replication will be aware of just how common non-replication is (see e.g. Ioannidis, 2005). There are various possible reasons for this, and nobody with any sense would count it against someone if a well-conducted and adequately powered study fails to reproduce their finding. What does count against them is if they start putting forward implausible reasons why the replication must be wrong and they must be right. If they can show the replicators did a bad job, their reputation can only be enhanced. But they'll be in a weak position if their original study was not methodologically strong and should not have been submitted for publication without further evidence to support it. In other words, reputation and career prospects will, at the end of the day, come down to the scientific rigour of a person's research, not to whether a particular result did or did not cross a threshold of p < .05.

The problem with failures to replicate is that they can arise for at least four reasons, and it can be hard to know which applies in an individual case. One reason, emphasised by Lieberman, is that the replicator may be incompetent or biased. But a positive feature of the group replication efforts that Lieberman so dislikes is that the methods and data are entirely open, allowing anyone who wishes to evaluate them to do so – see for instance this example. Others have challenged replication failures on the grounds that there are crucial aspects of the methodology that only the original experimenter knows about. To them I recommend making all aspects of their methods explicit.

A second possibility is that a scientist does a well-designed study whose results don't replicate because all results are influenced by randomness – this could mean that the original effect was a false positive, or that the replication was a false negative. The truth of the matter will only be settled by more, rather than less, replication, but there's research showing that the odds are that an initially large effect will be smaller on replication, and may disappear altogether - the so-called Winner's Curse (Button et al., 2013).
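To see how the Winner's Curse arises even when everyone behaves impeccably, here is a minimal simulation sketch in Python. The true effect, sample sizes and number of runs are illustrative numbers of my own choosing, not taken from any of the studies discussed: the point is simply that if only results crossing p < .05 get noticed, the 'significant' estimates overstate the true effect on average, so a fair replication is expected to find something smaller.

```python
# Minimal sketch of the Winner's Curse: small studies that only get noticed
# when they cross p < .05 systematically overestimate the true effect.
# All numbers below are illustrative assumptions, not from any cited study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
true_d = 0.2        # assumed modest true effect (Cohen's d)
n_per_group = 20    # assumed small sample per group in the original study
n_sims = 20000      # number of simulated "original" studies

winning_estimates = []
for _ in range(n_sims):
    treatment = rng.normal(true_d, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    t, p = stats.ttest_ind(treatment, control)
    if p < .05 and t > 0:    # only the "winners" get published
        winning_estimates.append(treatment.mean() - control.mean())

print(f"true effect:                   {true_d:.2f}")
print(f"mean effect among the winners: {np.mean(winning_estimates):.2f}")
# The mean among the "winners" comes out several times larger than the true
# 0.2, which is why an honest, well-powered replication looks "smaller".
```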

The third reason why someone's work doesn't replicate is that they are a charlatan or fraudster who has learned that they can have a very successful career by telling lies. We all hope such people are very rare, and we all agree they should be stopped. Nobody would assume that someone must be in this category just because a study fails to replicate.

The fourth reason for lack of replication arises when researchers are badly trained and simply don't understand probability theory, and so engage in various questionable research practices to tweak their data until they arrive at something 'significant'. Although they are innocent of bad intentions, they stifle scientific progress by cluttering the field with non-replicable results. Unfortunately, such practices are common and often not recognised as a problem, though there is growing awareness of the need to tackle them.
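To make concrete how such practices generate 'findings' that then fail to replicate, here is another minimal sketch, again with hypothetical numbers of my own choosing, of one familiar questionable research practice: optional stopping, i.e. peeking at the data and adding participants until the p-value dips below .05.

```python
# Minimal sketch of optional stopping: test, and if p >= .05 add another
# participant per group and test again, up to some maximum sample size.
# Even with no true effect at all, the false-positive rate climbs well
# above the nominal 5%. All numbers are hypothetical illustrations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims = 5000
false_positives = 0

for _ in range(n_sims):
    a = list(rng.normal(0.0, 1.0, 10))    # start with 10 per group; true effect is zero
    b = list(rng.normal(0.0, 1.0, 10))
    while True:
        _, p = stats.ttest_ind(a, b)
        if p < .05:                       # "significant" - stop and write it up
            false_positives += 1
            break
        if len(a) >= 50:                  # give up at 50 per group
            break
        a.append(rng.normal(0.0, 1.0))    # otherwise add one more per group and re-test
        b.append(rng.normal(0.0, 1.0))

print(f"false-positive rate with optional stopping: {false_positives / n_sims:.2f}")
# Typically prints something in the region of 0.2 rather than the nominal
# 0.05: results 'found' this way are exactly the ones that won't replicate.
```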

There are repeated references in Lieberman's article to people's careers: not just the careers of the people who do the replications ("trying to create a career out of a failure to replicate someone") but also the careers of those who aren't replicated ("When I got into the field it didn't seem like there were any career-threatening giant debates going on"). There is, however, another group whose careers we should consider: graduate students and postdocs who may try to build on published work only to find that the original results don't stand up. Publication of non-replicable findings leads to enormous waste in science and demoralization of the next generation. One reason I take reproducibility initiatives seriously is that I've seen too many young people demoralized after finding that the exciting effect they want to investigate is actually an illusion.

While I can sympathize with Lieberman's plea for a more friendly and cooperative tone to the debate, at the end of the day, replication is now on the agenda and it is inevitable that there will be increasing numbers of cases of replication failure.

So suppose I conduct a methodologically sound study that fails to replicate a colleague's work. Should I hide my study away for fear of rocking the boat or damaging someone's career? Have a quiet word with the author of the original piece? Rather than holding back for fear of giving offence, it is vital that we make our data and methods public. For a great example of how to do this in a rigorous yet civilized fashion, I recommend this blogpost by Betsy Levy Paluck.

In short, we need to develop a more mature understanding that the move towards more replication is not about making or breaking careers: it is about providing an opportunity to move science forward, improve our methodology and establish which results are reliable (Ioannidis, 2012). And this can only help the careers of those who come behind us.


References  
Button, K., Ioannidis, J., Mokrysz, C., Nosek, B., Flint, J., Robinson, E., & Munafò, M. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(6), 365-376. doi:10.1038/nrn3475

Ioannidis, J. (2005). Contradicted and initially stronger effects in highly cited clinical research. JAMA, 294(2), 218-228. doi:10.1001/jama.294.2.218

Ioannidis, J. (2012). Why science is not necessarily self-correcting. Perspectives on Psychological Science, 7(6), 645-654. doi:10.1177/1745691612464056

10 comments:

  1. It may be a small percentage of cases, but I don't see how this wouldn't negatively affect early career people. Getting a 'non-replication' worked out takes time. Competition for jobs is fierce enough that even if you're a great candidate with a great CV, there are going to be others with just as great a CV who might not have lingering replication questions. Why wouldn't that person be at a disadvantage? Given how much we know bias about personal demographics can creep in and tip the scales a little, why wouldn't "I heard his stuff doesn't replicate" possibly do so as well?

    Replies
    1. The real damage to early career people comes from embarking enthusiastically on a project that builds on prior work, only to find, after spending a few years working on it, that it doesn't replicate.
      If you are worried that your reputation may suffer if your study doesn't replicate, then hold back and don't publish until you are sure it does.

    2. Hear, hear! For every indignant tenured professor out there worried about their reputation, there are so many more graduate students whose academic careers have "died in the womb" because they spent their graduate careers trying to build on supposedly established findings that just won't replicate, and in most cases this kind of research is considered pretty much unpublishable. If that is changing, it's hard for me to see how that is not a good thing.

    3. The assumption seems to be that a non-replication wouldn't happen unless the researcher did something wrong?

      And if they didn't, there would be plenty of time for the science to shake out properly?

    4. I recently tried to publish a non-replication (it was intended as a replication and extension, which obviously didn't work out). One of the reviewers informed me that, despite not being able to point out any particular methodological flaw, there must be one as otherwise I'd have successfully replicated the original finding, and they rejected my paper on that basis...

  2. Certainly when a researcher has an individual paper that does not replicate, it would be absurd to infer that they are a charlatan or fraud. But what about the case when someone has, say, five papers that don't replicate despite strenuous attempts?

    It's all well and good for us all to emphasize that replication should not be viewed as intrinsically threatening or negativistic, but do we have to deny the evident reality that in some fields and some labs, highly questionable research practices have been employed so intensely that they have resulted in completely bogus literatures?

  3. I can only hope that where this is the case, the move to replication - and/or preregistration, which I see as the other approach to ensuring replicability - will flush it out.

  4. I use the word replicate as a noun or verb to imply a repeat of a treatment within an experiment - e.g. "all treatments were replicated three times" or "all treatments had three replicates". Your use of the word is confusing me. You seem, on occasions, to be using the word replicate as a verb or a noun to describe the repeat of an experiment, not the repeat of a treatment within an experiment. But then you use phrases such as "failure to replicate", "results don't replicate" or "someone's work doesn't replicate", which imply something about the results or output of a (repeated) experiment, not the act of repeating the experiment. So, using your dual meaning, we could say that "they replicated the experiment, which failed to replicate", which sounds a bit daft to me. Then you use the word replication to imply the process of repeating a study, and the compound noun non-replication to indicate the result or outcome of a repeated study - i.e. that it doesn't give the same result as the original or earlier study. It's all very confusing.
    Within statistics a replicate (plus the process of replication and the verb to replicate) has a very strict definition. It refers exclusively to what one does within a study or experiment, and it never refers to the result or outcome of an experiment.

  5. I entirely agree about the potential damage to the unlucky PhD student who bases their project on an effect that fails to replicate, although in such cases great studies can result from showing why and when it doesn't replicate. See, for example, http://journal.frontiersin.org/Journal/10.3389/fpsyg.2013.00272/abstract

    However, I worry about the idea that an explicit methods section can always overcome the problem of the incompetent replicator. I would not expect to be able to cook like Gordon Ramsay by slavishly following one of his recipes. (Although it might work with Nigel Slater.)

  6. I think you didn't adequately cover a fifth, and very important, reason for non-replication (and I think it goes beyond mere incompetence or bias): some types of research require much more methodological know-how than others to get right. So the first place I would look if I failed to replicate someone else's findings is the methods. Given the ever-increasing complexity of methods and the many things that can go wrong on the way from original research idea to final analyzed data, that's what I'd worry about the most. Of course, this is also a double-edged sword: if the original study used very demanding and error-prone methods, the reported finding may simply be an artifact of botched methods, and that may be the actual reason that replication attempts fail. Bottom line: honest error may be both more likely and harder to detect the more complicated the methods become. And while there's a growing number of studies on how frequently people use questionable research practices, have manipulated data, or know someone who did, it appears to me that research on the incidence of undetected error in scientific research and reporting is sorely lacking.
