Friday 26 July 2013

Why we need pre-registration


There has been a chorus of disapproval this week at the suggestion that researchers should 'pre-register' their studies with journals, spelling out in advance the methods and analyses that they plan to conduct. Those who wish to follow the debate should look at this critique by Sophie Scott, with associated comments, and the responses to it collated here by Pete Etchells. They should also read the explanation of the pre-registration proposals and FAQ by Chris Chambers - something that many participants in the debate appear not to have done.

Quite simply, pre-registration is designed to tackle two problems in scientific publishing:
  • Bias against publication of null results
  • A failure to distinguish hypothesis-generating (exploratory) from hypothesis-testing analyses
Either of these alone is bad for science: the combined effect of both of them is catastrophic, and has led to a situation where research is failing to do its job in terms of providing credible answers to scientific questions.

Null results

Let's start with the bias against null results. Much has been written about this, including by me. But the heavy guns in the argument have been wielded by Ben Goldacre, who has pointed out that, in the clinical trials field, if we see only the positive findings, we get a completely distorted view of what works, and as a result people may die. In my field of psychology, the stakes are not normally as high, but the fact remains that there can be massive distortion in our perception of evidence.

Pre-registration would fix this by guaranteeing publication of a paper regardless of how the results turn out. In fact, there is another, less bureaucratic way the null result problem could be fixed: have reviewers decide on a paper's publishability solely on the basis of the introduction and methods. But that would not fix the second problem.

Blurring the boundaries between exploratory and hypothesis-testing analyses

A big problem is that nearly all data analysis is presented as if it is hypothesis-testing when in fact much of it is exploratory.

In an exploratory analysis, you take a dataset and look at it flexibly to see what's there. Like many scientists, I love exploratory analyses, because you don't know what you will find, and it can be important and exciting. I suspect it is also something that you get better at as you become more experienced and more able to see the possibilities in the numbers. But my love of exploratory analyses is coupled with a nervousness: whatever you find, you can never be sure it wasn't just a chance result. Perhaps I was lucky in having this brought home to me early in my career, when I had an alphabetically ordered list of stroke patients I was planning to study, and I happened to notice that those with names in the first half of the alphabet had left hemisphere lesions and those with names in the second half had right hemisphere lesions. I even did a chi-square test and found it was highly significant. Clearly this was nonsense, just one of those spurious things that can turn up by chance.

These days it is easy to see how often meaningless 'significant' results occur by running analyses on simulated data - see this blogpost for instance. In my view, all statistics classes should include such exercises.
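By way of illustration, here is a minimal sketch of such an exercise (my own illustration in Python, not the script behind the blogpost linked above): it repeatedly draws two groups from the same population and counts how often a t-test declares a 'significant' difference.

```python
# Minimal sketch of a classroom exercise on spurious significance
# (illustrative only, not the script from the blogpost linked above).
# Both groups come from the SAME population, so every 'significant'
# t-test result is a false positive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2013)
n_datasets, n_per_group = 1000, 20

false_positives = sum(
    stats.ttest_ind(rng.normal(size=n_per_group),
                    rng.normal(size=n_per_group)).pvalue < 0.05
    for _ in range(n_datasets)
)
print(f"'Significant' results on pure noise: {false_positives}/{n_datasets}")
# Expect close to 5%, i.e. roughly 50 of the 1000 datasets.
```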

So you've done your exploratory analysis and got an exciting finding, but you are nervous as to whether it is real. What do you do? The answer is that you need a confirmatory study. In the field of genetics, failure to realise this led to several years of stasis, cogently described by Flint et al. (2010). Genetics really highlights the problem, because of the huge number of possible analyses that can be conducted. What was quickly learned was that most exciting effects don't replicate. The bar has accordingly been set much higher, and most genetics journals won't consider publishing a genetic association unless replication has been demonstrated (Munafò & Flint, 2011). This is tough, but it has meant that we can now place confidence in genetics results. (It has also had the positive side effect of encouraging more collaboration between research groups.) Unfortunately, many of those outside the field of genetics seem unaware of these developments, and we are seeing increasing numbers of genetic association studies published in the neuroscience literature with tiny samples and no replication.

The important point to grasp is that the meaning of a p-value is completely different if it emerges when testing an a priori prediction, compared with when it is found in the course of conducting numerous analyses of a dataset. Here, for instance, are outputs from 15 runs of a 4-way ANOVA on random data, as described here:
[Figure: each row shows the p-values for the outputs (main effects, then interactions) of one run of a 4-way ANOVA on a new set of random data. For a slightly more legible version see here.]

If I approached a dataset specifically testing the hypothesis that there would be an interaction between group and task, then the chance of a p-value of .05 or less would be 1 in 20 (as can be confirmed by repeating the simulation thousands of times; with a small number of runs it is less easy to see). But if I just looked for significant findings anywhere in the output, it's not hard to find something on most of these runs. An exploratory analysis is not without value, but its value is in generating hypotheses that can then be tested in an a priori design.
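To put numbers on this: a 4-way ANOVA yields 15 effects (4 main effects, 6 two-way, 4 three-way and 1 four-way interaction), and under the null hypothesis each p-value is uniformly distributed. Treating the 15 tests as approximately independent, the chance of at least one p < .05 somewhere in the table is about 1 - .95^15, or 54%. Here is a small sketch of that calculation (an approximation that draws uniform null p-values directly rather than fitting actual ANOVAs):

```python
# Family-wise error rate for one run of a 4-way ANOVA, approximated by
# treating the 15 null p-values as independent uniform draws (a
# simplification; it does not fit actual ANOVA models).
import numpy as np

rng = np.random.default_rng(0)
n_runs, n_effects = 10_000, 15

p = rng.uniform(size=(n_runs, n_effects))   # simulated null p-values
hit_rate = (p < 0.05).any(axis=1).mean()    # runs with >= 1 'finding'

print(f"Runs with at least one p < .05: {hit_rate:.1%}")
print(f"Theoretical value: {1 - 0.95 ** n_effects:.1%}")  # about 53.7%
```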

So replication is needed to deal with the uncertainties around exploratory analysis. How does pre-registration fit into the picture? Quite simply, it makes explicit the distinction between hypothesis-generating (exploratory) and hypothesis-testing research, which is currently completely blurred. As in the example above, if you tell me in advance what hypothesis you are testing, then I can place confidence in the uncorrected statistical probabilities associated with the predicted effects. If you haven't predicted anything in advance, then I can't.

This doesn't mean that the results from exploratory analyses are necessarily uninteresting, untrue, or unpublishable, but it does mean we should interpret them as what they are: hypothesis-generating rather than hypothesis-testing.

I'm not surprised at the outcry against pre-registration. This is mega. It would require most of us to change our behaviour radically. It would turn on its head the criteria used to evaluate findings: well-conducted replication studies, currently often unpublishable, would be seen as important regardless of their results. On the other hand, it would no longer be possible to report exploratory analyses as if they were hypothesis-testing. In my view, unless we do this we will continue to waste time and precious research funding chasing illusory truths.

References

Flint, J., Greenspan, R. J., & Kendler, K. S. (2010). How genes influence behavior. Oxford: Oxford University Press.

Munafò, M. R., & Flint, J. (2011). Dissecting the genetic architecture of human personality. Trends in Cognitive Sciences, 15(9), 395-400. doi:10.1016/j.tics.2011.07.007

38 comments:

  1. "In my view, unless we do this we will continue to waste time and precious research funding chasing illusory truths."

    Hear, hear !!

  2. A problem that has to be addressed, however, is that of analysis and review. In the context of drug regulation a sponsor has to have the statistical analysis plan finalised before unblinding of the data. Whatever other analyses are subsequently presented this pre-specified one will be provided to the regulator who will pay almost all attention to this and almost no attention to anything else. On the basis of a series of such pre-specified analyses of a set of trials the claim is either accepted or not. Of course, if the regulator decides an analysis was silly, despite being pre-specified, the regulator may also reject the claim. Very, very rarely would the regulator decide that a claim that would fail on the basis of a pre-specified stupid analysis but would succeed on the basis of a sensible revised one should be accepted.

    However, if we look at the journal review process we see that there is a problem. We are currently being told that all trials should be published. Now we also want pre-registration. So what does this imply about acceptance of a paper for publication? Consider the following cases.
    1. Sensible pre-specification sensibly reported.
    2. Sensible pre-specification, deviation in reporting.
    3. Stupid pre-specification (for example technically incorrect statistical procedure) faithfully reported.
    4. Stupid pre-specification corrected in manuscript.

    Now what is the purpose of peer-review and when is it supposed to happen? Presumably in the world of pre-registration, peer-reviewers should approve category 1 papers and require that category 2 papers be amended to become category 1 papers. But what about category 3 papers? Is peer-review supposed to change them into category 4? Or is peer-review supposed to ensure (analogously to changing category 2 into 1) that 4 should be turned into 3?

    The problem is that the regulatory process of claim accepted versus claim rejected, in which pre-specification plays an important role, is a different one from manuscript accepted versus manuscript rejected.

    The former is one of deciding whether a claim should be accepted as proven or not. The latter is one of deciding whether an argument is sound or not. A sound argument may, of course, support the conclusion that a treatment is not effective.

    I am not against pre-registration but I think there are many details to work out and one of the implications may be that the work of statistical (and other) reviewers will have to increase. There will have to be detailed pre-experimental initiation review of the protocol and subsequent manuscript review to check that the protocol has been adhered to.

    Replies
    1. Thanks for your comment:
      My understanding is that with pre-registration, full review of the methods will take place before the study is run, and the rigour of that review will be as great as that currently applied after a study is done. So categories 3 and 4 should not get through the prereg process.

  3. The comics version of this article...
    http://xkcd.com/882/

  4. The opponents of pre-registration often appear to be overlooking the fact that well defined and well conducted replication/confirmatory experiments are required if science is to be self-correcting. The current scientific culture (journals, funding sources, education) focuses on and rewards exploratory studies, and inadequately addresses the fact that science must not stop there if it is to be an effective method to obtain truth.

    Pre-registration of studies is an important and fundamental part of doing confirmatory experiments. Other sciences that are considering study registration can learn from medical research, which has the most experience with study registration. Drug research is clearly divided into exploratory (phase 2) and confirmatory (phase 3) studies. A major purpose of registration is to specify which analyses are confirmatory and which are exploratory. This is simply good experimental methodology and does not stifle scientific creativity or exploration. The regulatory process for drugs recognizes that good confirmatory studies are essential for convincing scientific evidence and requires such studies, including pre-registration.

    However, linking study registration with a requirement for power analysis for exploratory studies, publication in a specific journal, and public distribution of data adds controversial baggage to the process. This baggage goes beyond what is done in medical research and will make acceptance and use of study registration difficult. Processes for study registration are needed that provide the benefits of basic registration of confirmatory experiments, and allow the other more controversial issues to be handled separately.

    Jim Kennedy

    Replies
    1. Thanks for your comment. I agree that the approach adopted by Cortex is not the only option, and I am optimistic that a range of preregistration options may become available. However, preregistration linked to a journal is especially useful for dealing with problem 1, publication of null results, because the journal undertakes to publish the study provided it meets their standards. And if you are going to adopt that approach you do need rigorous standards, to avoid cluttering the literature with results that are null simply because the study was underpowered.
      See also comments below: there are also alternatives to pre-registration; the most important thing is that we get people to recognise there's a problem with carrying on as we are now.

  5. It is not clear to me that the distinction between "exploratory" and "confirmatory" investigations helps. If you explore 100 possible effects and find about 5 of them getting below p=0.05, you didn't find anything worth trying to confirm. There is no reason to think that any of those results are something worth confirming if they wouldn't pass muster as a confirmatory experiment.

    The only thing that is different in an exploratory investigation is that the experimenter isn't able to say in advance what is going to be tested, so you can't do anything so precise as a Bonferroni correction. There is more wiggle room, in other words. This makes it both easier to fudge, and harder to take seriously.

    Replies
    1. Confirmatory research in the form of replication by other researchers is also important for catching unintentional or intentional errors in research.

      In addition, another important difference between exploratory and confirmatory research is that confirmatory research typically has larger sample sizes based on power analysis of previous research. Exploratory research typically has small sample sizes as well as unspecified multiple analyses. With small sample sizes, nonsignificant results are ambiguous: they could be due to the experimental hypothesis being false or due to a lack of power. With a study designed with appropriate power, nonsignificant results are evidence that the experimental hypothesis is false. The findings from exploratory research may be worth investigating further in confirmatory research even if an analysis is not statistically significant (the effects are being confirmed, not the p values). Rejecting a hypothesis because a small exploratory study did not produce significant results is not a fair test of the hypothesis.

      Study registration and power analysis go hand-in-hand for confirmatory research. Large exploratory studies could be done, but that is rare and inefficient in most areas of research, particularly experimental research.
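      For instance, sizing a confirmatory study from an exploratory estimate might look like the following sketch (the effect size and target power are illustrative assumptions, and statsmodels is just one library offering this):

```python
# Sketch of a power analysis for sizing a confirmatory study.
# The medium effect size (d = 0.5) and 80% power target below are
# illustrative assumptions, not values from the discussion above.
from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                          alpha=0.05,
                                          power=0.80)
print(f"Participants needed per group: {n_per_group:.0f}")  # about 64
```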

      In general, good ideas come from exploratory research. Good evidence comes from confirmatory research. A healthy balance of both is needed for good science. My observation is that current scientific practices tend to have an unhealthy emphasis on exploration.

      Jim Kennedy

    2. "In general, good ideas come from exploratory research. Good evidence comes from confirmatory research."

      http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0020124

      Here is what I don't get. I gather it would be important for scientists to read about "evidence" and not about "ideas".

      If this makes any sense, why not have scientists write in a diary, and have them share these "ideas" with whoever wants to read them? And you could enhance standards for publication (e.g. pre-registered, highly powered, etc.), so that what is being read could actually resemble "evidence".


    3. This is not an argument that only confirmatory research should be published. Exploratory research can be published. But it will be recognized as not providing the higher degree of evidence of confirmatory research. Both exploration and confirmation are needed and both can be published.

    4. Maybe this is a nice thought-experiment:

      Why is publishing this "exploratory" research (with the current standards for it) scientifically useful ?

      (a possible answer could be...)





      Okay, so why then not decrease standards? Why not treat p < .24 as "significant", and why not have even fewer participants in your research, etc.?

      That way you could do a whole lot more of "exploratory" research, and even publish more as well, so scientists can all read this (which of course they all do, it's not like there is a ton of research being published) and use it for their own research (it's not like that ever goes wrong, or that some published findings do not seem to work for other researchers, or that lots of money, time and effort is wasted this way).

      The only downside to having lower standards (e.g. p < .24 is sign.) would be that maybe some people would view findings with p < .05 as more interesting to further investigate, or take seriously. That would not make any sense of course reasoning from a scientific perspective...

      Maybe "confirmatory" research should become the new "exploratory" research. I would think that when this "exploratory 2.0" research will be published and subsequently used, and reproduced, there will be more than enough "confirmatory 2.0" research possible.

    5. The usefulness of publishing exploratory research depends on the circumstances of the research. Medical research generally involves relatively expensive lab tests and diagnostic procedures. More importantly, there are significant ethical issues in exposing large numbers of sick patients to a treatment that may not be effective, or may do more harm than good. In this situation, doing small exploratory studies to justify a large study is appropriate. Exploratory and confirmatory research are usually separately funded projects and may be done by different groups. Publishing the exploratory research is appropriate in this situation.

      The other extreme may be experiments in psychology using undergraduates as subjects. These studies have relatively little financial or ethical overhead. In this situation it makes sense to expect larger exploratory studies and/or confirmation before publication. At the same time, psychological research also needs to be done with populations other than undergraduates. Cost and overhead increase significantly as the research moves out of the university environment. Here too, some of the most interesting research may have exploration and confirmation as separately funded projects by different groups.

      I do not see a universal principle about publishing exploratory research, but pre-registration for confirmatory studies appears to me to be valuable whatever the publication practices are for exploratory research.

    6. I think the debate (not so much here, but on Twitter and Sophie's piece) has got diverted into an argument about whether exploratory research has merit.
      This is really missing the main point: exploratory research is OK provided we know it is exploratory. At present, I think a huge amount of research that is presented as if it were hypothesis-testing was not done that way: rather, the results were selected after looking at a much wider range of possibilities, which aren't then presented in the paper. It is especially tempting to do this when you have a large multifactorial dataset.
      In this case you COULD do a Bonferroni correction (or something similar) and then use inferential statistics appropriately, but all too often it's not done, because it's easier to focus just on the 'significant' findings and forget the rest.
      Let's have exploratory research by all means, but we need to be able to identify it as such. If we have stated in advance what the hypothesis is and how we plan to test it, then we can interpret the statistics appropriately.
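      As a concrete illustration of that correction (a minimal sketch with made-up p-values): with m tests, each p-value is compared against alpha/m rather than alpha.

```python
# Minimal sketch of a Bonferroni correction (p-values are made up):
# with m tests, compare each p-value to alpha/m instead of alpha.
alpha = 0.05
pvals = [0.003, 0.021, 0.049, 0.310, 0.440]  # e.g. from an exploratory sweep
threshold = alpha / len(pvals)               # 0.01 for m = 5

for p in pvals:
    verdict = "significant" if p < threshold else "not significant"
    print(f"p = {p:.3f}: {verdict} (corrected threshold {threshold:.3f})")
```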

  6. Great post Dorothy.
    Exploration and pre-registration are not opponents; pre-registration is an ally to exploration.

  7. Dorothy,

    "Genetics really highlights the problem..." but the example doesn't at all support your conclusion that pre-registration was/is the solution. Neuroimaging genetics has the ENIGMA consortium, in which I and my colleagues participate, so replication samples and large-scale data sharing are being utilised in neuroscience too. This culture shift was enacted without pre-registration.

    You've blogged before about the dubious natures of research quality assessment exercises and the studies reported in high impact factor journals, so perhaps it would be worth extending the debate about pre-registration as one potential solution to questionable research practices to another potential solution: holding journals - and editors - directly accountable for the findings they publish. We need a metric that assesses journals' quality according to the ratio of positive vs null findings they publish and the number of replication studies they publish. If the former is too high, the science in the journal is questionable. If the latter is too low, the journal editorial policy does not reflect adherence to good science. Aside from increasing the rate of replication studies, a journal quality metric of this sort could prove quite useful for reducing questionable research practices and reduce the reliance on impact factors, but would require endorsement from the field before it could be used in assessment exercises or by funding agencies and promotion panels.

    Replies
    1. Excellent point!

      Even better would be tracking what percentage of studies in a given journal replicate and using that as a major component of impact factor. Without replicability figuring directly into journal prestige, it's going to be awfully difficult to get editorial policies to change. (Plus, if they do change, how would we know if the change worked?)

    2. Thanks Greig/Josh.
      Although I like the idea of ranking journals according to 'replicability' of research, it won't work because replication attempts are so rare - at least in many fields. Replication is seen as dull; everyone wants to do something new.
      I should stress too that I agree that prereg is not the only solution - though I think it is a solution that applies to problems 1 and 2 together (i.e. #1, publishing null results; #2, not publishing exploratory results as if they were hypothesis-testing).
      As genetics has shown, requiring replication is a good alternative way of fixing problem #2. But it does not fix #1. Also, for some of the studies I do, it would be much easier to go through a prereg process than to replicate a study - it can take 3 yr to recruit 30 language-impaired children for an electrophysiological study, for instance, and another year to analyse and write up the data. If I got an interesting result, I would then have to do it all again, and I'd be 8 yr away from starting. On the other hand, if I got an uninteresting result, it would be unpublishable.


    3. I wish to note that, in my experience (as, to my knowledge, one of the few people who have actually tried out the new strict pre-registration), anything submitted to a Cortex-like format will have a high chance of already *being* more or less a replication, especially if it uses methods with many parameters. This is because to make any such (e.g. neuroimaging) study work, these parameters need to be tweaked, and the only way to properly tweak them is with a dataset very similar to what you are aiming to pre-register (and Cortex explicitly gives the option of publishing your pilot data as pilot data).
      Maybe others are bolder than me, but before *I* click that "submit" button, I make sure I've applied the methods to some data set or three already and seen them work.

    4. Dorothy,

      I agree that replication attempts are rare though also increasing due to the attention now being directed to this important issue. The idea is to rank the journals to provide incentive for better practice. I would expect that some very high ranking journals would receive very poor scores on a metric like this were it calculated right now, due to their exclusive focus on publishing positive results. The metric would only be useful as an incentive if we all agreed to use it, with the ‘poor scientific quality’ journals identified and chastised suitably. Fixing #1 would follow from our resolve to apply the metric.

      I disagree that #1 is not changed by the approach adopted in genetics. GWA studies reveal influential polymorphisms with replicable effects. Polymorphisms identified by previous candidate gene studies might not replicate. Previously untested polymorphisms will show null results.

      You imply that your advocacy for pre-registration is motivated more by the file-drawer issue. Would it not be equally effective to penalise journals via a ranking measure for not publishing null results, irrespective of whether the study was pre-registered, provided the study was methodologically sound according to peer review?

  8. My concerns about pre-registration are very simple. The goal is to improve replicability. But what is being measured and promoted is pre-registration. To the extent that the pre-registration cost-function departs from the replication cost-function, you end up in the perverse situation where demanding pre-registration *lowers* replicability.

    The simplest example is that if you do your studies online (which more and more of us do), it actually takes more time and effort to pre-register your study than to replicate it. Given the limited number of hours per day...

    Another example: suppose you analyze a dataset one way (as pre-registered), and then a reviewer correctly points out that there is a better way to analyze it, which changes the results. If you judge manuscript quality based on pre-registration, the author would be better off sending the paper to a different journal hoping for new reviewers than analyzing the data correctly!

    Pre-registration will do little to solve the problem of researchers running dozens of different experiments and only publishing the one that "worked" -- a documented problem.

    The Chambers FAQ mostly discusses the plan to have journals review the methods and not the results. This might make sense in fields that still do one experiment per paper (which is itself a problem: psych journals requiring multiple experiments per paper was one of the original replicability reforms!), but I usually have around 10, each of which is contingent on the previous ones. Does the paper get reviewed 10 times? How many years will that take?

    Most importantly, the focus on pre-registration pulls interest & energy away from efforts to deal with replicability straight-on. As I've pointed out elsewhere (http://bit.ly/14ouwCf), if we don't track replicability, we'd have no way of knowing whether any given reform (like pre-registration) had any effect.

    Replies
    1. Excellent ideas re tracking replicability in your Frontiers article, Josh!

    2. I agree the situation is very different in a field where the norm is to publish a series of experiments, as, in general, even if there is not straight replication, each study builds on the previous one and confirms the solidity of the findings.
      Sadly, that is not the case in many fields.

  9. * "My concerns about pre-registration are very simple. The goal is to improve replicability. But what is being measured and promoted is pre-registration."

    &

    "Most importantly, the focus on pre-registration pulls interest & energy away from efforts to deal with replicability straight-on."

    If I am not mistaken, pre-registration (as in the Cortex model) helps with HARKing, the file-drawer problem, p-hacking, and low statistical power.

    If I am not mistaken, 3 of those 4 issues can possibly be tied to replicability of findings (http://www.psychologie.hu-berlin.de/prof/per/pdf/2013/Replicability_target_Peer_commentary.pdf).

    If that is correct, then maybe you could state that pre-registration (following the Cortex model) would possibly help with replicability issues.

    * "Another example: suppose you analyze a dataset one way (as pre-registered), and then a reviewer correctly points out that there is a better way to analyze it, which changes the results."

    I think the reviewer in the Cortex-model pre-registration would/could point this out as well, in stage 1 (see http://cdn.elsevier.com/promis_misc/PROMISpub_idt_Guidelines_cortex_RR_17_04_2013.pdf)

    * "Pre-registration will do little to solve the problem of researchers running dozens of different experiments and only publishing the one that "worked" -- a documented problem."

    If I am not mistaken, because the Cortex-model pre-registration accepts a study before the results are known, this would help with the problem of only reporting studies that "worked".

    I think a scientist could maybe run dozens of studies but that will cost resources (perhaps especially with many participants). I don't think they will pre-register all those test-studies. I think they would only invest their resources in a study which they would have some confidence in, e.g. regarding finding a significant effect, or regarding the importance of the findings (be they sign. or non-sign. as for instance in therapy-evaluating research or something like that), and these could be the studies they could then pre-register.

    * "if we don't track replicability, we'd have no way of knowing whether any given reform (like pre-registration) had any effect."

    I think it would be interesting indeed to compare the replicability of pre-registered studies with that of non-pre-registered studies.

    Replies
    1. "I think it would be interesting indeed to compare pre-registerd studies replicability to non-pre-registered studies."
      Sadly, that would be an underpowered study as of now.

    2. @Anonymous - Yes, everything you've said about the Cortex model is correct.

      I would also add that the Cortex model (and the models emerging at APP, PoPS, Human Movement Science, and elsewhere) will incentivize replication.

      Why? Suppose you come across a finding of high interest in your field - one that you feel should be directly replicated. At the moment there are few journals that would be interested in publishing such efforts, so you have little incentive to try. Moreover, even if it were a finding of particular importance, any mainstream journal would likely assess the publishability of the replication based on the Results (as would the reviewers). Given the investment of time and money it would likely require to conduct a high-powered replication, and given the uncertain publishability (and likely low pay-off), who would bother?

      Pre-registration solves this problem by (a) virtually guaranteeing publication before you commit the resources to running the study; and (b) making it impossible for reviewers to block publication because the results failed to replicate.

      I would therefore disagree strongly with Josh above. On the contrary, pre-registration is one of replication's strongest allies.

    3. Chris -- I'm not saying that pre-registration is always a bad idea. And to be clear: my concern is *replicability* not *replication*. The interests of good science and pre-registration often line up, but many people have brought up many cases in which those interests diverge. In those cases, we'll have to decide which we care about more: replicability or pre-registration.

      Anonymous -- Guaranteeing publication before the study has been run will make it more likely the study is reported, but does not guarantee the study will be reported. How could you enforce that? In any case, I doubt we'll ever live in a world in which most studies are reviewed before they are run. Pre-registration through, e.g., the Open Science Framework does not guarantee publication.

    4. Hi Josh - Yes good point. I think your idea for tracking replicability is excellent and much-needed! What I'm not sure about are your arguments that pre-registration works against replicability (especially when you consider, for instance, the stringent power requirements).

      I'm travelling at the moment, but will think carefully and write something more considered.

      Meanwhile, you do raise a good point about multi-experiment papers. At Cortex we do have an option for fast-tracked incremental registration but I agree that for a 10-experiment paper, this could be overly burdensome.

    5. "Pre-registration through, e.g., the Open Science Framework does not guarantee publication."

      I always wonder how many studies are kept in researchers' file-drawers. It seems like such a waste of resources having done all the work, and then not making the results known somehow. From a scientific perspective, I think that could also be seen as less than optimal.

      Pre-registration through the Open Science Framework would not guarantee publication, but perhaps it would still help with the file-drawer problem, because the information would now be known (i.e. not hidden in the file drawer), and others could investigate it, refer to it, etc.

      As an individual researcher, you could also keep a personal site where you post your non-published research, or put your non-published research in a repository or something like that to help with the file-drawer problem of course.

    6. Chris: Glad Cortex will have a fast-track for multiple experiments.

      Just to be clear, I'm not against pre-registration as an option for those who want it. There are clear cases where it would provide some advantage. Where I get worried is when it is discussed as a magic bullet or as an end in itself, rather than a means to an end (and not even necessarily the best means to that end).

      Long ago there was a reform in favor of confirmatory analyses, so now we have a strong incentive to claim our analyses were confirmatory in order to get published. If pre-reg becomes similarly dominant, people will have a strong incentive to stick to the pre-registered plan even when they realize subsequently it was a bad plan. Pre-reg becomes the goal rather than the method, and we have distorted incentives.

      Maybe this never happens to you, but I do periodically realize my original analysis plan was nonsensical: sometimes because of reading a stats textbook (cf. when I abandoned ANOVAs), sometimes because newly published results changed how I viewed the problem, and sometimes because I had made a mistake.

      Since I routinely self-replicate prior to publication, I may be in a better position than most to know whether this is causing me problems. It's not. Ironically, the only times I've failed to replicate myself were when I stuck to the original analysis plan!

      My experience may not generalize (though I've noticed that many people have the same concerns). I guess we'll find out.

    7. The way I see it, the basic problem is that p-value fishing and bias against negative results have created a situation in which far more than 5% (p=0.05) of published results are suspect.

      Pre-reg could definitely bring us closer to that 5% mark but at what cost? Aren't some of those fished results actually valid?

      So wouldn't a better approach be to allow negative results to be published (forget pre-reg) and find a way to "deal with" the increase in the number of publications that would result - a large publication space? Surely the e-age gives us tools to handle such a large publication space that did not exist at the time that our present publication system evolved. And if we could do that then would we need journals at all?

      Perhaps what is needed is vast improvement in our abilities to do post-publication review? Why call on the expertise (sometimes too limited) of just a few reviewers when tens to thousands of potential reviewers could be called upon?


    8. I totally disagree.
      Look at the chart of p-values in my post. These come from RANDOM data. Some are highly significant. No amount of post-publication peer review will help us in this situation.
      Also, I work in a field where it can take ages to recruit participants; for instance, there is a growing body of work on the development of infant siblings of children with autism - these children are immensely valuable in helping us understand early indicators of autism, but they are also very rare. And often families are asked to take part in studies that use challenging approaches, such as brain measurements in toddlers. Studies with this group typically look at many variables without clearly specifying hypotheses in advance - and invariably come up with something. Yes, what they come up with *may* be valid, but we just can't know whether it is without replication, and replication in this situation is well nigh impossible.
      For an example, see comments by myself and drbrocktagon here http://bit.ly/18x5t7n

    9. Deevybee


      Yes, that data was random. Presumably that data would represent a REAL experiment which could be reproduced. If, as proposed by both you and me, negative results were published, then those erroneous results would be caught by those reproduced and negative studies.

      Also if replication is not possible then the result will always be, or at least should always be, suspect.

      The fMRI literature is plagued by shoddy methods. Typically there are multiple pre-processing steps which have not been scientifically justified. Typically there are hardware issues that produce data errors that are not properly understood by the neuroscientist. Some of these hardware issues interact with the pre-processing issues in ways that make both problems much worse. These shoddy practices have propagated for many reasons. One reason is that reviewers lack sufficient expertise to make a proper evaluation of the methods. Opening up publication review to a larger set of reviewers could ameliorate that problem.

  10. @Chris Chambers - your claim that pre-registration will 'virtually guarantee' publication needs to be examined. Editors retain the right to determine whether an article is suitable for submission to the journal. Let me provide a concrete example from the journal Cortex of how this can be a problem. Volume 48, Issue 7 of that journal is a Special Issue on “Language and the Motor System”: http://tinyurl.com/lqm4kya

    Yet, the entire issue is composed of articles written by proponents of language embodiment. Not one article from the alternative perspective. Is there something wrong with the editorial culture at Cortex?

    Replies
    1. I don't know anything about this debate and am not in the business of defending Cortex, but just want to say that this is really a side-issue.
      There have been other comments in various places about how pre-registration won't defend against biased editors or reviewers who turn down stuff because they don't like the approach, or who accept it because the author is famous. Yes this happens, but I don't see how pre-registration will make it any worse - and in fact it may make it better, because editors and reviewers will be able to look at the quality of the idea and methodology without being swayed by the outcome of the results.
      Suppose I hate the language embodiment approach and I am asked to review a prereg outline that adopts that approach.
      Assuming the methods are OK, I either have to come up with good arguments about the inadequacy of the study logic, or I'd have to accept the study. Because I know that the study may fail to find the effect it's predicting, I'll actually have some incentive to accept it, in the hope that it will disprove the theory I dislike.

    2. @ Greig de Zubicaray -

      You cross-posted this comment over at Neurocritic's blog, but for the sake of clarity I'll respond here as well.

      I don't understand how your comment is relevant to this discussion, and I'm not in a position to comment on specific editorial decisions at Cortex. I suggest you contact the journal with your concerns.

      What I will clarify, for the benefit of other readers, is that none of the articles you refer to are Registered Reports, and so none of them were subject to the specific publication criteria that we have established for this new format.

      Cortex virtually guarantees publication of RRs that pass in-principle acceptance. The specific publication criteria at Stage 2 (following data collection) are:

      • Whether the data are able to test the authors’ proposed hypotheses by passing the approved outcome-neutral criteria (such as absence of floor and ceiling effects)
      • Whether the Introduction, rationale and stated hypotheses are the same as the approved Stage 1 submission (required)
      • Whether the authors adhered precisely to the registered experimental procedures
      • Whether any unregistered post hoc analyses added by the authors are justified, methodologically sound, and informative
      • Whether the authors’ conclusions are justified given the data

      Readers can find full details here: http://www.elsevier.com/inca/publications/misc/PROMIS%20pub_idt_CORTEX%20Guidelines_RR_29_04_2013.pdf

    3. Dorothy and Chris,

      To clarify the relevance of my point: from time to time, journals will announce special issues on particular topics. The editors will solicit articles and assign reviewers. The editors may or may not be members of the regular editorial board. The act of solicitation is a funnel, as is the assignment of reviewers (so a critical reviewer, per Dorothy's example, would be less likely to be assigned). Even if pre-registration were a submission rule, soliciting a narrow range of theoretical views for a limited range of review will result in a series of papers in a journal issue endorsing just those views. If pre-registration were not the rule, there would only be an increased probability that the articles submitted for the special issue have some dubious results.

      The point of this example is that pre-registration does not address the quality of the science; it addresses its methodological rigour. “Quality” and “truth” are not determined by rigour alone. Quality may be more subjective, judging by the number of published studies that choose to reify rather than test theories in the field of cognitive neuroscience, and the criticisms the field has attracted from other disciplines as a result.

      Chris – hopefully this explains the relevance of my example. The issue is not limited to Cortex, nor to the topic of language embodiment as you imply in your response – I merely used the Cortex special issue as an example that I am obviously familiar with given my research interests. I noted the tendency to sanctimony in your responses at the Neurocritic. This one was no different.

  11. If studies are pre-registered in a somewhat private way with a particular journal, or on private pages on the Open Science Framework, their value for solving the file-drawer problem is reduced. One important goal of the registries for medical research is to provide public information about studies that are being done and have been done, in a way that minimizes the burden on experimenters while providing basic methodological benefits. The most widely used medical registry currently has over 149,000 registered studies and is often the starting point when someone wants to find research about a particular topic. It is at http://www.clinicaltrials.gov/ct2/home

    Note that medical journals increasingly require as a condition for publication that studies were registered at a “public” registry. The statement of the International Committee of Medical Journal Editors can be found at http://www.icmje.org/publishing_10register.html

    Information on the history of study registration in medical research can be found at http://www.clinicaltrials.gov/ct2/about-site/history

    Optimal practice with most flexibility would be to have different registries, with some emphasizing making information publicly available. A particular study could be registered on different registries.
