Monday, 17 June 2013

Research fraud: More scrutiny by administrators is not the answer

I read this piece in the Independent this morning and an icy chill gripped me. Fraudulent researchers have been damaging Britain's scientific reputation and we need to do something. But what? Sadly, it sounds like the plan is to do what is usually done when a moral panic occurs: increase the amount of regulation.

So here is my, very quick, response – I really have lots of other things I should be doing, but this seemed urgent, so apologies for typos etc.

According to the account in the Independent, Universities will not be eligible for research funding unless they sign up to a Concordat for Research Integrity which entails, among other things, that they "will have to demonstrate annually that each team member’s graphs and spreadsheets are precisely correct."

We already have massive regulation around the ethics of research on human participants that works on the assumption that nobody can be trusted, so we all have to do mountains of paperwork to prove we aren't doing anything deceptive or harmful. 

So, you will ask, am I in favour of fraud and sloppiness in research? Of course not. Indeed, I devote a fair part of my blog to criticisms of what I see as dodgy science: typically, not outright fraud, but rather over-hyped or methodologically weak work, which is, to my mind, a far greater problem. I agree we need to think about how to fix science, and that many of our current practices lead to non-replicable findings. I just don't think more scrutiny by administrators is the solution. To start scrutinising datasets is just silly: this is not where the problem lies.

So what would I do? The answers fall into three main categories: incentives, publication practices, and research methods.

Incentives is the big one. I've been arguing for years that our current reward system distorts and damages science. I won't rehearse the arguments again: you can read them here. The current Research Excellence Framework is, to my mind, an unnecessary exercise that further incentivizes researchers against doing slow and careful work. My first recommendation is therefore that we ditch the REF and use simpler metrics to allocate research funding to universities, freeing up a great deal of time and money, and improving the security of research staff. Currently, we have a situation where research stardom, assessed by REF criteria, is all-important. Instead of valuing papers in top journals, we should be valuing research replicability.

Publication practices are problematic, mainly because the top journals prioritize exciting results over methodological rigour. There is therefore a strong temptation to do post hoc analyses of data until an exciting result emerges. Pre-registration of research projects has been recommended as a way of dealing with this - see this letter to the Guardian on which I am a signatory.  It might be even more effective if research funders adopted the practice of requiring researchers to specify the details of their methods and analyses in advance on a publicly-available database. And once the research was done, the publication should contain a link to a site where data are openly available for scrutiny – with appropriate safeguards about conditions for re-use.

As regards research methods, we need better training of scientists to become more aware of the limitations of the methods that they use. Too often statistical training is a dry and inaccessible discipline. All scientists should be taught how to generate random datasets: nothing is quite as good at instilling a proper understanding of p-values as seeing the apparent patterns in data that will inevitably arise if you look hard enough at some random numbers. In addition, not enough researchers receive training in best practices for ensuring quality of data entry, or in exploratory data analysis to check the numbers are coherent and meet assumptions of the analytic approach.
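As a rough illustration of the random-dataset exercise, here is a minimal Python sketch. Everything in it is an assumption made for the example: the group size of 30, the cut-off of 2.0 (an approximate two-sided 5% critical value for a t statistic with 58 degrees of freedom), and the function names. It compares two groups of pure noise and counts how often the difference looks "significant", first with one outcome measure per study and then with twenty.

```python
import random

# Approximate two-sided 5% critical value for t with 58 degrees of freedom
# (two groups of 30). This is an assumption tied to N_PER_GROUP below.
T_CRIT = 2.0
N_PER_GROUP = 30


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def t_statistic(a, b):
    """Two-sample t statistic for groups of equal size."""
    mean_a, var_a = mean_and_variance(a)
    mean_b, var_b = mean_and_variance(b)
    standard_error = ((var_a + var_b) / len(a)) ** 0.5
    return (mean_a - mean_b) / standard_error


def null_study(rng, n_outcomes):
    """Simulate one study with no real effect.

    Returns True if at least one of the outcome measures
    crosses the 'significance' cut-off by chance alone.
    """
    for _ in range(n_outcomes):
        a = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        b = [rng.gauss(0, 1) for _ in range(N_PER_GROUP)]
        if abs(t_statistic(a, b)) > T_CRIT:
            return True
    return False


def hit_rate(n_studies, n_outcomes, seed=1):
    """Proportion of no-effect studies reporting a 'significant' result."""
    rng = random.Random(seed)
    hits = sum(null_study(rng, n_outcomes) for _ in range(n_studies))
    return hits / n_studies


if __name__ == "__main__":
    print("one outcome per study:   ", hit_rate(2000, 1))
    print("twenty outcomes per study:", hit_rate(300, 20))
```

With a single outcome the rate should hover around 5%, which is simply what the cut-off means. With twenty independent outcomes, the chance of at least one spurious hit is about 1 - 0.95^20, or roughly 64%, even though there is nothing in the data at all.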

In my original post on expansion of regulators, I suggested that before a new regulation is introduced, there should be a cold-blooded cost-benefit analysis that considers, among other things, the cost of the regulation both in terms of the salaries of people who implement it, and the time and other costs to those affected by it. My concern is that among the 'other costs' is something rather nebulous that could easily get missed. Quite simply, doing good research takes time and mental space of the researchers. Most researchers are geeks who like nothing better than staring at data and thinking about complicated problems. If you require them to spend time satisfying bureaucratic requirements, this saps the spirit and reduces creativity.

I think we can learn much from the way ethics regulations have panned out. When a new system was first introduced in response to the Alder Hey scandal, I'm sure many thought it was a good idea. It has taken several years for the full impact to be appreciated. The problems are documented in a report by the Academy of Medical Sciences, which noted "Urgent changes are required to the regulation and governance of health research in the UK because unnecessary delays, bureaucracy and complexity are stifling medical advances, without additional benefits to patient safety"

If the account in the Independent is to be believed, then the Concordat for Research Integrity could lead to a similar outcome. I'm glad I will retire before it is fully implemented.


  1. I think an important step would be to amend 'publishing practice' (which actually creates many of the incentives, or at least enables them) so that publications include not just text-based summaries of our data but always, necessarily, the data and software as well. Making it all openly accessible disincentivizes (but doesn't eliminate!) fraud by facilitating detection.

  2. 'It might be even more effective if research funders adopted the practice of requiring researchers to specify the details of their methods and analyses in advance on a publicly-available database.' It is interesting to note that pre-specification of statistical analysis is required of pharmaceutical sponsors by drug regulatory agencies. The International Conference on Harmonisation E9 guideline on statistical analysis would be a good place to start.

  3. Some research fraud is just too painful to expose and things are unlikely to change.
    I’ve been trying to expose research and financial fraud at Manchester University for the last nine years, but nobody in the British research establishment wants to know.
    For details see

  4. Thanks to all for comments.
    I have been relying only on the account in the Independent and hope to get a chance to read the Concordat itself before too long. I am hoping it is not as extreme as suggested in the article.
    Meanwhile, I woke up this morning to find this post had prompted an interesting debate on Twitter between @medskep and @david_colquhoun, and David makes a point I'd been brooding on, which is worth airing more widely. If someone is going to check all my data and analyses, then they are going to need amazing analytic and statistical expertise. The impression from the account in the Independent is that those creating the Concordat have a remarkably limited view of what research data looks like, consisting of spreadsheets and graphs. That may be true in some fields, but these days many scientists are using complex multivariate datasets that are analysed using fiendish statistical techniques. For instance, when I do an ERP experiment, for each participant I end up with raw data that is an enormous string of datapoints consisting of amplitude of the signal sampled 250 times a second for about 10 minutes on each of 32 channels. This has to go through a great deal of processing, which will include steps of automatic rejection of artefacts (very big signals associated with unwanted activities such as blinks), cutting up into epochs that are time-locked to the stimuli, averaging, filtering, and maybe a time-frequency analysis that extracts the power in the signal at different frequencies over specific time periods. My preference is also to do some data reduction to extract a component from multiple electrodes to represent the signal of interest - this is done to reduce the likelihood of spurious results - see my post of 7 June 2013. Data from some participants will be so noisy it has to be rejected - this should be by a predetermined criterion. Ultimately, out of all this processing, I will get some numbers that correspond to the size of the signal related to specific sensory stimuli for each participant and condition.
These may be entered onto a spreadsheet and plotted or analysed using a well-known package such as SPSS, though increasingly researchers are using dedicated software - often Matlab-based - to carry out all the analysis. When I have students or collaborators who want to work with me on this, I will typically spend several hours taking them through the software to ensure that they understand each processing step. This is just one type of research. Brain imaging is even more complicated. Even for simple behavioural studies, many researchers now use complex methods such as multilevel modeling to analyse the data.
    So the notion that the University is going to hire someone who will check all our spreadsheets and figures is just insane. It could only have been formulated by someone who has no idea of the reality of what researchers get up to.
    The complexity that I describe IS a problem. It makes it relatively easy for analytic errors to creep in, and it can be hard for journals to find reviewers who adequately understand the methods. But policing is not the answer. We need time to do careful work, and to painstakingly check and double check. If we know that our raw data have to be deposited and made openly available, we will have strong incentives to get it right.
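To give a feel for the scale and for three of the processing steps mentioned above (cutting into epochs, rejecting artefacts by a preset criterion, averaging), here is a toy Python sketch. It is not the actual analysis software, which the comment describes as often Matlab-based. The single simulated channel, the 200 ms epoch length, the amplitude threshold of 100 and all function names are assumptions made purely for illustration; only the sampling rate, channel count and recording length come from the text.

```python
import random

SAMPLE_RATE_HZ = 250      # from the text: sampled 250 times a second
N_CHANNELS = 32           # from the text
RECORDING_SECONDS = 600   # from the text: "about 10 minutes"


def samples_per_participant():
    """Raw datapoints recorded for one participant across all channels."""
    return SAMPLE_RATE_HZ * RECORDING_SECONDS * N_CHANNELS


def cut_epochs(signal, onsets, length):
    """Slice fixed-length windows that start at each stimulus onset."""
    return [signal[s:s + length] for s in onsets if s + length <= len(signal)]


def reject_artefacts(epochs, max_abs_amplitude):
    """Drop epochs whose peak absolute amplitude exceeds a preset threshold."""
    return [e for e in epochs if max(abs(x) for x in e) <= max_abs_amplitude]


def average_epochs(epochs):
    """Point-by-point mean across the retained epochs."""
    n = len(epochs)
    return [sum(values) / n for values in zip(*epochs)]


def demo(seed=0):
    """Simulate 20 s of one noisy channel with one stimulus per second."""
    rng = random.Random(seed)
    epoch_length = 50                      # 200 ms at 250 Hz (assumed)
    onsets = list(range(0, 5000, 250))     # 20 hypothetical stimuli
    signal = [rng.gauss(0, 1) for _ in range(5000)]
    for s in onsets:
        signal[s + 25] += 5                # small response 100 ms after onset
    signal[260] += 200                     # one simulated blink in the 2nd epoch
    epochs = cut_epochs(signal, onsets, epoch_length)
    kept = reject_artefacts(epochs, max_abs_amplitude=100)  # assumed threshold
    return len(epochs), len(kept), average_epochs(kept)


if __name__ == "__main__":
    print("raw datapoints per participant:", samples_per_participant())
    n_epochs, n_kept, average = demo()
    print("epochs cut:", n_epochs, "epochs kept:", n_kept)
    print("average at the response sample:", average[25])
```

Even this toy version has several places where a choice (threshold, epoch length) changes the final numbers, which is why the rejection criterion needs to be fixed in advance. The real pipeline adds filtering, time-frequency analysis and data reduction across electrodes on top of this, for 4.8 million raw datapoints per participant.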

  5. Dorothy's comments apply perhaps even more strongly to the sort of biophysical work that I've spent most of my life doing. The programs that we use for analysis and fitting of single ion channel data must have hundreds of thousands of lines of code (I've never counted them). We've recently put all the source code on the web, but the chances of anyone checking that they do what we claim are not large. Even if they did, it would be next to impossible to tell whether we'd cheated.

    It took four years to do the experiments and analyse the data for a 2008 paper. To repeat all the analyses would take at least a year of time for someone with highly specialist knowledge. However desirable it might be, that obviously isn't going to happen.

    The proposals that were described in the Independent seem to have come from people who have a very naive view of what science consists of.

    The simplest way to start would be to remove the perverse incentives that are imposed by funding agencies and universities. It is insane to have a system that actively encourages dishonesty and triviality.

  6. Please read what follows as coming from someone with good intentions. I am trying to say some things I have not heard before in these types of discussions, to see if they could show things from a different perspective:

    "Most researchers are geeks who like nothing better than staring at data and thinking about complicated problems."

    I wonder how this then relates to the problems of your 3 categories of incentives... In other words, if I were to be a true "geek" who liked nothing better than staring at data etc., would I then need better methodological education, or would I have spent my time searching the interwebs looking for the best places to be educated on these matters and thereby found all this myself? (I totally agree that it would be a good thing to be educated optimally, but this was just to illustrate my point relating to questioning what kind of scientists there really are at the present moment in science/ as a result of the problems in science).

    The same goes for publication (i.c. would I as a true "geek" choose a high-impact journal over publishing the research as I want it to be published in a "non-high impact journal"?) and perhaps also incentives in general (i.c. would I as a true "geek" even care about things like "tenure" and all that fancy stuff?).

    The point I am trying to make is that I wonder if most scientists working in science at the present moment are in fact true "geeks": if they truly were, would there be these problems?

    Now, totally regardless of the above, here is the thing I wonder most: what's stopping already tenured professors from doing things right now? I mean, it seems that they have already been promoted and all that fancy stuff, so what have they got to "lose" by doing things optimally? Can they then be fired if they do not publish an x-amount of high-impact journal papers at this point in their careers? I don't know the answer to this, but it seems to me that if this is not the case, then why don't all these tenured professors set the right example? (I totally agree that these outside circumstances/ incentives should be tackled, but this was just to illustrate my point relating to the question whether tenured professors at this point in their careers could in fact very easily set the right example because they need not worry anymore about all these outside circumstances/ incentives).

    p.s. I think I am going to rob a bank, because I am living in a "pay-the-rent-or-perish"-system whose incentives are all skewed. They are all skewed I tell you :P

    p.p.s. I think I am going to take doping, because I am living in a "win-the-race-or-perish"-system whose incentives are all skewed. They are all skewed I tell you :P

  7. I kind of feel sorry for all these scientists who apparently have been some sort of slave in a deeply flawed, and possibly pretty sad, system for years and years. All these incentives and pressures whippin' 'em to do all these things and make them deal with all that's wrong in science today.

    Sometimes a different perspective from an outside source can perhaps contribute a little bit to some, apparently, much needed release. It may be weird, but hopefully it's at least fun regardless of anything else:

    1) "there is no spoon" (i.c. take some time off from all these incentives and pressures and watch "The Matrix"),

    2) "the thing is Bob, it's not that I'm lazy, it's that I just don't care" (i.c. take some time off from all these incentives and pressures and watch "Office Space").

    Have fun !!