
How not to do replications

Recently there was an article on the Science website about replication problems in artificial intelligence research. The article mainly highlights the fact that many studies in the field fail to provide supplemental code and data. But it also mentions an example of how replications can go wrong.

They mention the project and journal ReScience. It describes itself as "a peer-reviewed journal that targets computational research and encourages the explicit replication of already published research, promoting new and open-source implementations in order to ensure that the original research is reproducible."

Replicating studies is generally considered a good thing, a step towards improving science and highlighting problems. Yet the Science article mentions that so far, all replication attempts published in ReScience have been positive. This is highly implausible, but a possible explanation is provided: scientists don't want to criticize their peers, so failed replications aren't published, particularly because they're often done by young researchers who don't feel confident criticizing their senior peers.

If this is true we have a pretty dire situation, and one that isn't helpful at all: People try to replicate other people's work, but they'll only publish it if it's positive. One could very well argue that this makes things worse, not better, as it increases publication bias.

This very problem was highlighted in 2015 by a group of psychology researchers in a paper titled The Replication Paradox. There's a good summary at Retraction Watch.

While they call it a paradox, the effect is actually not so surprising. If you replicate studies but have publication bias, meaning you only publish successful replications and not failed ones, you may end up creating the impression that an effect is even stronger than the original, potentially flawed research indicated. The public scientific record gets worse, not better.
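The mechanism behind the replication paradox can be illustrated with a small simulation. This is a hypothetical sketch, not the method of the paper: it assumes a true standardized effect of 0.2, 20 participants per group, an approximate standard error of sqrt(2/n), and that only replications reaching one-sided significance get published. None of these numbers come from the paper.

```python
import random
import statistics


def simulate_replications(true_effect=0.2, n_per_group=20, n_studies=10_000, seed=1):
    """Hypothetical simulation of many replications of the same effect.

    Returns (all_estimates, published_estimates); only estimates that reach
    one-sided significance (z > 1.96) are assumed to get published.
    """
    rng = random.Random(seed)
    # Approximate standard error of a standardized mean difference, two equal groups
    se = (2 / n_per_group) ** 0.5
    all_estimates = [rng.gauss(true_effect, se) for _ in range(n_studies)]
    # Publication bias: keep only the "successful" replications
    published = [d for d in all_estimates if d / se > 1.96]
    return all_estimates, published


all_est, pub_est = simulate_replications()
print(f"mean of all replications:       {statistics.mean(all_est):.2f}")
print(f"mean of published replications: {statistics.mean(pub_est):.2f}")
print(f"share published:                {len(pub_est) / len(all_est):.0%}")
```

Under these assumptions the published replications overstate the true effect several times over, while the average over all replications stays close to it. Publishing failed replications as well removes that inflation.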

This shows that it is crucial for replication efforts to counter publication bias as well. One way of doing this is study preregistration: registering the intent to do a study in a public register before any data collection or experiments take place. Other replication efforts, like the Center for Open Science's Reproducibility Project: Psychology, included preregistration by default.

Science is broken - talk and background

I gave a talk at the 34C3 congress: Science is broken

Video downloads, YouTube version, Slides

Here's a list of related interesting links for people who want to learn more about these issues.

Is most published research wrong? (Video)
Why Most Published Research Findings Are False, John P. A. Ioannidis, PLOS Medicine, 2005
Reproducibility and Improving Research Practices (Video, BIH annual special lecture, John Ioannidis)

Precognition / fortune telling study
Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect, Daryl Bem
Daryl Bem Proved ESP Is Real. Which means science is broken (Slate)
Publication bias and the failure of replication in experimental psychology, Francis 2012, Psychonomic Bulletin & Review

Replication crisis in Psychology
Reconstruction of a Train Wreck: How Priming Research Went off the Rails
Estimating the Reproducibility of Psychological Science
Reproducibility Project: Psychology (Open Science Framework)

Green consumption study / Moral licensing
Do Green Products Make Us Better People? Mazar, Zhong, 2010, Psychological Science
Green consumption does not make people cheat: Three replications of a moral licensing experiment, Urban, Bahník, Kohlová, PsyArXiv, 2017

Replication in preclinical cancer studies
Drug development: Raise standards for preclinical cancer research, Begley, Ellis, 2012, Nature
Cancer reproducibility project releases first results, Nature, 2017
Reproducibility Project: Cancer Biology

Drug trials
What doctors don't know about the drugs they prescribe, Ben Goldacre (TED talk)
AllTrials campaign
COMPARE - Tracking switched outcomes in clinical trials

Early Publication Bias research
Publication Decisions and their Possible Effects on Inferences Drawn from Tests of Significance—or Vice Versa, Theodore Sterling, Journal of the American Statistical Association, 1959
Publication Decisions Revisited: The Effect of the Outcome of Statistical Tests on the Decision to Publish and Vice Versa, Sterling, Rosenbaum, Weinkam

Registered Reports
Trust in science would be improved by study pre-registration (Open Letter in The Guardian, 2013)
Registered Reports (Center for Open Science)

As a study participant I care what happens with the results

A couple of days ago I was asked to participate in a survey for a scientific study because of some work I do in the field of IT security. I wrote the e-mail below to the study author. Whether or not one participates in a survey is not a big deal, but I think everyone participating in any kind of study should care about the scientific practices she or he supports.

Dear [...],

You have asked me to participate in your study via an online survey. I'm a scientifically minded person, so supporting science is usually something I'd do.

However, I find the information you give survey participants disappointing. You tell participants that taking part in the study likely carries no risk and that they can win an Amazon gift card. What I'd like to know, however, and what you don't tell me, is whether and how the study I contribute to will be published. I therefore have no way of knowing whether this study will ever be published at all, whether it will support or hurt the body of scientific knowledge, and whether I'll ever be able to see the result.

As a person interested in science I care about the quality of scientific research. Therefore I find these questions important and I think every person asked to participate in any kind of study should ask such questions.

There's a widespread problem in many fields of science known as publication bias. Many scientific studies never get published because the result doesn't align with the beliefs of the study author or simply isn't considered interesting. There's a large bias towards publishing only positive results. This poisons the body of evidence science is creating and is likely one of the major reasons why so many scientific results turn out not to be reproducible.

Apart from this issue, I don't want to support a scientific publishing system that mainly profits a few publishers while keeping research behind paywalls. I would therefore like to know whether studies I participate in will be published as Open Access.

I'm sorry to tell you that I won't participate in your survey. However, I suggest that you consider these issues and, in future studies, tell your participants how you intend to publish the results and where they can find them later.

I will publish an anonymous version of this e-mail (without your name and the purpose of your study) on my blog.

Data Sharing and the Research Parasites

Since I have a keen interest in all the weaknesses of today's scientific process, one thing I often ask myself is why the scientific community isn't fixing things faster. Many obvious improvements are ignored by many scientists, problems go untackled, and sometimes there is outright resistance against changes that should hardly be controversial.

A recent editorial in the New England Journal of Medicine (NEJM) by Dan Longo and Jeffrey Drazen (the editor of the NEJM) is a very extreme example of this. It is essentially a vocal defense of bad research practice.

In most cases, data sharing should be a no-brainer. Scientific studies that rely on raw data should not only publish their results; whenever possible, they should also make their raw data available. In some instances data sharing can be problematic and needs to be done carefully, for example when privacy issues are involved, but these aren't the concerns of the writers of this NEJM editorial. Data sharing is valuable for two reasons: first, it allows other people to check and potentially criticize scientific results; second, it allows others to find additional results that may be hidden in a raw data set.

Longo and Drazen recognize the second issue and propose that data should be shared, but only in collaboration and with co-authorship of the people who created the original data set. There is of course nothing wrong with collaborating with the original authors if it makes sense, but it completely fails to address the first and foremost reason for data sharing: To allow others to reinvestigate and criticize scientific results.

In fact Longo and Drazen see a problem in this. They list it as a concern against data sharing that others may “use the data to try to disprove what the original investigators had posited.” The implicit assumption in this statement is that the original authors of a study are always right and if someone else reinterprets their result they are automatically wrong – which is of course total nonsense.

The editorial goes on to worry that data sharing may create “research parasites”. I find it hard to understand what that is supposed to mean. Taking someone else's data and using it to create new scientific results seems like a good thing. Especially in medicine (remember that this editorial was published in one of the leading medical journals), there is almost an ethical obligation to use existing data as much as possible to foster scientific progress.

Longo and Drazen say that they propose a symbiotic rather than a parasitic use of shared data. As said above, this is fine in many situations, but imagine this: if a data set supports a scientific result that goes against the theories and beliefs of the people who collected the data, should that new result stay hidden? I don't think so.

The editorial has caused a bit of an outrage and on Twitter the hashtag #researchparasites gained some popularity. In a certain sense I see this editorial as an opportunity. It's rare to have such an honest account of why some people reject improvements in science: To shield themselves from criticism and scientific rigor. But it's shocking that one of the leading medical journals is supporting that.

Also worth reading: Translation to plain English of selected portions of Longo and Drazen's editorial on data sharing (Jonathan Peelle)

Chocolate, weight loss, bad studies and bad journalism

On Friday the Franco-German television channel Arte aired a documentary about dietary and weight loss studies. The authors of the documentary had done an experiment in which they created a fake study claiming that eating dark chocolate can help with weight loss. The study was done in cooperation with science journalist John Bohannon, who wrote about the experiment on

They did a real study and published it in a journal, but it was obviously flawed in many ways. They issued a press release and created a web page for an "Institute of Diet and Health", which doesn't exist. Shortly afterwards a number of media outlets started reporting on this study; the first big one was "Bild", the largest German newspaper.

There are a number of ways in which this study was flawed:
  • The study had 15 participants, which is a very low number for such a study.
  • The study was published in an obviously bad journal (International Archives of Medicine). There are many such journals that will publish almost anything if you pay them. Recently someone successfully submitted a paper containing nothing but the sentence "Get me off Your Fucking Mailing List" to such a journal.
  • The documentary mentions that during the measurements the participants in the control group received a glass of water before they were weighed.
  • The authors were cherry-picking their results. They took a lot of measurements on the study participants, so by pure chance one of the measured values was likely to improve significantly. This kind of flaw in scientific studies is best explained by this xkcd comic.
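The first point, the tiny sample, can be made concrete with a rough power calculation. This is a sketch under stated assumptions: a two-sided comparison of two groups, a hypothetical true effect of half a standard deviation, and the normal approximation, which is somewhat optimistic for very small groups. How the 15 participants were actually split is not modeled; the group sizes below are illustrative.

```python
from statistics import NormalDist


def approximate_power(effect_size, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison.

    effect_size is a hypothetical standardized mean difference (Cohen's d).
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic if the true effect is effect_size
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of the statistic landing beyond either critical value
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


for n in (5, 20, 64):
    print(f"{n:>2} per group: power about {approximate_power(0.5, n):.0%}")
```

Under these assumptions, with 5 people per group a real effect of that size would be detected only about one time in eight, compared with roughly 80 % at 64 per group. A "significant" result from such a small study is therefore weak evidence on its own.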

The last point is probably the most interesting, because it cannot necessarily be spotted in the final publication. One way to avoid it is to pre-register studies, together with their methodology, in public trial registers. There is increasing pressure to pre-register trials in medicine. Unfortunately, that debate has hardly reached the field of nutrition; study registration is rarely done there at all.
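The cherry-picking problem, and the way preregistration counters it, can be sketched with a simulation in which no measured outcome has any real effect. The number of outcomes (10), the independence of the outcomes and the convention that outcome 0 is the preregistered one are all hypothetical assumptions for illustration; the closed-form formula serves as a cross-check.

```python
import random


def false_positive_rates(n_outcomes=10, alpha=0.05, n_studies=20_000, seed=7):
    """Simulate studies in which no measured outcome has a real effect.

    Under a true null hypothesis a p-value is uniform on [0, 1], so each
    outcome's p-value is drawn with rng.random(). Outcomes are assumed
    independent, which real measurements usually are not.
    """
    rng = random.Random(seed)
    cherry_picked_hits = 0
    preregistered_hits = 0
    for _ in range(n_studies):
        p_values = [rng.random() for _ in range(n_outcomes)]
        # Flexible analysis: report whichever outcome happens to be significant
        if min(p_values) < alpha:
            cherry_picked_hits += 1
        # Preregistered analysis: only the outcome fixed in advance (index 0) counts
        if p_values[0] < alpha:
            preregistered_hits += 1
    return cherry_picked_hits / n_studies, preregistered_hits / n_studies


def expected_cherry_picked_rate(n_outcomes=10, alpha=0.05):
    """Closed form: chance that at least one of n independent null tests is significant."""
    return 1 - (1 - alpha) ** n_outcomes


cherry, prereg = false_positive_rates()
print(f"cherry-picked: {cherry:.1%} (theory {expected_cherry_picked_rate():.1%})")
print(f"preregistered: {prereg:.1%} (theory 5.0%)")
```

Under these assumptions, reporting whichever outcome happens to be significant produces a false positive in about 40 % of studies, while sticking to the single preregistered outcome keeps the rate at the nominal 5 %.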

The point of all this is, of course, that many real nutrition studies aren't much better than this one. While the whole story got a fair amount of praise, there was also a debate about the ethics of such a study. The questions at hand aren't so simple. Obviously the participants of the study were misled. However, it is not uncommon to mislead participants about the real intent of the research. In psychology, a lot of studies would be impossible to conduct otherwise.

Another point of criticism is that the study wasn't approved by an institutional review board. It would be an interesting question whether an ethics board would have approved a study with the sole intent of showing flaws in journalism and the scientific publication process.

My personal opinion is that the ethical issues raised by such a stunt are at best minor compared to the ethical issues with all the supposedly serious studies that get published all the time and have the same flaws.

The only issue I might have with the whole story is that I feel the reality is often even grimmer. I'm pretty sure that with more effort the study could have been published in a real journal. According to Bohannon, the fallback to an obvious fraud journal was due to the time constraints of the documentary.

Often enough media stories about health and nutrition (and also about a lot of other things) aren't based on studies at all. It's not rare that these stories are merely based on opinions by single researchers, preliminary lab research or yet unpublished studies.

I don't know if this was the source of the chocolate study idea, but three years ago the British Medical Journal published a paper about the positive effects of the ingredients of dark chocolate. Not only did that trigger a number of media reports, the German Society of Internal Medicine (DGIM) also issued a press release seriously proposing that health insurers could cover the cost of dark chocolate for patients with metabolic syndrome. (Here's a talk by Gerd Antes mentioning this issue.)

These things happen on a daily basis, and they don't just happen in nutrition science.


Fraudulent Peer Review

The organization COPE (Committee on Publication Ethics) has issued a statement that indicates attempts to manipulate the peer review process on a large scale.

While few details are available, the statement indicates that some agencies offer "services" around scientific publishing that include fake peer reviewers. The strategy of these agencies seems to be to submit papers to scientific journals and at the same time propose fake peer reviewers to the same journals, in the hope that those reviewers will be asked to review the submitted articles. They then submit favorable reviews in the name of the non-existent reviewers.

This sounds similar to a story from 2012, which I also mentioned here recently, in which papers in journals from the publisher Elsevier were retracted because their peer reviewers didn't exist. The current news indicates that this has happened on a much larger scale than previously known.

Probably a large number of publications will be retracted following these incidents.

Press Releases exaggerate Research and Journalists are happy to uncritically repeat the exaggerations

A study published today by the British Medical Journal investigates the often unhealthy relationship between biomedical and health-related studies, the press releases about them and the resulting news articles. There's a widespread feeling among scientifically minded people that “the media gets it wrong”. While this is hardly controversial, it's always good to have some scientific data on the details. The study is titled “The association between exaggeration in health related science news and academic press releases: retrospective observational study”; the main authors are Petroc Sumner and Christopher Chambers.

The authors took press releases from 20 major UK universities. They then checked the press releases and the resulting news articles for typical exaggerations in the field. They looked at three very common examples: claiming causation where the study only claims correlation, drawing inferences about humans from animal studies, and giving practical advice about behavior change. The authors point out one important limitation: they didn't ask whether the studies themselves were already exaggerated; they only measured exaggerations that go beyond the study itself.

The main results are unsettling, but to be expected: press releases exaggerate a lot (between 36 % and 40 % of them contain exaggerations). If the press release is exaggerated, journalists are much more likely to exaggerate too (around 80 % for all three examples). Even if the press release does not exaggerate, there is still a substantial chance that the journalist will. Journalists especially like to exaggerate consumer advice.

More exaggeration does not mean more news articles

There is one result that is a bit more difficult to interpret. The authors found that whether or not a press release is exaggerated makes hardly any difference to media uptake. One has to be careful not to jump to conclusions too fast here and not make the same exaggeration mistakes this whole study is about. This could be interpreted as a sign that science doesn't have to exaggerate in press releases to get media coverage. But another very plausible explanation is that more interesting studies are less likely to be exaggerated, while less interesting studies fill that gap by exaggerating their results.
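The second explanation can be made concrete with a toy calculation. All numbers are invented for illustration: half of the studies are "interesting", interesting studies are exaggerated less often (20 % vs. 60 %) but picked up by the media more often (60 % vs. 20 %), and exaggeration is assumed to add a real 20 percentage points to any study's chance of media uptake.

```python
from fractions import Fraction as F

# Invented numbers, chosen only to illustrate confounding.
SHARE_INTERESTING = F(1, 2)
P_EXAGGERATE = {"interesting": F(1, 5), "dull": F(3, 5)}
BASE_UPTAKE = {"interesting": F(3, 5), "dull": F(1, 5)}
EXAGGERATION_BOOST = F(1, 5)  # assumed true causal effect of exaggerating


def uptake_rate(exaggerated: bool) -> F:
    """Media uptake rate among press releases with the given exaggeration status."""
    weights = {}
    for kind, share in (("interesting", SHARE_INTERESTING), ("dull", 1 - SHARE_INTERESTING)):
        p = P_EXAGGERATE[kind] if exaggerated else 1 - P_EXAGGERATE[kind]
        weights[kind] = share * p
    total = sum(weights.values())
    boost = EXAGGERATION_BOOST if exaggerated else F(0)
    # Mix the per-kind uptake rates by how common each kind is in this group
    return sum(w / total * (BASE_UPTAKE[kind] + boost) for kind, w in weights.items())


observed_difference = uptake_rate(True) - uptake_rate(False)
print(f"observed difference in uptake: {float(observed_difference):.3f}")
print(f"true causal effect:            {float(EXAGGERATION_BOOST):.3f}")
```

With these invented numbers, exaggerated press releases get picked up only about 3 percentage points more often than sober ones (1/2 vs. 7/15), even though exaggeration adds 20 points for any given study. A hidden factor like how interesting a study is can mask a real effect, which is why observational data alone can't settle the question.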

I wondered whether a causal relationship could be checked with a different study design. It would certainly be possible to run some kind of randomized controlled trial, though I'm not sure it would be ethical, as you'd have to deliberately produce exaggerated press releases to do so.

Who's to blame

Apart from the data, the study has already led to some discussion about who's to blame and what to do about it. Interestingly, both the study itself and an editorial by Ben Goldacre tend to argue that scientists are to blame and should change. Both say they don't expect journalism to change (certainly something for me and my colleagues to think about).

Science journalist Ed Yong made a strong statement on Twitter, arguing that all the blame should go to journalists: “We are meant to be the bullshit filters. That is our job.” I can't argue with that.

It's certainly interesting that the scientists seem to put the blame on science while the journalist blames his profession. However in the end I think there's neither an excuse for writing exaggerated news articles nor an excuse for exaggerated press releases.

Ben Goldacre has some very practical suggestions how to change science press releases. He argues press releases should contain full names of both the PR people and the scientists involved and responsible for writing them to improve accountability. He also proposes that press releases should be much more integrated in the scientific publishing process. They should be linked from the study itself and they should also be open to post-publication review processes and criticism from the scientific community. I think these are good ideas, though probably not sufficient to tackle the problem. (By the way, here is the press release about this study and it is not linked from the study itself. They could lead by example.)

The Problem with Peer Review

Peer review is often described as one of the cornerstones of good science. The idea is simple: before a scientific work is published, it is reviewed by at least two people from the same field, who decide whether it is worth publishing. Peer review is widely seen as the thing that distinguishes science from pseudoscience. In reality, however, things are not so simple, and this simplified view can even be dangerous, because it can lend pseudoscience credibility once it has slipped through the peer review process.

Lately two stories highlighted some of the flaws in the peer review process. The first was a paper consisting of ten pages filled with the sentence “Get me off your fucking mailing list”. The paper was created by the computer scientists David Mazières and Eddie Kohler; the Guardian has a story on it. It is actually pretty old: they made it in 2005 and sent it to dubious conferences and journals that flooded their e-mail inboxes. But what made the news lately is that the paper actually got accepted by a publication called the International Journal of Advanced Computer Technology (IJACT). Mazières and Kohler didn't pay the publication fees, so the paper wasn't really published, but it should be pretty obvious that no peer review took place; most likely the replies from the journal were part of some fully automated process.

Fake Open Access Journals

There is a whole bunch of journals out there that are called predatory journals. It is actually a pretty simple scam: they create a web page that looks like a serious scientific publication and send out e-mails asking researchers to publish their work there. Then they charge a small fee for publication. This is widely known; the blog Scholarly Open Access lists hundreds of these journals. Sometimes the lack of peer review is blamed on the whole open access publishing model, fueled by the fact that Jeffrey Beall, the author of the Scholarly Open Access blog, isn't exactly a friend of open access. Blaming the whole open access model for fake journals seems hardly reasonable to me. (See this blog post from PLoS founder Michael Eisen on the topic; I mostly agree with what he writes.)

The second peer review story that came up lately was a paper in which a bracketed note, “should we cite the crappy Gabor paper here?”, seems to have slipped through the review of the journal Ethology. Retraction Watch has a story on it. Maybe even more interesting than the incident itself is the explanation given by one of the authors: the note was added by one of the authors after the peer review. This raises an interesting question: what do reviewers actually review? The paper that is finally published, or just a preliminary version that remains open to editing after the peer review?

Many odd stories about peer review

These are just two of the latest examples. There is a large number of odd stories about peer review. In 2012 Retraction Watch reported that different Elsevier journals had reviews from fake reviewers. The reasons for this remained unknown. Also in 2012 a Korean scientist managed to review his own papers. (I wrote a story on that back then / Google translate link.)

When the psychologists Stuart Ritchie, Christopher French and Richard Wiseman failed to replicate a very controversial study by Daryl Bem that claimed to have found signs of precognition, they had a hard time finding a publisher. When they submitted their paper to the British Journal of Psychology, it was rejected by one of the reviewers. Later it turned out that this reviewer was none other than Daryl Bem himself, the author of the study they failed to replicate. Finally the study was published in PLoS ONE. (The whole topic of failed replications and the reluctance of journals to publish them is of course a large problem on its own.)

Earlier this year the scientist Cyril Labbé found that a large number of papers published by IEEE and Springer in supposedly peer reviewed conference proceedings had been generated with SCIgen. It is unclear why that happened. SCIgen is actually a joke program that creates computer science papers that look real but contain only gibberish. The intent of SCIgen was to make fun of conferences with low submission standards, which its authors successfully demonstrated at the World Multiconference on Systemics, Cybernetics and Informatics (WMSCI) in 2005. If you have always wanted a scientific paper with your name on it, just go to the SCIgen web page and it'll create one for you. It is also free software, so if you want your own gibberish paper generator, you can have it. (My own article on SCIgen/Labbé, Google Translate link.)
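SCIgen's authors describe it as working from a hand-written context-free grammar. The toy generator below illustrates that idea; the grammar is invented for this example and is far smaller than SCIgen's. Dictionary keys are nonterminals, and any other word is output literally.

```python
import random

# A tiny, invented grammar in the spirit of SCIgen (NOT SCIgen's real grammar).
# Keys are nonterminals; each maps to a list of possible productions.
GRAMMAR = {
    "SENTENCE": [
        ["We", "CLAIM", "that", "NP", "is", "ADJ", "."],
        ["In", "this", "paper", "we", "ACTION", "NP", "."],
    ],
    "NP": [["ADJ", "NOUN"], ["the", "NOUN", "of", "NP"]],
    "ADJ": [["scalable"], ["probabilistic"], ["ubiquitous"], ["Bayesian"], ["lossless"]],
    "NOUN": [["lambda calculus"], ["RAID"], ["e-commerce"], ["cache coherence"], ["DHCP"]],
    "CLAIM": [["argue"], ["confirm"], ["disprove"]],
    "ACTION": [["refine"], ["synthesize"], ["evaluate"]],
}


def expand(symbol, rng):
    """Recursively replace a nonterminal with a randomly chosen production."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminal: emit the word as-is
    words = []
    for part in rng.choice(GRAMMAR[symbol]):
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Generate one gibberish 'research' sentence; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    return " ".join(expand("SENTENCE", rng)).replace(" .", ".")


for seed in range(3):
    print(generate_sentence(seed))
```

Every sentence is grammatical and uses real jargon, yet none of them means anything. A reviewer who reads the paper would notice immediately; an automated or absent review process would not.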

Publish the Review?

This blog post at Scientific American makes an interesting point when discussing the mailing list paper: there's a lack of transparency in peer review. A reader of a supposedly peer reviewed paper only knows that the journal claims it was peer reviewed; there is no proof that the review really happened.

There are a number of proposals for how to improve the peer review process. One is a pretty straightforward idea: the reviews themselves could be published. A peer review is usually a lengthy text in which the reviewer explains why he or she thinks a piece of science is or isn't worth publishing. Making the reviews public would not only make it much harder to create fake reviews and add transparency, it would also have another obvious benefit: the review itself can become part of the scientific discourse, containing insights that other researchers might find valuable.

Another idea that is gaining traction is post-publication review. Scientists want ways to comment on scientific works after they've been published. PubMed, the leading database of medical articles, started PubMed Commons as a way for scientists to share reviews of already published works.

Some proponents of Open Science have more radical ideas: publish everything as soon as possible and review later, even going as far as publishing data as it comes in and publishing the analysis later. This has upsides and downsides. The upside is that it makes the scientific process much faster and much more transparent. The obvious downside is that it would remove the distinction between good and bad science that the peer review process was supposed to provide. However, given how poorly peer review works in practice (see the examples above), I'm inclined to say the advantages will probably outweigh the disadvantages. It should also be noted that something like this already happens in many fields, where it is common to publish preliminary versions of articles on preprint servers and go through a formal peer review process or conference presentation months or even years later. Some journals already experiment with new ways of peer review; PeerJ is one of the more prominent examples.

There's a lot of debate about the future of peer review and it is heavily intertwined with other debates about open access publishing and preregistration. One thing should be clear though: While peer review might be an early indicator whether or not something is good science, it is hardly a reliable one.

Welcome to the blog

If you've been following news in the past months you could read that scientific studies found out that pesticides are linked to autism, an intelligent computer passed the so-called Turing test, a new mathematical algorithm will endanger the security of the internet, a higher concentration of antioxidants in organic food makes them healthier, PowerPoint slides make people stupid and apples improve women's sex life.

I have checked some of these claims and ignored others. However, I am quite certain that all of these claims are just plain wrong. You can find such stories almost daily; there's a never-ending flow of bogus science stories in the media. It would be easy to blame journalists and sensationalist media here, but there are various mechanisms at work: scientists exaggerating their research, press releases exaggerating scientific claims, and journalists either uncritically repeating them or reporting them in a completely misleading way.

Bogus news stories about scientific results are just the tip of the iceberg. In recent years I have become increasingly interested in everything that science gets wrong. This started when I became aware of the problem of publication bias: many scientific results never get published. I learned that there's a big debate about a reproducibility crisis in science. Often enough, scientific results cannot be replicated when other scientists try to do so. The bottom line is that far too many published scientific results are simply wrong and huge amounts of resources are wasted.

While these problems get more attention, some people want to try out radically new ways of doing science. A community that has gathered around the idea of Open Science wants to turn the scientific publication process around and bring much more transparency to science.

Science will never be perfect. Mistakes and preliminary results that later turn out to be wrong are an essential part of science. But many of the problems are fixable.

I find these issues incredibly interesting. I first thought about writing a book about them, but decided that starting a blog would be an easier task. So here it is, and hopefully it will offer some interesting insights into a debate that is crucial to science.

Before I end this introduction I want to make something clear: I'm not against science. In fact, I love science. It's the only way we have to reliably find out things about the world around us. It is great that these days we know so many weird things about the universe that generations before us couldn't even imagine. Sometimes the flaws of the scientific process are used by the proponents of pseudoscience. However, they offer no better alternative. Proponents of so-called alternative medicine want to replace science with personal experiences, creationists want to replace science with ancient books, others want to replace science with the latest woo woo they found somewhere on the Internet. None of these things offer a meaningful alternative to science. The only way to fix science is better science.