Psychology Archipelago

Why the "Replication Crisis" Isn't So Bad

There has supposedly been a crisis going on in science. Dubbed the “replication crisis,” it refers to the problem of trying, and failing, to replicate past scientific results. This has been making big news, particularly in psychology and the other social sciences. Over the past several years, Vox has published several articles with headlines proclaiming the devastation caused by the replication crisis. But what is it, and how big of a deal is it?

The replication crisis is the idea that too many studies, once published, cannot be repeated with the same results. When most people think of scientific papers, they think of facts set in stone. However, science is a process, and that process requires confirming previous results.

Studies can be repeated in several ways. There is “exact replication,” where you try to copy the methods of the original study as closely as possible. There is also “conceptual replication,” where a scientist tries to test the same hypothesis using a different method; they might use different tests to measure the same underlying concept. Another way to confirm a previous experiment’s result is to rerun the analysis and computations using the same data.

A now-famous study analyzed attempts to replicate 100 studies from well-respected psychology journals and found that only roughly one-third to one-half of the original findings could be reproduced. And the problem is not confined to psychology; it seems to be an issue in most of the sciences, except perhaps mathematics.

There are several reasons why so many studies are not repeatable.

One is low statistical power. Many studies do not have as many participants as they would like, often due to a lack of funding, and this makes the results less likely to reflect reality. For example, if I were to ask 5 random people whether they liked chocolate, all of them might say no. I might conclude from this that most, if not all, people do not like chocolate. This would not reflect reality, since most people do like chocolate. With more participants, the result would be more likely to reflect that. If I surveyed 10,000 people at random, it is very unlikely that all of them would dislike chocolate.
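
To make that concrete, here is a minimal simulation sketch (my own illustration, not from the article, and it assumes, purely hypothetically, that 80% of people like chocolate) showing how much small samples swing compared to large ones:

```python
import random

random.seed(1)

TRUE_RATE = 0.8  # hypothetical assumption: 80% of people like chocolate

def survey(n):
    """Ask n random people whether they like chocolate; return the share who say yes."""
    return sum(random.random() < TRUE_RATE for _ in range(n)) / n

# Ten tiny surveys of 5 people each: the estimates jump all over the place.
print([round(survey(5), 2) for _ in range(10)])

# Ten large surveys of 10,000 people each: every estimate lands close to 0.80.
print([round(survey(10_000), 2) for _ in range(10)])
```
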
But the underlying reason for much of the replication crisis is the constant pressure to publish new, bold research. In a survey of over 1,500 researchers, over 90% cited this pressure to publish as a contributing factor in the replication crisis. Research with surprising results gets headlines, attracts funding, and advances careers. The pressure to publish new research on a consistent basis can leave researchers spread too thin, leading to poorer experimental design and analysis.

There are, however, more nefarious consequences of the pressure to publish. P-hacking, whether intentional or not, is one way to get an eye-catching headline. P-hacking happens “when researchers collect or select data or statistical analyses until nonsignificant results become significant.” One way this happens is by pulling data from many different sources until a correlation appears. If I record a bunch of unrelated categories such as hours of sleep, books read per year, heart rate, and the time I spend brushing my teeth, I will eventually find some sort of correlation between one of those categories and, say, the lunar cycle. You might soon see a headline about how the moon is causing you to brush your teeth less.
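
A rough simulation sketch (again my own illustration, not from the article, and built entirely on made-up noise data) shows how easily this many-comparisons flavor of p-hacking produces a “significant” result:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_days, n_categories, n_studies = 30, 20, 1000
false_alarms = 0

for _ in range(n_studies):
    moon = rng.random(n_days)                    # an unrelated "moon phase" variable
    habits = rng.random((n_categories, n_days))  # 20 random, meaningless daily habits
    pvals = [stats.pearsonr(moon, row)[1] for row in habits]
    if min(pvals) < 0.05:                        # at least one "significant" correlation
        false_alarms += 1

# With 20 unrelated variables per study, well over half of these pure-noise
# studies stumble on at least one p < 0.05 by chance alone.
print(f"{false_alarms / n_studies:.0%} of pure-noise studies found a 'correlation'")
```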

P-hacking can also be done by stopping an experiment as soon as the desired significant result appears. For example, if I flip a coin 100 times, there may be a point along the way where I have noticeably more heads than tails. Say at flip 11 I have 9 heads and 2 tails. I might stop there and claim that the coin lands heads more often than tails. But if I had carried out all 100 flips, I would have gotten something much closer to fifty-fifty.
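
Here is a small simulation sketch of that “peeking” strategy (my own illustration, not from the article), comparing how often a perfectly fair coin gets called biased when you test for significance after every flip versus only once at the end:

```python
import random
from scipy.stats import binomtest

random.seed(0)

def declared_biased(peek, n_flips=100):
    """Flip a fair coin n_flips times; return True if we end up calling it biased."""
    heads = 0
    for i in range(1, n_flips + 1):
        heads += random.random() < 0.5
        # Peeking: test after every flip (from flip 10 on) and stop the moment
        # the running count looks "significant".
        if peek and i >= 10 and binomtest(heads, i, 0.5).pvalue < 0.05:
            return True
    # No peeking: run all the flips, then test once at the end.
    return binomtest(heads, n_flips, 0.5).pvalue < 0.05

trials = 200
with_peeking = sum(declared_biased(peek=True) for _ in range(trials)) / trials
without_peeking = sum(declared_biased(peek=False) for _ in range(trials)) / trials
print(f"fair coin called biased: {with_peeking:.0%} with peeking, "
      f"{without_peeking:.0%} without")
```
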
P-hacking might not be as obviously unethical as cyber hacking, and it is not explicitly lying about the data, but it goes against the scientific method and poses a huge threat to the credibility of research.

Finally, there are, although rare, also instances of outright faked results. Diederik Stapel, for example, was a Dutch psychologist who was suspended from Tilburg University for fabricating data. Psychology is not the only field affected: Hendrik Schön was a German physicist who was supposedly pioneering the field of organic semiconductors before his results were exposed as fraudulent, and he even had his PhD revoked.

In addition to the factors that make reliable research hard, replication itself isn't easy, for several reasons.

Exact replication might not be possible. 

Sometimes, key elements of the method are unclear or simply not included in the published paper, making an exact copy impossible. When that happens, a failure to reproduce the results is often blamed on the replicators getting the method wrong, rather than on the original conclusion being wrong.

Another reason is that data about the participants is often incomplete, leaving out relevant details such as age, sex, chronic illnesses, and medical conditions. These omissions can hide confounding factors that prevent others from getting the same result as the original study.

Finally, sometimes the same experiment simply cannot be performed again because circumstances have changed. If I measured the personalities of people in the 1950s, there is no way to replicate that experiment in the modern age (unless I build a time machine). Another example is the famous Stanford prison experiment, in which participants were assigned the role of either “guard” or “prisoner” to see how a position of authority affects human behavior. The guards' treatment of the prisoners grew increasingly brutal, and the study became an extremely unethical experiment. With stricter ethical requirements now in place, these kinds of experiments can (luckily) no longer be repeated.

But we might not have to be so worried about the replication crisis. Bold research is always going to be risky. A 2020 paper published in Nature used simulations and concluded that publishing riskier papers first and replicating them later was more efficient than publishing only after replication. Perhaps the pressure to publish new theories is part of what has driven the modern era's rapid pace of new discoveries.

We still, of course, want the best of both worlds: increasing replicability while maintaining the culture of proposing bold new theories.

There are several solutions to this issue. 

Preregistration, the process of defining research questions and analysis plans before an experiment begins, is becoming more popular. It helps curb p-hacking and prevents researchers from making post hoc hypotheses (formed after seeing the results) and presenting them as a priori (formed with no knowledge of the results).

The Replication Index is a newer tool, created by Ulrich Schimmack, for estimating the replicability of studies, or even of individual researchers' bodies of work. It assigns a numerical value to the credibility of the research, giving readers an easy way to tell which findings are more reliable.

There also seems to be a move towards making the data needed to replicate studies more available, with journals such as PLOS ONE adopting this as a policy in 2014.

The most important thing we have to change, however, is our view of science, particularly of the social sciences. Although we tend to view science as a steady accumulation of knowledge and facts, psychology and other less developed sciences work differently. Psychology goes through fads: a cycle in which there is great enthusiasm about a new theory, people try to apply it in several domains, negative data starts to come out, confusion grows over inconsistent and contradictory results, people make ad hoc excuses, and finally everyone loses interest and moves on to the next theory.

Psychology isn't a clean road to knowledge. The brain is one of the most complex things in the universe, and there have been, and will continue to be, plenty of U-turns and dead ends in trying to understand it.

So next time you see a headline talking about the moon and brushing teeth, take it with a grain of salt, knowing that science is always evolving.