The intrinsic unreliability of science

Investigation after investigation of quasi-scientific shenanigans demonstrates the need for more precise language to describe the field that is too broadly, and too misleadingly, known as “science”:

The problem with science is that so much of it simply isn’t. Last summer, the Open Science Collaboration announced that it had tried to replicate one hundred published psychology experiments sampled from three of the most prestigious journals in the field. Scientific claims rest on the idea that experiments repeated under nearly identical conditions ought to yield approximately the same results, but until very recently, very few had bothered to check in a systematic way whether this was actually the case. The OSC was the biggest attempt yet to check a field’s results, and the most shocking. In many cases, they had used original experimental materials, and sometimes even performed the experiments under the guidance of the original researchers. Of the studies that had originally reported positive results, an astonishing 65 percent failed to show statistical significance on replication, and many of the remainder showed greatly reduced effect sizes.

Their findings made the news, and quickly became a club with which to bash the social sciences. But the problem isn’t just with psychology. There’s an unspoken rule in the pharmaceutical industry that half of all academic biomedical research will ultimately prove false, and in 2011 a group of researchers at Bayer decided to test it. Looking at sixty-seven recent drug discovery projects based on preclinical cancer biology research, they found that in more than 75 percent of cases the published data did not match up with their in-house attempts to replicate. These were not studies published in fly-by-night oncology journals, but blockbuster research featured in Science, Nature, Cell, and the like. The Bayer researchers were drowning in bad studies, and it was to this, in part, that they attributed the mysteriously declining yields of drug pipelines. Perhaps so many of these new drugs fail to have an effect because the basic research on which their development was based isn’t valid….

Paradoxically, the situation is actually made worse by the fact that a promising connection is often studied by several independent teams. To see why, suppose that three groups of researchers are studying a phenomenon, and when all the data are analyzed, one group announces that it has discovered a connection, but the other two find nothing of note. Assuming that all the tests involved have a high statistical power, the lone positive finding is almost certainly the spurious one. However, when it comes time to report these findings, what happens? The teams that found a negative result may not even bother to write up their non-discovery. After all, a report that a fanciful connection probably isn’t true is not the stuff of which scientific prizes, grant money, and tenure decisions are made.

And even if they did write it up, it probably wouldn’t be accepted for publication. Journals are in competition with one another for attention and “impact factor,” and are always more eager to report a new, exciting finding than a killjoy failure to find an association. In fact, both of these effects can be quantified. Since the majority of all investigated hypotheses are false, if positive and negative evidence were written up and accepted for publication in equal proportions, then the majority of articles in scientific journals should report no findings. When tallies are actually made, though, the precise opposite turns out to be true: Nearly every published scientific article reports the presence of an association. There must be massive bias at work.
Ioannidis’s argument would be potent even if all scientists were angels motivated by the best of intentions, but when the human element is considered, the picture becomes truly dismal. Scientists have long been aware of something euphemistically called the “experimenter effect”: the curious fact that when a phenomenon is investigated by a researcher who happens to believe in the phenomenon, it is far more likely to be detected. Much of the effect can likely be explained by researchers unconsciously giving hints or suggestions to their human or animal subjects, perhaps in something as subtle as body language or tone of voice. Even those with the best of intentions have been caught fudging measurements, or making small errors in rounding or in statistical analysis that happen to give a more favorable result. Very often, this is just the result of an honest statistical error that leads to a desirable outcome, and therefore it isn’t checked as deliberately as it might have been had it pointed in the opposite direction.

But, and there is no putting it nicely, deliberate fraud is far more widespread than the scientific establishment is generally willing to admit.
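The statistical argument in the excerpt can be checked with a few lines of arithmetic. The sketch below is illustrative only; the power, significance level, and base rate of true hypotheses are assumed numbers, not figures from the article. It works through both effects: why a lone positive among three high-powered tests is evidence against the effect, and how publishing only positive findings skews the literature even when most investigated hypotheses are false.

```python
import random

ALPHA = 0.05   # assumed per-test false-positive rate
POWER = 0.90   # assumed detection probability when the effect is real

# Effect 1: three independent teams test the same hypothesis and
# exactly one reports a positive. Compare the likelihood of that
# outcome under each state of the world.
p_lone_if_real = 3 * POWER * (1 - POWER) ** 2   # one hit, two misses
p_lone_if_null = 3 * ALPHA * (1 - ALPHA) ** 2   # one fluke, two correct nulls

# With even prior odds, Bayes' rule gives the posterior:
posterior_real = p_lone_if_real / (p_lone_if_real + p_lone_if_null)
print(f"P(effect is real | lone positive of three) = {posterior_real:.2f}")
# Well below 0.5: the lone positive is almost certainly the spurious one.

# Effect 2: simulate a literature where journals accept only positive
# results, and only 10% of investigated hypotheses are true (assumed).
random.seed(0)
published_true = published_false = 0
for _ in range(100_000):
    is_true = random.random() < 0.10
    detected = random.random() < (POWER if is_true else ALPHA)
    if detected:  # negative results never get written up or accepted
        if is_true:
            published_true += 1
        else:
            published_false += 1

total = published_true + published_false
print(f"Share of published positives that are spurious: "
      f"{published_false / total:.0%}")
```

Even with these generous assumptions (high power, honest analysis), roughly a third of the published positives are spurious; with the lower effective power typical of small studies, the spurious share climbs well past half, which is the heart of Ioannidis’s argument.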

Never confuse either scientistry or sciensophy for scientody. To paraphrase, and reject, Daniel Dennett’s contention, do not trust biologists or sociologists or climatologists, or anyone else who calls himself a scientist, simply because physicists get amazingly accurate results.