Among other things, I am a Twitter subscriber to @RetractionWatch, mostly because I am fascinated by the process of science. It’s curiously similar, in some key ways, to the larger process of software development. But the reason I’m bringing this up today is one tweet that happened to catch my eye tonight as I skimmed the day’s feed:
“To our horror”: Widely reported study suggesting divorce is more likely when wives fall ill gets axed http://t.co/Lx6UGCQtyF
— Retraction Watch (@RetractionWatch) July 21, 2015
I see several of these every day, but I only tend to click through on the ones that strike me as interesting. When an entire study gets “axed”, I’m definitely interested. It turns out the reason is one I see all too often in my own line of work:
A widely reported finding that the risk of divorce increases when wives fall ill — but not when men do — is invalid, thanks to a short string of mistaken coding that negates the original conclusions,
That’s quite a lead. An entire study rendered invalid by a single bug in the data analysis application. Now, I’m salivating! The report continues:
Shortly after the paper was published some colleagues from Bowling Green State… were trying to replicate the paper and couldn’t understand why their estimate was so much lower than ours… they pointed out to us, to our horror, that we had miscoded the dependent variable… People who left the study were actually miscoded as getting divorced.
So, is it possible that the authors hadn’t actually tested the code they were using to do their statistical analysis? It seems so. Their colleagues at Bowling Green State reported this:
We are conducting research on gray divorce (couples divorce after age 50) using the Health and Retirement Study, the same data set used in Dr. Karraker’s paper. Her published numbers (32% of the sample got divorced) are very different from our estimates (5%), so we contacted her to clarify the discrepancy.
RetractionWatch actually provides the lines of code that were botched. Out of context, it’s somewhat difficult to get a sense of the significance of the difference between the original and corrected code, but clearly the difference in results was massive. The authors defend themselves by pointing out that they “sent copies of our paper to senior scholars to review and we presented our findings at conferences and workshops. The original manuscript went through multiple rounds of peer-review...”. Since nobody else caught the bug, does this mean nobody ever ran the code against anything other than the dataset specifically intended for this study? The report doesn’t say, but given the replication failure, it’s hard to imagine any other possibility.
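The retraction notice doesn’t give us the original Stata code in a form I can reproduce here, but the kind of miscoding it describes — respondents who left the study being swept into the “divorced” category — is easy to sketch. Everything below (the status codes, the record layout, the numbers) is invented for illustration, not taken from the actual study:

```python
# Hypothetical illustration of the miscoding described in the report:
# panel records with a marital-status field, where attrition
# ("left_study") and divorce are distinct outcomes.
RECORDS = [
    {"id": 1, "status": "married"},
    {"id": 2, "status": "divorced"},
    {"id": 3, "status": "left_study"},  # attrition, not divorce
    {"id": 4, "status": "left_study"},
    {"id": 5, "status": "married"},
]

def divorced_buggy(rec):
    # Buggy recode: anyone no longer "married" is counted as a
    # divorce -- which silently sweeps in study dropouts too.
    return rec["status"] != "married"

def divorced_fixed(rec):
    # Corrected recode: only an explicit divorce counts.
    return rec["status"] == "divorced"

buggy_rate = sum(map(divorced_buggy, RECORDS)) / len(RECORDS)
fixed_rate = sum(map(divorced_fixed, RECORDS)) / len(RECORDS)
print(buggy_rate, fixed_rate)  # prints 0.6 0.2
```

A one-character difference in the comparison, and the estimated divorce rate triples — the same shape of error, if not the same magnitude, as the 32%-versus-5% discrepancy the Bowling Green researchers found.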
An incredulous commenter agrees with me:
I fail to understand how a researcher using a program to analyze data does not have a control set for the variables… My students would never be allowed to apply a program to a real set of data and draw conclusions without demonstrating with a control data-set that their program does what they say it does!
And this is all it really would have taken, as the authors’ colleagues demonstrated.
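The commenter’s control-dataset discipline is simple to sketch: generate synthetic data with a *known* divorce rate, run the analysis code over it, and check that the known rate comes back out. The field names, codes, and rates below are mine, chosen only to make the check concrete:

```python
import random

def make_control_set(n, divorce_rate, attrition_rate, seed=0):
    """Build a synthetic panel with a KNOWN divorce rate, so the
    analysis code can be validated against a ground truth."""
    rng = random.Random(seed)
    records = []
    for i in range(n):
        r = rng.random()
        if r < divorce_rate:
            status = "divorced"
        elif r < divorce_rate + attrition_rate:
            status = "left_study"  # attrition, deliberately common
        else:
            status = "married"
        records.append({"id": i, "status": status})
    return records

def estimated_divorce_rate(records):
    # The recode under test: only explicit divorces count.
    divorced = sum(1 for rec in records if rec["status"] == "divorced")
    return divorced / len(records)

control = make_control_set(10_000, divorce_rate=0.05, attrition_rate=0.27)
estimate = estimated_divorce_rate(control)
assert abs(estimate - 0.05) < 0.01, estimate  # recovers the known 5%
```

A recode that conflated attrition with divorce would report something near 32% here instead of 5%, and the assertion would fail immediately — before any production data was ever touched.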
But pointing this out would just be empty sniping, were it not for the real lesson it offers. It is a clear and obvious case of confirmation bias, in my view. The authors set out with a particular outcome in mind: that they would find a statistically significant number of callous husbands. They ran their production data through the flawed code, got the results they were hoping for, and left it at that. Their reviewing colleagues clearly just accepted the work on its face, and rubber-stamped the results.
And it wasn’t until someone else attempted to use their product to do something productive with it that the error was exposed. And so, Confirmation Bias scores another victim…