Thursday, 16 February 2012

Trusting econometrics

One of my profs at Mason told the story of how he'd been offered a new boat if he could get the coefficient in a regression to be below two - which would have allowed a merger to proceed. He turned it down, but not everybody does. Unfortunately, in a whole pile of empirical work, you either have to really trust the guy doing the study, or make sure that his data's available for anybody to run robustness checks, or check that a bunch of people have found kinda the same thing. Degrees of freedom available in setting the specifications can sometimes let you pick your conclusion, like getting a coefficient that hits the right parameter value or the right t-stat.
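To see how much room those degrees of freedom give you, here's a minimal sketch - my own toy example, nothing to do with the merger case - in Python. We simulate data where x has a known effect on y, run OLS over every subset of five candidate controls, and look at the spread of coefficients on x we could choose from. All variable names and numbers are made up for illustration.

from itertools import combinations

import numpy as np

rng = np.random.default_rng(0)
n = 200
controls = rng.normal(size=(n, 5))           # five candidate control variables
x = 0.5 * controls[:, 0] + rng.normal(size=n)    # x is correlated with the first control
y = 2.5 * x + controls @ np.array([1.5, -1.0, 0.8, 0.0, 0.0]) + rng.normal(size=n)

def coef_on_x(included):
    # OLS coefficient on x when a given subset of controls is included.
    X = np.column_stack([np.ones(n), x] + [controls[:, j] for j in included])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

estimates = {
    subset: coef_on_x(subset)
    for r in range(6)
    for subset in combinations(range(5), r)
}
preferred = min(estimates, key=estimates.get)
print(f"coefficient on x across 32 specifications: "
      f"{min(estimates.values()):.2f} to {max(estimates.values()):.2f}")
print(f"'preferred' specification (controls {preferred}): {estimates[preferred]:.2f}")

Nothing here is fraudulent in the data-faking sense; every regression is a real regression on real (simulated) data. The mischief is entirely in which one you choose to report.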

David Levy and Susan Feigenbaum worried a lot about this in "The Technological Obsolescence of Scientific Fraud". Where investigators have preferences over outcomes, it's possible to achieve those outcomes through appropriate choice of identifying restrictions or methods - especially since there are lots of line calls about which techniques to use in which cases. They note that outright fraud makes results non-replicable, while biased research winds up instead being fragile - the relationships break down when people change the set of covariates, or the time period, or the technique.

Note that none of this has to come through financial corruption either: simple publish-or-perish incentives are enough when journals are more interested in significant results than in insignificant ones; DeLong and Lang jumped up and down about this twenty years ago. Ed Leamer made similar points even earlier (recent podcast). And then there's all the work by McCloskey.

Thomas Lumley today points to a nice piece in Psychological Science demonstrating the point.
In this article, we accomplish two things. First, we show that despite empirical psychologists’ nominal endorsement of a low rate of false-positive findings ( ≤ .05), flexibility in data collection, analysis, and reporting dramatically increases actual false-positive rates. In many cases, a researcher is more likely to falsely find evidence that an effect exists than to correctly find evidence that it does not. We present computer simulations and a pair of actual experiments that demonstrate how unacceptably easy it is to accumulate (and report) statistically significant evidence for a false hypothesis. Second, we suggest a simple, low-cost, and straightforwardly effective disclosure-based solution to this problem. The solution involves six concrete requirements for authors and four guidelines for reviewers, all of which impose a minimal burden on the publication process.
Degrees of freedom available to the researcher make it "unacceptably easy to publish 'statistically significant' evidence consistent with any hypothesis." They demonstrate it by proving, statistically, that hearing "When I'm Sixty-Four" rather than a control song made people a year-and-a-half younger.
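Their simulations are easy to reproduce in spirit. Here's a minimal sketch - my own toy version, not their code or their exact design - under a true null: a researcher measures two outcomes, peeks at the data after adding ten more subjects per group, and reports whichever test comes out best. The false-positive rate lands well above the nominal 5%.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def one_flexible_study():
    # Two groups, two outcome measures each, no true difference anywhere.
    n_initial, n_extra = 20, 10
    a = rng.normal(size=(n_initial + n_extra, 2))
    b = rng.normal(size=(n_initial + n_extra, 2))
    pvals = []
    for n in (n_initial, n_initial + n_extra):   # test at n=20, then peek again at n=30
        for outcome in (0, 1):                   # try either outcome measure
            pvals.append(stats.ttest_ind(a[:n, outcome], b[:n, outcome]).pvalue)
    return min(pvals)                            # report the best-looking test

sims = 10_000
rate = sum(one_flexible_study() < 0.05 for _ in range(sims)) / sims
print(f"false-positive rate with flexible analysis: {rate:.1%}")  # well above 5%

Combine a few more of the flexibilities they list - dropping a condition, adding a covariate after seeing the data - and the rate climbs higher still.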

The lesson isn't radical skepticism of all statistical results, but rather a caution against overreliance on any one finding and an argument for discounting findings coming from folks whose work proves to be fragile.
