Discussion about this post

Sarah Hamersma

Amazing. Two things:

1. I was fascinated by your p-hacking piece and wanted to get back to finish it and try to get my head around it. And the confirmation bias was strong with me! "It's the training data." - yep, it's going to double down on our mistakes. For what it's worth, I'm still confident that AI will improperly interpret statistical significance because of all the bad training data out there (e.g., "the findings indicate 'no effect'"). Maybe I should try to analyze that somehow. ANYWAY, today's piece is fantastic and such a great reminder of how we understand and practice precision.

2. This is coincidentally very related to something I'm working on with some coauthors. One of our outcomes has a fairly low mean of y, and the effect size is very small. At the bottom of each column in the table we report the approximate percent change from baseline, i.e., the coefficient divided by the mean of y. In general I like this statistic. However, we are in a debate about whether to report the percent change calculated in the software, which does not round these two numbers, or the percent change you get when you use the reported (rounded) numbers. We get whole numbers for many of them when we do this, since we round to 3 decimal places and the effect size is usually less than 0.005. The reason for this is the same as what you were addressing, just less consequential because it's a contextual statistic rather than the t-stat. That said, I'm curious what you would do. (I won't state my preference, as I don't want to bias you.)
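The debate in point 2 can be sketched numerically. A minimal example (all values hypothetical, chosen to mimic the setting described: effect size below 0.005, table rounded to 3 decimal places) comparing the percent change computed from full-precision numbers with the one a reader can reproduce from the rounded table entries:

```python
# Hypothetical coefficient and outcome mean, not from any real table.
coef_raw, mean_y_raw = 0.00437, 0.214

coef_tab = round(coef_raw, 3)    # 0.004, as printed in the table
mean_tab = round(mean_y_raw, 3)  # 0.214, unchanged by rounding here

pct_raw = 100 * coef_raw / mean_y_raw  # what the software computes
pct_tab = 100 * coef_tab / mean_tab    # what a reader can reproduce

print(f"from full-precision numbers: {pct_raw:.2f}%")  # 2.04%
print(f"from rounded table entries:  {pct_tab:.2f}%")  # 1.87%
```

The tension is only that the software's figure cannot be reconstructed from the printed table, while the reader-reproducible figure inherits the rounding error in the coefficient.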

Bonus:

3. Very related: why don't we use the concept of rounding to significant digits instead of decimal places in econ tables? In science it's very clear that if numbers are smaller, you adjust the decimal places to get to a comparable reporting of precision. I have done this before, having more decimal places for a column with smaller effect sizes, but it seems to rock the boat.
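The significant-digits idea in point 3 can be sketched with a small helper (Python has no built-in for this; the function and effect sizes below are illustrative, not from the post):

```python
import math

def round_sig(x, sig=3):
    """Round x to `sig` significant digits (helper for illustration)."""
    if x == 0:
        return 0.0
    return round(x, sig - 1 - int(math.floor(math.log10(abs(x)))))

# Fixed decimal places flatten small effects; significant digits keep a
# comparable level of precision across magnitudes.
effects = [1.234, 0.01234, 0.0001234]  # hypothetical effect sizes
print([round(e, 3) for e in effects])      # [1.234, 0.012, 0.0]
print([round_sig(e, 3) for e in effects])  # [1.23, 0.0123, 0.000123]
```

The smallest effect survives with three significant digits but rounds to exactly zero at three decimal places, which is the comparability problem the comment raises.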

Alexis J. Diamond

Another great post. Very thought-provoking. I had never thought about this rounding failure mode before.

I also really like your simulated empirical example. I'm wondering why you chose to compare a histogram to a smoothed density plot in your figure, when the key distinction (I think) is that the left-side results are rounded while the right-side results are not. That is, by comparing different plot types, someone (not me; I'm fully persuaded by your argument) might wonder whether the right and left sides look so different because of the rounding/not-rounding, or because of the tuning parameter chosen for two different data-viz designs (each design being quite sensitive to that choice). I'm just wondering: if you were to compare two histograms with the same number of breaks, or two density plots with the same bandwidth, is the visual impact much the same?
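The comparison this comment proposes, same breaks for both the rounded and unrounded results, can even be checked without drawing a plot: bin both series on an identical grid and compare the counts. The t-statistics below come from a hypothetical normal draw, not from the post's actual simulation:

```python
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(loc=2.0, scale=0.5, size=100_000)  # hypothetical t-statistics
t_rounded = np.round(t, 2)                        # as reported to 2 decimals

bins = np.arange(1.90, 2.1051, 0.005)             # identical breaks for both
counts_raw, _ = np.histogram(t, bins=bins)
counts_rnd, _ = np.histogram(t_rounded, bins=bins)

# With 0.005-wide bins, rounding to the 0.01 grid piles all the mass onto
# the bins containing grid points and leaves the in-between bins empty: a
# comb pattern the unrounded data does not show.
print("empty bins, unrounded:", int((counts_raw == 0).sum()))
print("empty bins, rounded:  ", int((counts_rnd == 0).sum()))
```

Holding the breaks fixed, whatever visual difference remains is attributable to rounding alone, which is exactly the point of the suggested comparison.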

