Models, Statistical Significance, Actual Significance

“Sometimes people think that if a coefficient estimate is not significant, then it should be excluded from the model. We disagree. It is fine to have nonsignificant coefficients in a model, as long as they make sense.” Gelman & Hill 2007, page 42

“Include all variables that, for substantive reasons, might be expected to be important.” Ibid., page 69.

When a field adopts a common word and uses it in a technical sense, it’s sometimes lucky and sometimes unlucky.

For example, the computer field was lucky when it chose to use the word “tree” in a technical sense: a hierarchically arranged structure where leaves are attached to branches, which are joined into larger branches, which eventually join the “root” of the tree. The analogy works out pretty well, except for the “root” part, since real trees don’t have a single root, but rather a root system that resembles an inverted version of what’s above ground. This technicality notwithstanding, the common meaning of the word leads to a reasonable understanding of its technical use.

By way of contrast, the statistical field was unlucky when it chose to use the word “significant” in “statistically significant”. I’m not sure, but I imagine that the phrase was originally “statistically significantly different from zero (or no effect)”, or something similar. But this got shortened (as names tend to be) to “statistically significant”, and often just to “significant”, which to most of us means “meaningful” or “important”. That connotation is so strong that it can lead statistical practitioners, especially casual ones, astray.
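To make the gap between the two meanings concrete, here is a small Python sketch. It is not from Gelman & Hill; the function name `two_sided_p_value` and all the numbers are illustrative assumptions. It computes a two-sided p-value for the null hypothesis “the true effect is zero” using a normal (z) approximation with a known standard deviation, which is a simplification of what regression software actually reports. A negligible effect measured on an enormous sample comes out “statistically significant”, while a large effect measured on a small sample does not.

```python
import math


def two_sided_p_value(effect: float, sd: float, n: int) -> float:
    """Two-sided p-value for H0: true effect == 0.

    Hypothetical helper for illustration: assumes a normal (z) approximation
    with a known standard deviation `sd` and sample size `n`.
    """
    standard_error = sd / math.sqrt(n)
    z = effect / standard_error
    # For a standard normal Z, P(|Z| >= |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# A practically negligible effect, measured on a huge sample (made-up numbers).
p_tiny = two_sided_p_value(effect=0.01, sd=1.0, n=1_000_000)

# A practically large effect, measured on a tiny sample (made-up numbers).
p_large = two_sided_p_value(effect=0.5, sd=1.0, n=10)

print(f"negligible effect, huge sample: p = {p_tiny:.2g}")
print(f"large effect, tiny sample:      p = {p_large:.2f}")
```

With these made-up inputs, the negligible effect gets a p-value far below the conventional 0.05 cutoff, while the large effect’s p-value is about 0.11. The p-value measures how confidently an estimate can be distinguished from zero, not how much the effect matters in the real world.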

The two quotes at the start are from a book by Andrew Gelman and Jennifer Hill, two well-respected statisticians. It’s easy to get caught up in the glory of models and model-world, where everything is ruled by asterisks in the margins of model summaries and the truth of a model is measured by statistical techniques. Today, we thank Gelman and Hill for reminding us that modeling is a two-way street, where real-world knowledge informs modeling decisions, and model results focus our view of real-world knowledge.

Don’t let an ill-fated word choice lead you into confusing model-world significance (“statistical significance”) with real-world significance (“significance”). Don’t give in to The Force of model-world.

