In the first posting of this series, I simply applied Bayes Rule repeatedly: $\text{Posterior} \propto \text{Prior} \times \text{Likelihood}$. I didn’t have to know anything about conjugate priors, hyperparameters, sufficient statistics, parametric forms, or anything beyond the basics. I got a reasonable posterior for the Poisson rate $\lambda$ and used that to find the correct answer. Why go beyond this?
Well, first, there’s really no good way for me to communicate my distribution to anyone else. It’s a (long) vector of values, and that’s the weakness of a non-parametric system: there is no well-known function and no sufficient statistics to easily describe what I’ve discovered. (Of course, it’s possible that there is no well-known function that’s appropriate, in which case the simulation method is actually the only option.)
Second, my posterior density is discrete. This works reasonably well for the exercise I attempted, but it’s still a discrete approximation, which is less precise and can suffer from simulation-related issues (grid resolution, number of samples, etc.) that have nothing to do with my proposed model.
Third, if an analytical method can be used, it may be possible to directly calculate a final posterior without repeated applications of Bayes Rule. As I mentioned in the previous posting, Gelman got an answer of Gamma(238, 10) analytically, not through approximations and simulation. If we look in Wikipedia, we can find that the conjugate prior for $\lambda$, the parameter of a Poisson distribution, is the Gamma($\alpha$, $\beta$) distribution, and given our series of $n$ accident counts $x_1, \dots, x_n$ and an initial (prior) $\alpha$ and $\beta$, the posterior density is Gamma($\alpha + \sum_{i=1}^{n} x_i$, $\beta + n$).
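To see where that update rule comes from, here is a short derivation sketch. It assumes the shape-rate form of the Gamma density, and the symbols $\lambda$ (Poisson rate), $\alpha$, $\beta$ (hyperparameters), $x_i$ (yearly counts) and $n$ (number of years) are my labels for this sketch. Constants that do not depend on $\lambda$ are dropped at each step:

$$
p(\lambda \mid x_1, \dots, x_n)
\;\propto\;
\underbrace{\lambda^{\alpha - 1} e^{-\beta \lambda}}_{\text{Gamma}(\alpha,\, \beta)\ \text{prior}}
\times
\underbrace{\prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!}}_{\text{Poisson likelihood}}
\;\propto\;
\lambda^{\alpha + \sum_{i} x_i - 1}\, e^{-(\beta + n)\lambda}
$$

The right-hand side has the same functional form as the prior, with $\alpha$ replaced by $\alpha + \sum_i x_i$ and $\beta$ replaced by $\beta + n$, which is what makes the Gamma conjugate to the Poisson.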
Thus, starting with priors for the hyperparameters ($\alpha$, $\beta$) of (0, 0), we can directly calculate the final posterior as Gamma (238, 10). Nice! Especially if you’re not using a computer. But if you’re using a computer, you might still be tempted to use simulation; it only takes a second or two to do all of the simulation I did in my first posting, after all.
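As a minimal sketch of that direct calculation, the conjugate update is just two additions. The function name `gamma_poisson_update` and the `counts` list are my own for illustration: the counts are an assumed series of ten yearly values that sum to 238, consistent with the Gamma (238, 10) result, not something quoted in this post.

```python
def gamma_poisson_update(alpha, beta, counts):
    """Conjugate update for a Poisson rate with a Gamma(shape, rate) prior.

    Returns the posterior hyperparameters (alpha + sum of counts, beta + n).
    """
    return alpha + sum(counts), beta + len(counts)


# Assumed, illustrative yearly accident counts: 10 values summing to 238.
counts = [24, 25, 31, 31, 22, 21, 26, 20, 16, 22]

# One-shot update starting from hyperparameters (0, 0).
batch = gamma_poisson_update(0, 0, counts)

# Updating one datum at a time, feeding each posterior back in as the
# next prior, lands on exactly the same hyperparameters.
sequential = (0, 0)
for x in counts:
    sequential = gamma_poisson_update(*sequential, [x])

print(batch, sequential)  # (238, 10) (238, 10)
```

The sequential loop is there only to show that repeated application of Bayes Rule and the one-shot formula agree; with a conjugate prior, nothing is lost by skipping the intermediate steps.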
Let’s examine the steps of using Bayes Rule a little more closely. The prior doesn’t need to have any special distribution. In fact, it should reflect any knowledge we might have regarding the problem at hand. The likelihood, however, will have a distribution specific to the model we are using: in this exercise it was Poisson, which is appropriate for the number of unrelated accidents in a year.
Multiplying these two point by point on my sampled grid creates a posterior which is a compromise between a parametric and an arbitrary distribution, and this semi-arbitrary posterior is then used as the prior for the next datum. When iterated over the data, the likelihood influences the posterior more and more, of course. In the end, as we showed in the previous posting, the result is very close to the parametric Gamma (238, 10) that Gelman found. We could use various methods to estimate the parameters $\alpha$ and $\beta$ from our gridded posterior and in the end have a more compact answer, but we could also directly calculate an answer, saving us from various simulation issues such as loss of numerical precision.
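Here is a rough sketch of the gridded route, followed by one possible way (the method of moments) to recover compact Gamma parameters from the result. This is not the code from my first posting: the grid bounds, the `counts` values (an assumed series summing to 238 over 10 years) and the function names are all assumptions for illustration. The starting prior is taken proportional to $1/\lambda$ on the grid, which is my stand-in for the improper Gamma (0, 0) prior.

```python
import math

# Assumed, illustrative yearly accident counts: 10 values summing to 238.
counts = [24, 25, 31, 31, 22, 21, 26, 20, 16, 22]


def poisson_pmf(x, lam):
    """Poisson probability of count x at rate lam, computed via logs."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))


def grid_posterior(counts, lo=0.05, hi=60.0, steps=6000):
    """Apply Bayes Rule one datum at a time on a grid of rate values."""
    width = (hi - lo) / steps
    grid = [lo + (i + 0.5) * width for i in range(steps)]
    # Prior proportional to 1/lambda (grid analogue of Gamma(0, 0)).
    post = [1.0 / lam for lam in grid]
    total = sum(post)
    post = [p / total for p in post]
    for x in counts:
        # Posterior is proportional to prior times likelihood, point by point.
        post = [p * poisson_pmf(x, lam) for p, lam in zip(post, grid)]
        total = sum(post)
        post = [p / total for p in post]  # renormalise; becomes next prior
    return grid, post


def moment_match_gamma(grid, post):
    """Estimate Gamma(shape, rate) from the grid's mean and variance."""
    mean = sum(lam * p for lam, p in zip(grid, post))
    var = sum((lam - mean) ** 2 * p for lam, p in zip(grid, post))
    beta = mean / var      # rate:  mean / variance
    alpha = mean * beta    # shape: mean^2 / variance
    return alpha, beta


grid, post = grid_posterior(counts)
alpha_hat, beta_hat = moment_match_gamma(grid, post)
print(alpha_hat, beta_hat)  # should come out close to 238 and 10
```

Moment matching is only one of the "various methods"; it is convenient here because a Gamma's mean is $\alpha/\beta$ and its variance is $\alpha/\beta^2$, so two sums over the grid are enough.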
Of course, this depends on the Gamma distribution being a reasonable distribution for the problem at hand. It’s one more part of our model, and models are only approximations, but they should be principled and reasonably accurate approximations at the end of the day. Next time, let’s consider some simulation-related issues.