Prior Probabilities

There is no question as to the validity of Bayes' Theorem.  However, prior probabilities must be specified in order to use it.  This webpage briefly introduces prior probabilities and discusses the main criticism of them.

Bayesian inference combines prior probabilities with data to estimate the posterior probabilities, which are the results.  A prior probability for a parameter describes what is known about that parameter a priori, before the data are observed.  Frequentist inference does not incorporate prior probabilities, nor does it technically produce posterior probabilities.  Bayesian inference, in contrast, uses a full probability model of both the parameters and the data.
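For reference, Bayes' Theorem states how the prior and the data combine.  Writing θ for the parameter and y for the data (symbols chosen here for illustration, not used elsewhere on this page), the posterior is proportional to the likelihood of the data times the prior:

```latex
p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} \propto p(y \mid \theta)\, p(\theta),
\qquad p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta
```

The denominator p(y) does not depend on θ; it only rescales the posterior so that it integrates to one.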

For example, suppose the goal is to estimate the average height of males at a small company that has 100 males, using a sample of 5.  The data are collected in the office the day before Christmas, when most people are on vacation.  Before collecting the data, the Bayesian statistician, drawing on prior experience in the office, might specify a belief that male height in the office is normally distributed with a mean of 68 inches and a standard deviation of 5 inches.  The sample of 5 males turns out to be approximately normally distributed, but with a mean of 71 inches and a standard deviation of 4 inches.  Bayesian inference considers both the statistician's prior belief (prior probabilities) and the data, and estimates a resulting distribution, the posterior probability distribution.
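A minimal sketch of this example in Python, under assumptions the text does not state: the prior of 68 inches with a standard deviation of 5 inches is read as a prior on the average height (not on individual heights), individual heights are treated as normal with a known standard deviation equal to the sample's 4 inches, and the finite population of 100 males is ignored.  Under those assumptions the normal prior is conjugate, so the posterior for the average is also normal and has a closed form.  The function name `normal_posterior` is hypothetical.

```python
import math


def normal_posterior(prior_mean, prior_sd, sample_mean, sample_sd, n):
    """Posterior mean and sd of a normal average under a normal prior.

    Assumes the data standard deviation is known (here plugged in from
    the sample), which makes the normal prior conjugate.
    """
    prior_precision = 1.0 / prior_sd**2
    data_precision = n / sample_sd**2  # precision of the sample mean
    post_precision = prior_precision + data_precision
    # Precision-weighted average of the prior mean and the sample mean
    post_mean = (prior_precision * prior_mean
                 + data_precision * sample_mean) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Prior belief: mean 68 in, sd 5 in; data: 5 males, mean 71 in, sd 4 in
mean, sd = normal_posterior(68.0, 5.0, 71.0, 4.0, 5)
print(f"posterior mean {mean:.2f} in, posterior sd {sd:.2f} in")
# prints: posterior mean 70.66 in, posterior sd 1.68 in
```

The posterior mean lands between the prior mean of 68 and the sample mean of 71, closer to the data because the five observations carry more precision (5/16) than the prior (1/25).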

The obvious criticism is that the statistician's belief is subjective, and may vary from statistician to statistician.  Moreover, that belief affects the results.  Frequentists maintain that their results are based solely on the data, and therefore that frequentist inference is more objective than Bayesian inference.  Read on to see why Bayesian inference is better.
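To make the criticism concrete, here is a sketch that gives two hypothetical statisticians, A and B, different priors for the same observed summary (mean 71 inches, standard deviation 4 inches).  It assumes each prior is on the average height and that the data standard deviation of 4 inches is known; prior B, centered at 72 inches with a standard deviation of 2 inches, is invented purely for illustration.  It also reruns the comparison with a hypothetical larger sample of 50.

```python
import math


def normal_posterior(prior_mean, prior_sd, sample_mean, sample_sd, n):
    """Conjugate update for a normal average, treating sample_sd as known."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sample_sd**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical priors (mean, sd) in inches held by two statisticians
priors = {"A": (68.0, 5.0), "B": (72.0, 2.0)}

for n in (5, 50):
    # Same observed mean and sd; only the sample size changes
    means = {name: normal_posterior(m, s, 71.0, 4.0, n)[0]
             for name, (m, s) in priors.items()}
    gap = abs(means["A"] - means["B"])
    print(f"n={n}: A={means['A']:.2f}, B={means['B']:.2f}, gap={gap:.2f}")
# n=5: A=70.66, B=71.44, gap=0.78
# n=50: A=70.96, B=71.07, gap=0.11
```

With 5 observations the two posteriors differ noticeably; the gap shrinks as the data accumulate, because the data precision grows with n while each prior's precision stays fixed.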

There are times when prior probabilities should be elicited based on a best guess, and times when they should not.  Suppose instead that this is the second office the statistician has studied within a larger office building, that the previous office was measured when all, or virtually all, of its males were present, and that the current prior probabilities are based on the results from that previous office.  The Bayesian statistician should then be more certain about the current results than a frequentist, who can consider only the current 5 males.  How comfortable is a frequentist who estimates a parameter from a sample size of 5, compared to a Bayesian who estimates the same parameter with what is effectively a sample size of 105?  If a study is repeated over time, the results of earlier studies can be accumulated in the prior probabilities, yielding a more certain result in the current study.
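A minimal sketch of accumulating a previous study in the prior, under loudly stated assumptions: the previous office had 100 males with a hypothetical sample mean of 69 inches, both offices share the same average height, and individual heights have a known standard deviation of 4 inches.  The previous office's estimate and its standard error become the prior for the current office; the helper name `update` is hypothetical.

```python
import math

SIGMA = 4.0  # assumed known sd of individual heights (inches)


def update(prior_mean, prior_sd, sample_mean, sigma, n):
    """Normal-normal conjugate update with known data sd `sigma`."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical previous office: 100 males, sample mean 69 in.
# Its estimate and standard error serve as the prior for the current office.
prior_mean, prior_sd = 69.0, SIGMA / math.sqrt(100)

# Current office: 5 males, sample mean 71 in (from the example)
bayes_mean, bayes_sd = update(prior_mean, prior_sd, 71.0, SIGMA, 5)

# Frequentist standard error using only the current 5 males
freq_se = SIGMA / math.sqrt(5)

print(f"Bayesian: mean {bayes_mean:.2f}, sd {bayes_sd:.2f}")
print(f"Frequentist (n=5): mean 71.00, standard error {freq_se:.2f}")
# Bayesian: mean 69.10, sd 0.39
# Frequentist (n=5): mean 71.00, standard error 1.79
```

Under these assumptions the Bayesian posterior standard deviation equals 4/√105, exactly what pooling all 105 males would give, which is the sense in which earlier results accumulate in the prior.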

More often, Bayesians use prior probability distributions that are close to flat, meaning that the prior probabilities have a negligible impact on the posterior probabilities, which are then estimated mostly from the data.  In this case, a Bayesian model and frequentist model of the same form will have virtually identical results, though the results may be used much more flexibly and are easier to interpret when estimated with Bayesian inference.
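A sketch of the near-flat case, reusing the office example under the assumption that individual heights have a known standard deviation of 4 inches.  The prior is still centered at 68 inches but given an assumed standard deviation of 1000 inches, so it is nearly flat over any plausible height.  The Bayesian 95% credible interval is compared with the frequentist 95% confidence interval for a normal mean with known standard deviation.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def normal_posterior(prior_mean, prior_sd, sample_mean, sigma, n):
    """Conjugate update for a normal average with known data sd `sigma`."""
    prior_prec = 1.0 / prior_sd**2
    data_prec = n / sigma**2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * sample_mean) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


ybar, sigma, n = 71.0, 4.0, 5

# Nearly flat prior: centered at 68 in, but with an assumed sd of 1000 in
post_mean, post_sd = normal_posterior(68.0, 1000.0, ybar, sigma, n)
bayes_interval = (post_mean - Z95 * post_sd, post_mean + Z95 * post_sd)

# Frequentist 95% confidence interval with known sigma
se = sigma / math.sqrt(n)
freq_interval = (ybar - Z95 * se, ybar + Z95 * se)

print("Bayesian 95% credible interval:", tuple(round(x, 2) for x in bayes_interval))
print("Frequentist 95% confidence interval:", tuple(round(x, 2) for x in freq_interval))
```

Both lines print approximately (67.49, 74.51): with a nearly flat prior, the posterior is driven almost entirely by the data, so the numbers match even though the credible interval has the more direct probability interpretation.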

Beyond this, it is also common to use a hierarchical Bayesian model, which places prior distributions, called hyperpriors, on the parameters of the prior probabilities.  It is termed hierarchical because the model has a layered structure: hyperpriors are used to estimate prior probabilities, which in turn are combined with the data to estimate the posterior probabilities.  The result is an increase in certainty over a simple frequentist model that merely tests a hypothesis by estimating the probability of the data, given the hypothesis (see Hypothesis testing).
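A minimal sketch of a two-level normal hierarchical model, with every number beyond the 5-male sample being an assumption: each office's average height θ is drawn from N(μ, τ²), the overall mean μ gets a hyperprior N(68, 5²), the between-office standard deviation τ is held fixed at an assumed 2 inches (a fuller model would give τ a hyperprior too), individual heights have a known standard deviation of 4 inches, and the first two offices are invented.  The closed-form expressions are the standard ones for a normal hierarchical model with known variances, conditional on τ.

```python
import math

SIGMA = 4.0                       # assumed known sd of individual heights (in)
TAU = 2.0                         # assumed fixed sd of office averages around mu
HYPER_MEAN, HYPER_SD = 68.0, 5.0  # hyperprior: mu ~ N(68, 5^2)

# Offices as (males measured, sample mean in inches); first two are hypothetical
offices = [(100, 69.0), (40, 70.0), (5, 71.0)]

# Sampling variance of each office's sample mean
var_ybar = [SIGMA**2 / n for n, _ in offices]

# Posterior of mu given tau: marginally, ybar_j ~ N(mu, var_j + tau^2)
mu_prec = 1.0 / HYPER_SD**2 + sum(1.0 / (v + TAU**2) for v in var_ybar)
mu_mean = (HYPER_MEAN / HYPER_SD**2
           + sum(y / (v + TAU**2) for v, (_, y) in zip(var_ybar, offices))) / mu_prec
mu_var = 1.0 / mu_prec

for v, (n, y) in zip(var_ybar, offices):
    prec = 1.0 / v + 1.0 / TAU**2
    w = (1.0 / TAU**2) / prec  # share of the estimate pulled toward mu
    theta_mean = (y / v + mu_mean / TAU**2) / prec
    # Conditional variance given mu, plus the uncertainty carried over from mu
    theta_sd = math.sqrt(1.0 / prec + w**2 * mu_var)
    print(f"n={n:3d}: sample mean {y:.1f} -> posterior mean {theta_mean:.2f}, "
          f"sd {theta_sd:.2f} (no pooling: {SIGMA / math.sqrt(n):.2f})")
```

The small office is shrunk from 71 toward the overall mean of about 69.7 inches, and its posterior standard deviation (about 1.44) is smaller than the 1.79 available from its 5 males alone, because it borrows strength from the other offices through the hyperprior level.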

Both Bayesian inference and frequentist inference have inherent subjective properties.  If they didn't, there would be no need for a statistician, because all analysis could be automated.  The subject of prior probabilities is much more complicated than presented here.  Hopefully this sufficiently demonstrates the usefulness of, and Bayesian preference for, prior probabilities.