Bayesian statistics provides formal methods for incorporating prior knowledge into the estimation of unknown parameters.

In a non-Bayesian analysis, unknown parameters are treated as constants, while a Bayesian analysis treats them as random variables, each with a pdf.

At the outset of an analysis, the pdf assigned to a parameter is based on little information and is referred to as the prior distribution. After this pdf is updated with observed data, it is referred to as the posterior distribution.

Definition: Let $W$ be a statistic dependent on a parameter $\theta$, with pdf $f_W(w \mid \theta)$. Assume that $\theta$ is the value of a random variable $\Theta$ with prior distribution denoted $f_\Theta(\theta)$. The posterior distribution of $\Theta$, given that $W = w$, is

$$f_{\Theta \mid W}(\theta \mid w) = \frac{f_W(w \mid \theta)\, f_\Theta(\theta)}{\int f_W(w \mid \theta')\, f_\Theta(\theta')\, d\theta'}$$
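To make the definition concrete, here is a minimal numerical sketch in Python. The setup is hypothetical and not taken from the text: $W$ is binomial with $n = 10$ trials and the prior on $\theta$ is Beta(2, 2), a standard conjugate pair, so the result can be checked in closed form.

```python
import numpy as np
from scipy import stats

# Hypothetical example: W ~ Binomial(n, theta), prior Theta ~ Beta(2, 2).
n, w = 10, 7                               # observed: 7 successes in 10 trials
theta = np.linspace(0.001, 0.999, 999)     # grid over the parameter space
dtheta = theta[1] - theta[0]

prior = stats.beta.pdf(theta, 2, 2)        # f_Theta(theta)
likelihood = stats.binom.pmf(w, n, theta)  # f_W(w | theta)

# Posterior per the definition: likelihood * prior, normalized over theta.
unnormalized = likelihood * prior
posterior = unnormalized / (unnormalized.sum() * dtheta)

# Conjugacy check: posterior should be Beta(2 + w, 2 + n - w) = Beta(9, 5).
exact = stats.beta.pdf(theta, 2 + w, 2 + n - w)
print(np.max(np.abs(posterior - exact)))   # small, up to grid error
```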

The power of this definition is its wide applicability: any prior $f_\Theta(\theta)$ can be combined with any likelihood $f_W(w \mid \theta)$ in the same way.

Now how do we use $f_{\Theta \mid W}(\theta \mid w)$ to actually calculate a point estimate $\hat{\theta}$? One technique, analogous to maximum likelihood, is to maximize the posterior distribution by differentiating it and setting the derivative to zero; the result is the maximum a posteriori (MAP) estimate.
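As a sketch of that technique, continuing the hypothetical beta-binomial example above (where the posterior is Beta(9, 5)), one can maximize the log-posterior numerically and compare against the closed-form mode of a Beta density:

```python
import numpy as np
from scipy import optimize, stats

# Posterior from the hypothetical example above: Beta(9, 5).
a_post, b_post = 9, 5

# MAP estimate: maximize the log-posterior (minimize its negative).
neg_log_post = lambda t: -stats.beta.logpdf(t, a_post, b_post)
fit = optimize.minimize_scalar(neg_log_post, bounds=(0.001, 0.999),
                               method="bounded")

# A Beta(a, b) density has its mode at (a - 1) / (a + b - 2).
mode = (a_post - 1) / (a_post + b_post - 2)
print(fit.x, mode)                         # both ~0.667
```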

However, the classic Bayesian approach is to minimize the risk associated with $\hat{\theta}$.

Definition: Let $\hat{\theta} = h(W)$ be an estimator for $\theta$ based on a statistic $W$. The loss function associated with $\hat{\theta}$ is denoted $L(\hat{\theta}, \theta)$, where $L(\hat{\theta}, \theta) \geq 0$ and $L(\theta, \theta) = 0$.

Definition: The risk associated with $\hat{\theta}$ is the expected value of the loss function with respect to the posterior distribution of $\Theta$:

$$R(\hat{\theta} \mid w) = \int L(\hat{\theta}, \theta)\, f_{\Theta \mid W}(\theta \mid w)\, d\theta$$
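A minimal sketch of evaluating this risk numerically, again on the hypothetical Beta(9, 5) posterior from the running example, with squared-error loss as the illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical posterior from the running example: Beta(9, 5).
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]
posterior = stats.beta.pdf(theta, 9, 5)

def risk(theta_hat, loss):
    # Posterior risk: integral of L(theta_hat, theta) * posterior over theta,
    # approximated by a Riemann sum on the grid.
    return np.sum(loss(theta_hat, theta) * posterior) * dtheta

squared_loss = lambda est, t: (est - t) ** 2
print(risk(0.5, squared_loss))   # higher: 0.5 is far from the posterior mean
print(risk(0.64, squared_loss))  # lower: near the posterior mean 9/14 ~ 0.643
```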

A Bayes estimate minimizes this risk, which in general means differentiating the risk function with respect to $\hat{\theta}$ and setting the derivative to zero.

However, for the $L^1$ (absolute error) and $L^2$ (squared error) losses, there’s an easier method.

Theorem: Let $f_{\Theta \mid W}(\theta \mid w)$ be the posterior distribution for the unknown parameter $\theta$.

  • If $L(\hat{\theta}, \theta) = |\hat{\theta} - \theta|$, then the Bayes estimate is the median of $f_{\Theta \mid W}(\theta \mid w)$.
  • If $L(\hat{\theta}, \theta) = (\hat{\theta} - \theta)^2$, then the Bayes estimate is the mean of $f_{\Theta \mid W}(\theta \mid w)$.
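The theorem is easy to verify numerically. Here is a sketch on the hypothetical Beta(9, 5) posterior: grid-search the estimate that minimizes the posterior risk under each loss, and compare against the posterior median and mean.

```python
import numpy as np
from scipy import stats

# Numerical check of the theorem on the hypothetical Beta(9, 5) posterior.
theta = np.linspace(0.001, 0.999, 999)
dtheta = theta[1] - theta[0]
posterior = stats.beta.pdf(theta, 9, 5)

def bayes_estimate(loss):
    # Grid-search the estimate that minimizes the posterior risk.
    risks = [np.sum(loss(t_hat, theta) * posterior) * dtheta
             for t_hat in theta]
    return theta[np.argmin(risks)]

l1_loss = lambda est, t: np.abs(est - t)
l2_loss = lambda est, t: (est - t) ** 2

print(bayes_estimate(l1_loss), stats.beta.median(9, 5))  # both ~0.65 (median)
print(bayes_estimate(l2_loss), stats.beta.mean(9, 5))    # both ~0.643 (mean)
```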