Hypothesis testing aims to choose between a null hypothesis $H_0$ and an alternative hypothesis $H_1$.

A function of the observed data that dictates the outcome of our hypothesis test is called a test statistic. The set of values that result in the null hypothesis' rejection is the critical region, denoted $C$. The point separating $C$ from the acceptance region is the critical value.

The probability the test statistic falls in the critical region when $H_0$ is true is the level of significance, denoted $\alpha$.

$\alpha = 0.05$ and $\alpha = 0.01$ are common.

Theorem: Let $y_1, \ldots, y_n$ be a random sample from a normal distribution with known $\sigma$. To test $H_0: \mu = \mu_0$ at level $\alpha$, let $z = \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}}$.

  • Accept $H_1: \mu > \mu_0$ if $z \ge z_\alpha$
  • Accept $H_1: \mu < \mu_0$ if $z \le -z_\alpha$
  • Accept $H_1: \mu \ne \mu_0$ if $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$
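As a concrete sketch of the test above (the function names and the sample numbers are my own, and the normal quantile is found by bisection rather than a table lookup):

```python
import math

def z_statistic(ybar, mu0, sigma, n):
    """Standardize the sample mean under H0: mu = mu0."""
    return (ybar - mu0) / (sigma / math.sqrt(n))

def z_alpha(alpha, tol=1e-10):
    """Upper-alpha quantile of the standard normal, found by bisection
    on the CDF Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical data: n = 25, sigma = 4, H0: mu = 10 vs H1: mu > 10
z = z_statistic(11.5, 10, 4, 25)   # = 1.875
reject = z >= z_alpha(0.05)        # z_0.05 ~ 1.645, so we reject H0
```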

An alternative way of formulating the same idea: the P-value of a test statistic is the probability of getting a value at least as extreme as the observed statistic, assuming $H_0$ is true.
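A minimal sketch of the P-value computation for a z statistic (the helper names are my own):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def p_value(z, alternative="two-sided"):
    """P-value of an observed z statistic for the three alternatives."""
    if alternative == "greater":
        return 1 - phi(z)      # P(Z >= z)
    if alternative == "less":
        return phi(z)          # P(Z <= z)
    return 2 * (1 - phi(abs(z)))  # both tails

# z = 1.96 gives a two-sided P-value of about 0.05
p = p_value(1.96)
```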

We can run binomial hypothesis tests much like the test above. When we have a “large” number of samples, the binomial count of successes is approximately normal.

We use the condition $0 < p_0 - 3\sqrt{p_0(1-p_0)/n}$ and $p_0 + 3\sqrt{p_0(1-p_0)/n} < 1$ to identify large samples. This holds when the range of 3 standard deviations around $p_0$ falls within the valid values of $p$.
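The 3-standard-deviation check is easy to compute directly (a sketch; the function name and example values are my own):

```python
import math

def large_sample_ok(n, p0):
    """Check that 3 standard deviations around p0 stay inside (0, 1),
    the rule of thumb for using the normal approximation."""
    sd = math.sqrt(p0 * (1 - p0) / n)
    return 0 < p0 - 3 * sd and p0 + 3 * sd < 1

ok_small = large_sample_ok(10, 0.05)    # too few samples for p0 = 0.05
ok_large = large_sample_ok(1000, 0.05)  # plenty
```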

Theorem: Let $k_1, \ldots, k_n$ be a random sample of Bernoulli random variables, for which $H_0: p = p_0$. Let $k = \sum_i k_i$ be the number of successes. We use $z = \frac{k - np_0}{\sqrt{np_0(1-p_0)}}$ and test as above.

If our inequality doesn’t hold, we use the exact binomial distribution directly:

  • Accept $H_1: p > p_0$ if $P(X \ge k) \le \alpha$
  • Accept $H_1: p < p_0$ if $P(X \le k) \le \alpha$
  • Accept $H_1: p \ne p_0$ if $2\min\{P(X \le k),\, P(X \ge k)\} \le \alpha$

Here $X \sim \text{Binomial}(n, p_0)$ and $k$ is the observed number of successes.

This is equivalent to our previous theorems, but stated in terms of the CDF and $\alpha$ instead of the critical value and $z_\alpha$ (which is the inverse-CDF of the normal).
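A sketch of the exact one-sided test using only the standard library (the function names and example values are my own; `scipy.stats.binomtest` offers an off-the-shelf version):

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1))

def exact_test_less(k, n, p0, alpha=0.05):
    """Exact one-sided test of H0: p = p0 vs H1: p < p0:
    accept H1 when P(X <= k) <= alpha."""
    return binom_cdf(k, n, p0) <= alpha

# With n = 20, p0 = 0.5, observing only k = 5 successes:
reject = exact_test_less(5, 20, 0.5)  # P(X <= 5) ~ 0.0207 <= 0.05
```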

Type I and II Errors

We already defined $\alpha$, the probability of incorrectly rejecting $H_0$. Call this the Type I error.

$\beta$, the probability of incorrectly accepting $H_0$, is called the Type II error.

$\beta$ is a function of the presumed true value $\mu$. If $\mu$ is really close to $\mu_0$, then $\beta$ will be high.

Definition: $1 - \beta$ is the power of a decision test, as a function of the parameter being tested. A power curve graphs this relation.

The power of a test diminishes as $\mu \to \mu_0$. At $\mu = \mu_0$, the power is $1 - \beta = \alpha$.

A steeper power curve indicates a stronger test.

The power of the test is a function of $\alpha$, $\sigma$, and $n$. We can improve our power by increasing $\alpha$, decreasing $\sigma$, or increasing $n$.
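To see the dependence on $n$, here is a sketch of the power of the one-sided z-test at a presumed true mean (function names and example values are my own):

```python
import math

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def power_one_sided(mu, mu0, sigma, n, z_alpha=1.645):
    """Power 1 - beta of the test H0: mu = mu0 vs H1: mu > mu0,
    evaluated at a presumed true mean mu (normal model, sigma known)."""
    # Reject when ybar >= mu0 + z_alpha * sigma / sqrt(n);
    # compute the probability of that event when the mean is mu.
    crit = mu0 + z_alpha * sigma / math.sqrt(n)
    return 1 - phi((crit - mu) / (sigma / math.sqrt(n)))

# Power rises with n: at mu = 10.5 vs mu0 = 10, sigma = 2
low = power_one_sided(10.5, 10, 2, 16)
high = power_one_sided(10.5, 10, 2, 100)
```

At $\mu = \mu_0$ the same function returns about $\alpha$, matching the statement above.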

Example: Say we’d like to test $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$ with level $\alpha$ and power $1 - \beta$ when $\mu = \mu_1$. What is the smallest sample size $n$ that achieves this objective? Assume a normal distribution with known $\sigma$.

We solve this by writing a system of equations with the critical value $\bar{y}^*$: $P(\bar{Y} \ge \bar{y}^* \mid \mu = \mu_0) = \alpha$ and $P(\bar{Y} \ge \bar{y}^* \mid \mu = \mu_1) = 1 - \beta$, then solving for $n$.
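Equating the two expressions for the critical value, $\bar{y}^* = \mu_0 + z_\alpha \sigma/\sqrt{n} = \mu_1 - z_\beta \sigma/\sqrt{n}$, gives $\sqrt{n} = (z_\alpha + z_\beta)\sigma / (\mu_1 - \mu_0)$. A sketch with illustrative numbers of my own choosing:

```python
import math

def min_sample_size(mu0, mu1, sigma, z_alpha, z_beta):
    """Smallest n so the one-sided test at level alpha has power
    1 - beta at mu = mu1; from the system of critical-value equations,
    sqrt(n) = (z_alpha + z_beta) * sigma / (mu1 - mu0)."""
    root_n = (z_alpha + z_beta) * sigma / abs(mu1 - mu0)
    return math.ceil(root_n ** 2)

# alpha = 0.05 (z ~ 1.645), beta = 0.10 (z ~ 1.282), sigma = 4,
# and we want to detect mu1 = 12 against mu0 = 10:
n = min_sample_size(10, 12, 4, 1.645, 1.282)  # n = 35
```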

Nonnormal Decision Rules

Decision rules on general pdfs work much the same.

To test $H_0: \theta = \theta_0$, we define a decision rule in terms of a sufficient statistic $\hat{\theta}$. The critical region $C$ is the set of values of $\hat{\theta}$ least compatible with $H_0$ yet admissible under $H_1$, whose total probability when $H_0$ is true is $\alpha$.

Example: Consider $f_Y(y; \theta) = 1/\theta$ for $0 \le y \le \theta$, and suppose we wish to test $H_0: \theta = \theta_0$ versus $H_1: \theta < \theta_0$.

Suppose we base the decision rule on $Y_{\max}$, the largest order statistic. What’s the critical region when the level is $\alpha$?

Some thought about the uniform distribution makes it clear we are looking for some value $y^*$ such that $P(Y_{\max} \le y^*) = \alpha$ when $H_0$ is true.

$f_{Y_{\max}}(y) = \frac{n y^{n-1}}{\theta_0^n}$ for $0 \le y \le \theta_0$, so $P(Y_{\max} \le y^*) = (y^*/\theta_0)^n = \alpha$ gives $y^* = \theta_0\,\alpha^{1/n}$.
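Since $P(Y_{\max} \le c) = (c/\theta_0)^n$ under $H_0$, setting this equal to $\alpha$ yields the critical value $\theta_0\,\alpha^{1/n}$. A sketch with a Monte Carlo sanity check (seed and sample sizes are my own):

```python
import random

def critical_value(theta0, n, alpha):
    """Reject H0: theta = theta0 in favor of theta < theta0 when
    Y_max <= y*, where P(Y_max <= y* | theta0) = (y*/theta0)^n = alpha."""
    return theta0 * alpha ** (1 / n)

# Under H0 the rule should reject about alpha of the time.
random.seed(0)
theta0, n, alpha = 2.0, 5, 0.05
ystar = critical_value(theta0, n, alpha)
trials = 20000
rejections = sum(
    max(random.uniform(0, theta0) for _ in range(n)) <= ystar
    for _ in range(trials)
)
rate = rejections / trials  # close to 0.05
```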

The Generalized Likelihood Ratio

What is the best decision rule for choosing between $H_0$ and $H_1$, and how do we show it is optimal?

Define $\omega$ as the set of unknown parameter values admissible under $H_0$.
Define $\Omega$ as the set of all possible values of all unknown parameters.

Definition: Let $y_1, \ldots, y_n$ be a random sample from $f_Y(y; \theta)$. The generalized likelihood ratio is
$$\lambda = \frac{\max_{\theta \in \omega} L(\theta)}{\max_{\theta \in \Omega} L(\theta)} = \frac{L(\hat{\omega})}{L(\hat{\Omega})}$$

Maximizing $L$ under $\Omega$ (no restrictions) is accomplished by substituting the maximum likelihood estimate $\hat{\theta}$ into $L$.

Values of $\lambda$ closer to 1 show that the data is more compatible with $H_0$.

Definition: A generalized likelihood ratio test (GLRT) rejects $H_0$ whenever $\lambda \le \lambda^*$, where $P(\Lambda \le \lambda^* \mid H_0 \text{ true}) = \alpha$ ($\Lambda$ is $\lambda$ expressed as a random variable).

Expressed similarly, we reject $H_0$ when $0 < \lambda \le \lambda^*$.
In many situations, the distribution of $\Lambda$ is not known, so we must show $\lambda$ is a monotonic function of a quantity $W$ whose distribution is known.

Example:
Say we work with a uniform distribution on $[0, \theta]$, with $H_0: \theta = \theta_0$ and $H_1: \theta \ne \theta_0$.

We have
$$L(\theta) = \theta^{-n} \quad \text{for } y_{\max} \le \theta$$
Under $\omega$, $L(\hat{\omega}) = \theta_0^{-n}$ (and $L = 0$ if $y_{\max} > \theta_0$); under $\Omega$, the MLE is $\hat{\theta} = y_{\max}$, so $L(\hat{\Omega}) = y_{\max}^{-n}$. Thus $\lambda = (y_{\max}/\theta_0)^n$.
Instead of trying to solve for $\lambda^*$ directly, we take $W = \lambda^{1/n} = Y_{\max}/\theta_0$ and find its distribution under $H_0$:

$f_W(w) = n w^{n-1}$ for $0 \le w \le 1$

We conclude that the GLRT calls to reject $H_0$ if $y_{\max}/\theta_0 \le \alpha^{1/n}$, i.e. $\lambda \le \alpha$, since $P(\Lambda \le \alpha) = P(W \le \alpha^{1/n}) = \alpha$.
Fancy!
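A sketch of this GLRT (the function names and sample values are my own; note $\lambda = 0$ when any observation exceeds $\theta_0$, which also triggers rejection):

```python
def glrt_lambda(ys, theta0):
    """Generalized likelihood ratio for Uniform(0, theta) with
    H0: theta = theta0: lambda = (y_max / theta0)^n, or 0 if any
    observation exceeds theta0 (the numerator likelihood vanishes)."""
    n, ymax = len(ys), max(ys)
    return 0.0 if ymax > theta0 else (ymax / theta0) ** n

def glrt_reject(ys, theta0, alpha):
    """Reject when lambda <= alpha, equivalent to
    y_max <= theta0 * alpha^(1/n) or y_max > theta0."""
    return glrt_lambda(ys, theta0) <= alpha

# Hypothetical sample far below theta0 = 2.0 -> tiny lambda -> reject
sample = [0.21, 0.45, 0.30, 0.18, 0.52]
rejected = glrt_reject(sample, theta0=2.0, alpha=0.05)
```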