Hypothesis testing aims to choose between a null hypothesis $H_0$ and an alternative hypothesis $H_1$.
A function of the observed data that dictates the outcome of our hypothesis test is called a test statistic. The set of values that result in our null hypothesis’ rejection is the critical region, denoted $C$. The point separating $C$ from the acceptance region is the critical value.
The probability a test statistic rejects $H_0$ when $H_0$ is true is the level of significance, denoted $\alpha$. Values like $\alpha = 0.05$ and $\alpha = 0.01$ are common.
Theorem: Let $y_1, \ldots, y_n$ be a random sample from a normal distribution with known standard deviation $\sigma$. To test $H_0: \mu = \mu_0$ at the $\alpha$ level of significance, let $z = \frac{\bar{y} - \mu_0}{\sigma / \sqrt{n}}$.
- Accept $H_1: \mu > \mu_0$ if $z \geq z_{\alpha}$
- Accept $H_1: \mu < \mu_0$ if $z \leq -z_{\alpha}$
- Accept $H_1: \mu \neq \mu_0$ if $z \geq z_{\alpha/2}$ or $z \leq -z_{\alpha/2}$
An alternative way of formulating the same idea: the $P$-value of a test statistic is the probability of getting a value at least as extreme as the observed statistic, assuming $H_0$ is true.
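A minimal sketch of this test in Python (assuming `numpy` and `scipy`; the data and parameter values are invented for illustration):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical data and parameters, for illustration only.
rng = np.random.default_rng(0)
mu_0, sigma, alpha = 50.0, 10.0, 0.05
y = rng.normal(loc=53.0, scale=sigma, size=40)  # true mean differs from mu_0

# Test statistic: z = (ybar - mu_0) / (sigma / sqrt(n))
n = len(y)
z = (y.mean() - mu_0) / (sigma / np.sqrt(n))

# Two-sided rule: accept H1 if |z| >= z_{alpha/2}
z_crit = norm.ppf(1 - alpha / 2)  # inverse CDF of the standard normal
p_value = 2 * norm.sf(abs(z))     # P-value: P(|Z| >= |z|) assuming H0
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, P-value = {p_value:.4f}")
print("reject H0" if abs(z) >= z_crit else "fail to reject H0")
```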
We can run binomial hypothesis tests much like the test above. When we have a “large” number of samples, we can approximate the binomial distribution with a normal distribution.
We use the condition $0 < np_0 - 3\sqrt{np_0(1 - p_0)}$ and $np_0 + 3\sqrt{np_0(1 - p_0)} < n$ to identify large samples. This holds when the range of 3 standard deviations around the mean falls within the valid values of the success count.
Theorem: Let $x_1, \ldots, x_n$ be a random sample of Bernoulli random variables for which the condition above holds, and let $k = \sum_{i=1}^{n} x_i$ be the number of successes. We use $z = \frac{k - np_0}{\sqrt{np_0(1 - p_0)}}$ and test as above.
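A sketch of the large-sample version, with invented counts (`scipy` assumed):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical values, for illustration only.
n, k, p_0, alpha = 200, 116, 0.5, 0.05

# Large-sample check: n*p_0 +/- 3 standard deviations stays inside (0, n)
sd = np.sqrt(n * p_0 * (1 - p_0))
assert 0 < n * p_0 - 3 * sd and n * p_0 + 3 * sd < n

z = (k - n * p_0) / sd
p_value = 2 * norm.sf(abs(z))  # two-sided P-value
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
```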
If our inequality doesn’t hold, we use the exact binomial distribution. Let $X \sim \text{Binomial}(n, p_0)$ and let $k$ be the observed number of successes.
- Accept $H_1: p < p_0$ if $P(X \leq k) \leq \alpha$
- Accept $H_1: p > p_0$ if $P(X \geq k) \leq \alpha$
- Accept $H_1: p \neq p_0$ if $P(X \leq k) \leq \alpha/2$ or $P(X \geq k) \leq \alpha/2$
This is equivalent to our previous theorems, but stated in terms of the binomial CDF and $\alpha$ instead of the critical value and $z_\alpha$ (which is the inverse-CDF of the standard normal).
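A sketch of the exact test, again with invented numbers:

```python
from scipy.stats import binom

# Hypothetical values, for illustration only.
n, k, p_0, alpha = 19, 14, 0.5, 0.05

lower = binom.cdf(k, n, p_0)     # P(X <= k) assuming H0
upper = binom.sf(k - 1, n, p_0)  # P(X >= k) assuming H0

# Two-sided rule: accept H1 if either tail probability is <= alpha / 2
print(f"P(X <= k) = {lower:.4f}, P(X >= k) = {upper:.4f}")
print("reject H0" if min(lower, upper) <= alpha / 2 else "fail to reject H0")
```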
Type I and II Errors
We already defined $\alpha$, the probability of incorrectly rejecting $H_0$. Call this the Type I error.
$\beta$, the probability of incorrectly accepting $H_0$, is called the Type II error.
$\beta$ is a function of the presumed true value of the parameter. If the true $\mu$ is really close to $\mu_0$, then $\beta$ will be high.
Definition: $1 - \beta$ is the power of a decision test, as a function of the parameter being tested. A power curve graphs this relation.
The power of a test diminishes as $\mu \to \mu_0$. At $\mu = \mu_0$, the power equals $\alpha$.
A steeper power curve is a stronger test.
The power of the test is a function of $\alpha$, $\sigma$, and $n$. We can improve our power by increasing $\alpha$, decreasing $\sigma$, or increasing $n$.
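A sketch of a power calculation for the two-sided $z$-test, with invented parameters; note that plugging in $\mu = \mu_0$ recovers $\alpha$:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical parameters, for illustration only.
mu_0, sigma, n, alpha = 50.0, 10.0, 25, 0.05
z_crit = norm.ppf(1 - alpha / 2)

def power(mu):
    """P(reject H0) for the two-sided z-test when the true mean is mu."""
    shift = (mu - mu_0) / (sigma / np.sqrt(n))  # mean of Z under the true mu
    return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

for mu in [50.0, 52.0, 54.0, 56.0]:
    print(f"mu = {mu}: power = {power(mu):.3f}")  # mu = mu_0 gives alpha
```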
Example: Say we’d like to test $H_0: \mu = \mu_0$ versus $H_1: \mu > \mu_0$ with level $\alpha$ and power $1 - \beta$ when $\mu = \mu_1$. What is the smallest sample size that achieves this objective? Assume a normal distribution with known $\sigma$.
We solve this by writing a system of equations with the critical value $\bar{y}^*$: we need $P(\bar{Y} \geq \bar{y}^* \mid \mu = \mu_0) = \alpha$ and $P(\bar{Y} \geq \bar{y}^* \mid \mu = \mu_1) = 1 - \beta$. Standardizing both gives $\frac{\bar{y}^* - \mu_0}{\sigma/\sqrt{n}} = z_\alpha$ and $\frac{\bar{y}^* - \mu_1}{\sigma/\sqrt{n}} = -z_\beta$, so $n = \left(\frac{(z_\alpha + z_\beta)\sigma}{\mu_1 - \mu_0}\right)^2$, rounded up.
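A sketch that evaluates this formula for invented values of the parameters:

```python
import math
from scipy.stats import norm

# Hypothetical values, for illustration only.
mu_0, mu_1, sigma, alpha, beta = 50.0, 55.0, 10.0, 0.05, 0.10

# n = ((z_alpha + z_beta) * sigma / (mu_1 - mu_0))^2, rounded up
z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
n = ((z_a + z_b) * sigma / (mu_1 - mu_0)) ** 2
print(f"smallest n = {math.ceil(n)}")
```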
Nonnormal Decision Rules
Decision rules on general pdfs work much the same.
To test $H_0: \theta = \theta_0$, we define a decision rule in terms of a sufficient statistic $\hat{\theta}$. The critical region $C$ is the set of values of $\hat{\theta}$ least compatible with $H_0$ but admissible under $H_1$, whose total probability when $H_0$ is true is $\alpha$.
Example: $f_Y(y; \theta) = 1/\theta$ for $0 \leq y \leq \theta$. We test $H_0: \theta = \theta_0$ versus $H_1: \theta < \theta_0$.
Suppose we base the decision rule on $Y_{\max}$, the largest order statistic. What’s the critical region when the level of significance is $\alpha$?
Some thought about the uniform distribution makes it clear we are looking for some value $y^*$ such that we reject $H_0$ when $Y_{\max} \leq y^*$.
$P(Y_{\max} \leq y^* \mid H_0) = \prod_{i=1}^{n} P(Y_i \leq y^*) = \left(\frac{y^*}{\theta_0}\right)^n = \alpha$, so $y^* = \theta_0 \alpha^{1/n}$.
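A quick check of this critical value by simulation, with $\theta_0$, $n$, and $\alpha$ invented; the estimated rejection rate under $H_0$ should be close to $\alpha$:

```python
import numpy as np

# Hypothetical values, for illustration only.
theta_0, n, alpha = 1.0, 10, 0.05
y_star = theta_0 * alpha ** (1 / n)  # critical value for Y_max

# Simulate under H0 and estimate P(Y_max <= y*)
rng = np.random.default_rng(0)
samples = rng.uniform(0, theta_0, size=(100_000, n))
print(f"y* = {y_star:.4f}")
print(f"estimated Type I error = {(samples.max(axis=1) <= y_star).mean():.4f}")
```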
The Generalized Likelihood Ratio
What is the best decision rule for choosing between $H_0$ and $H_1$, and how do we show it is optimal?
Define $\omega$ as the set of unknown parameter values admissible under $H_0$.
Define $\Omega$ as the set of all possible values of all unknown parameters.
Definition: Let $y_1, \ldots, y_n$ be a random sample from $f_Y(y; \theta)$. The generalized likelihood ratio $\lambda$ is
$$\lambda = \frac{\max_{\theta \in \omega} L(\theta)}{\max_{\theta \in \Omega} L(\theta)}$$
Maximizing under $\Omega$ (no restrictions) is accomplished by substituting the MLE $\hat{\theta}$ into $L$.
Values of $\lambda$ closer to 1 show that the data is more compatible with $H_0$.
Definition: A generalized likelihood ratio test (GLRT) rejects $H_0$ whenever $\lambda \leq \lambda^*$, where $P(\Lambda \leq \lambda^* \mid H_0 \text{ is true}) = \alpha$ ($\Lambda$ is $\lambda$ expressed as a random variable).
Expressed similarly, $\alpha = P(0 < \Lambda \leq \lambda^* \mid H_0 \text{ is true})$.
In many situations, the distribution of $\Lambda$ is not known, so we must show $\lambda$ is a monotonic function of a quantity $w$, where the distribution of $W$ is known.
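As a concrete illustration of this monotonicity, for a normal mean with known $\sigma$ the ratio works out to $\lambda = e^{-z^2/2}$, so the GLRT reduces to the $z$-test from earlier. A numeric sketch with invented data:

```python
import numpy as np

# Hypothetical data and parameters, for illustration only.
rng = np.random.default_rng(0)
mu_0, sigma = 50.0, 10.0
y = rng.normal(loc=53.0, scale=sigma, size=40)
n = len(y)

def log_likelihood(mu):
    return -np.sum((y - mu) ** 2) / (2 * sigma**2)  # constants dropped

# lambda = (max L over omega) / (max L over Omega); ybar is the MLE over Omega
lam = np.exp(log_likelihood(mu_0) - log_likelihood(y.mean()))

z = (y.mean() - mu_0) / (sigma / np.sqrt(n))
print(f"lambda = {lam:.6f}, exp(-z^2/2) = {np.exp(-(z**2) / 2):.6f}")  # equal
```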
Example: Say we work with the uniform distribution above and test $H_0: \theta = \theta_0$ versus $H_1: \theta \neq \theta_0$. Then $\omega = \{\theta_0\}$ and $\Omega = \{\theta : \theta > 0\}$.
We have $\max_{\Omega} L(\theta) = L(y_{\max}) = 1/y_{\max}^n$ (the MLE is $y_{\max}$) and $\max_{\omega} L(\theta) = 1/\theta_0^n$, provided $y_{\max} \leq \theta_0$ (otherwise the numerator is 0 and we reject outright). So $\lambda = (y_{\max}/\theta_0)^n$.
Instead of trying to solve for $\lambda^*$ directly, we take $w = \lambda^{1/n} = y_{\max}/\theta_0$ and find the distribution of $W$:
$$f_W(w) = n w^{n-1} \quad \text{for } 0 \leq w \leq 1$$
We conclude that the GLRT calls to reject $H_0$ if $y_{\max}/\theta_0 \leq \alpha^{1/n}$, since $P(W \leq \alpha^{1/n} \mid H_0) = \int_0^{\alpha^{1/n}} n w^{n-1}\, dw = \alpha$.
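A sketch of this GLRT applied to a made-up sample:

```python
import numpy as np

# Hypothetical values, for illustration only.
theta_0, n, alpha = 1.0, 10, 0.05
rng = np.random.default_rng(1)
y = rng.uniform(0, 0.75, size=n)  # true theta = 0.75, not theta_0

y_max = y.max()
lam = (y_max / theta_0) ** n if y_max <= theta_0 else 0.0
# Reject for small lambda: y_max outside [0, theta_0], or ratio below alpha^(1/n)
reject = y_max > theta_0 or y_max / theta_0 <= alpha ** (1 / n)
print(f"lambda = {lam:.4f}, reject H0: {reject}")
```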
Fancy!