The one-sample model is nice and simple but can be limited in utility. More useful are methods that compare responses to different treatment levels.

Two-sample inferences fall into two categories:

  • Choosing between two sets
  • Measuring similarity between two sets

This usually involves testing the “location” of the distributions, but we can also compare their variability.

Testing $H_0: \mu_X = \mu_Y$:

Assume that two random samples $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$ are drawn from independent normal distributions.

Theorem: Let $S_X^2$ and $S_Y^2$ be the corresponding sample variances and $S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}$ the pooled variance. We have $\frac{(n+m-2)S_p^2}{\sigma^2} \sim \chi^2_{n+m-2}$. So $T = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{S_p\sqrt{1/n + 1/m}}$ is a Student $t$ distribution with $n+m-2$ df.

Theorem: Let $x_1, \dots, x_n$ and $y_1, \dots, y_m$ be independent random samples from normal distributions with means $\mu_X$ and $\mu_Y$ and equal standard deviation $\sigma$. Let $t = \frac{\bar x - \bar y}{s_p\sqrt{1/n + 1/m}}$. To test $H_0: \mu_X = \mu_Y$ at the $\alpha$ level of significance:

  • Accept $H_1: \mu_X > \mu_Y$ if $t \ge t_{\alpha,\,n+m-2}$
  • Accept $H_1: \mu_X < \mu_Y$ if $t \le -t_{\alpha,\,n+m-2}$
  • Accept $H_1: \mu_X \ne \mu_Y$ if either $t \ge t_{\alpha/2,\,n+m-2}$ or $t \le -t_{\alpha/2,\,n+m-2}$

An equivalent $100(1-\alpha)\%$ confidence interval for $\mu_X - \mu_Y$ is $\bar x - \bar y \pm t_{\alpha/2,\,n+m-2}\, s_p\sqrt{1/n + 1/m}$.
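The pooled statistic can be sketched in pure Python (the sample data below is made up for illustration; the critical value $t_{\alpha/2,\,n+m-2}$ would still come from a $t$ table):

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic under the equal-variance model.
    Returns (t, df); compare t against a critical value from a t table."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    sx2 = sum((v - xbar) ** 2 for v in x) / (n - 1)  # sample variance of x
    sy2 = sum((v - ybar) ** 2 for v in y) / (m - 1)  # sample variance of y
    sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)  # pooled variance
    t = (xbar - ybar) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2

# hypothetical data, for illustration only
t, df = pooled_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```
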

When the standard deviations of the samples are not equal, things are more complicated. This is the Behrens-Fisher problem, and no exact solution is known.

Theorem: Let $x_1, \dots, x_n$ and $y_1, \dots, y_m$ be independent random samples from normal distributions with separate standard deviations $\sigma_X$ and $\sigma_Y$. Let $w = \frac{\bar x - \bar y - (\mu_X - \mu_Y)}{\sqrt{s_X^2/n + s_Y^2/m}}$. $w$ has approximately a Student $t$ distribution with $\hat\nu$ degrees of freedom, where $\hat\nu = \frac{\left(s_X^2/n + s_Y^2/m\right)^2}{\frac{(s_X^2/n)^2}{n-1} + \frac{(s_Y^2/m)^2}{m-1}}$ (rounded to the nearest integer).

The justification is a bit complicated, but essentially the weighted average of the sample variances can be matched to a chi-square distribution, which makes the approximation valid.
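A minimal sketch of the statistic and the Satterthwaite degrees of freedom (sample data is hypothetical; note $\hat\nu$ is generally not an integer before rounding):

```python
import math

def welch_t(x, y):
    """Welch's approximate t statistic and Satterthwaite df
    for two samples with unequal variances."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    sx2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
    sy2 = sum((v - ybar) ** 2 for v in y) / (m - 1)
    se2 = sx2 / n + sy2 / m                   # squared standard error
    w = (xbar - ybar) / math.sqrt(se2)
    # Satterthwaite approximation for the degrees of freedom
    nu = se2 ** 2 / ((sx2 / n) ** 2 / (n - 1) + (sy2 / m) ** 2 / (m - 1))
    return w, nu

# hypothetical data, for illustration only
w, nu = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0])
```
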

Testing $H_0: \sigma_X^2 = \sigma_Y^2$:

Testing equality of variances can be useful. Sometimes variance is the direct measure of interest. Sometimes it is used to justify the exact two-sample test used above.

We use the $F$ distribution to test this. The ratio of the sample variances $S_Y^2/S_X^2$ follows an $F$ distribution with $m-1$ and $n-1$ degrees of freedom, assuming the true variances are equal.

Theorem: Let $s_X^2$ and $s_Y^2$ be the sample variances of the two samples, and let $f = s_Y^2/s_X^2$. To test $H_0: \sigma_X^2 = \sigma_Y^2$ at the $\alpha$ level of significance (with $F_{p,\,m-1,\,n-1}$ denoting the upper-$p$ critical value):

  • Accept $H_1: \sigma_Y^2 > \sigma_X^2$ if $f \ge F_{\alpha,\,m-1,\,n-1}$
  • Accept $H_1: \sigma_Y^2 < \sigma_X^2$ if $f \le F_{1-\alpha,\,m-1,\,n-1}$
  • Accept $H_1: \sigma_Y^2 \ne \sigma_X^2$ if either $f \ge F_{\alpha/2,\,m-1,\,n-1}$ or $f \le F_{1-\alpha/2,\,m-1,\,n-1}$

A $100(1-\alpha)\%$ confidence interval for $\sigma_Y^2/\sigma_X^2$ is $\left( \frac{s_Y^2/s_X^2}{F_{\alpha/2,\,m-1,\,n-1}},\ \frac{s_Y^2/s_X^2}{F_{1-\alpha/2,\,m-1,\,n-1}} \right)$.
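Computing the observed ratio and its degrees of freedom is straightforward; the critical values must still come from an $F$ table (data below is hypothetical):

```python
def variance_ratio(x, y):
    """Observed F ratio s_Y^2 / s_X^2 and its degrees of freedom (m-1, n-1).
    Compare against F critical values from a table to run the test."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    sx2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
    sy2 = sum((v - ybar) ** 2 for v in y) / (m - 1)
    return sy2 / sx2, (m - 1, n - 1)

# hypothetical data, for illustration only
f, (df1, df2) = variance_ratio([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0])
```
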

Non-Normal Data

It’s also important to consider non-normal distributions, which can be either continuous or discrete.

Suppose $n$ Bernoulli trials in treatment $X$ yield $x$ successes and $m$ Bernoulli trials in treatment $Y$ yield $y$ successes. We’d like to check $H_0: p_X = p_Y$ versus $H_1: p_X \ne p_Y$.

We use the GLRT to come up with a test. The generalized likelihood ratio is $\lambda = \frac{\max_p\, p^{x+y}(1-p)^{n+m-x-y}}{\max_{p_X,\,p_Y}\, p_X^{x}(1-p_X)^{n-x}\, p_Y^{y}(1-p_Y)^{m-y}}$.

MLE under $H_0$ (a single common $p$) gives $p_e = \frac{x+y}{n+m}$.

MLE with individual $p_X$ and $p_Y$ gives $\hat p_X = x/n$ and $\hat p_Y = y/m$.

Substituting the estimates back into $\lambda$ gives $\lambda = \frac{p_e^{\,x+y}(1-p_e)^{n+m-x-y}}{\hat p_X^{\,x}(1-\hat p_X)^{n-x}\, \hat p_Y^{\,y}(1-\hat p_Y)^{m-y}}$.

This is hard to work with. It can be shown that $-2\ln\lambda$ has an asymptotic $\chi^2$ distribution with $1$ df (so reject $H_0$ if $-2\ln\lambda \ge \chi^2_{1-\alpha,\,1}$).
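Computing $-2\ln\lambda$ directly is easy in log form (counts below are hypothetical; $\chi^2_{0.95,\,1} \approx 3.841$):

```python
import math

def glrt_two_proportions(x, n, y, m):
    """-2 ln(lambda) for H0: pX = pY; asymptotically chi-square with 1 df.
    Assumes 0 < x < n and 0 < y < m so every log is defined."""
    pe = (x + y) / (n + m)     # pooled MLE under H0
    px, py = x / n, y / m      # unrestricted MLEs
    log_lam = (x * math.log(pe / px) + (n - x) * math.log((1 - pe) / (1 - px))
               + y * math.log(pe / py) + (m - y) * math.log((1 - pe) / (1 - py)))
    return -2 * log_lam

# hypothetical counts: 30/100 successes vs 45/100 successes
g = glrt_two_proportions(30, 100, 45, 100)
# reject H0 at the 5% level when g >= 3.841 (chi-square, 1 df)
```
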

Another more common approach is to appeal to the CLT, noting that $\hat p_X - \hat p_Y$ is approximately normal.

Under $H_0$, $\frac{\hat p_X - \hat p_Y}{\sqrt{p(1-p)\left(1/n + 1/m\right)}}$ is approximately standard normal (estimating the common $p$ by maximizing the likelihood under $H_0$, which gives the pooled estimate $p_e = \frac{x+y}{n+m}$).

Theorem: Let $x$ and $y$ denote the numbers of successes observed in two independent sets of $n$ and $m$ Bernoulli trials, where $p_X$ and $p_Y$ are the true success probabilities.

Let $z = \frac{\hat p_X - \hat p_Y}{\sqrt{p_e(1-p_e)\left(1/n + 1/m\right)}}$ where $p_e = \frac{x+y}{n+m}$, $\hat p_X = x/n$, and $\hat p_Y = y/m$. To test $H_0: p_X = p_Y$ at the $\alpha$ level of significance:

  • Accept $H_1: p_X > p_Y$ if $z \ge z_\alpha$
  • Accept $H_1: p_X < p_Y$ if $z \le -z_\alpha$
  • Accept $H_1: p_X \ne p_Y$ if either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$

A $100(1-\alpha)\%$ confidence interval for $p_X - p_Y$ is $\hat p_X - \hat p_Y \pm z_{\alpha/2}\sqrt{\frac{\hat p_X(1-\hat p_X)}{n} + \frac{\hat p_Y(1-\hat p_Y)}{m}}$, where $\hat p_X = x/n$ and $\hat p_Y = y/m$.
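A sketch of the pooled $z$ statistic and the (unpooled) interval, using the same hypothetical counts as above and the familiar $z_{0.025} \approx 1.96$:

```python
import math

def two_proportion_z(x, n, y, m):
    """Approximate z statistic for H0: pX = pY using the pooled estimate."""
    px, py = x / n, y / m
    pe = (x + y) / (n + m)
    return (px - py) / math.sqrt(pe * (1 - pe) * (1 / n + 1 / m))

def diff_ci(x, n, y, m, zcrit=1.96):
    """Approximate 95% CI for pX - pY; note the unpooled standard error."""
    px, py = x / n, y / m
    se = math.sqrt(px * (1 - px) / n + py * (1 - py) / m)
    return px - py - zcrit * se, px - py + zcrit * se

# hypothetical counts: 30/100 successes vs 45/100 successes
z = two_proportion_z(30, 100, 45, 100)
lo, hi = diff_ci(30, 100, 45, 100)
```

Note the test uses the pooled variance estimate (valid under $H_0$), while the confidence interval uses the separate estimates $\hat p_X$ and $\hat p_Y$, since no null hypothesis constrains them to be equal.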