The one-sample model is nice and simple but can be limited in utility. More useful are methods that compare responses to different treatment levels.

Two-sample inferences fall into two categories:

  • Choosing between two sets
  • Measuring similarity between two sets

This usually involves testing the “location” of the distributions, but we can also compare their variability.

Testing $H_0: \mu_X = \mu_Y$:

Assume that two random samples $X_1, \dots, X_n$ and $Y_1, \dots, Y_m$ are drawn from independent normal distributions.

Theorem: Let $S_X^2$ and $S_Y^2$ be the corresponding sample variances and $S_p^2 = \frac{(n-1)S_X^2 + (m-1)S_Y^2}{n+m-2}$ the pooled variance. We have $\frac{(n+m-2)S_p^2}{\sigma^2} \sim \chi^2_{n+m-2}$. So $T = \frac{\bar X - \bar Y - (\mu_X - \mu_Y)}{S_p\sqrt{1/n + 1/m}}$ is a Student $t$ distribution with $n+m-2$ df.

Theorem: Let $x_1, \dots, x_n$ and $y_1, \dots, y_m$ be independent random samples from normal distributions with means $\mu_X$ and $\mu_Y$ and equal standard deviation $\sigma$. Let $t = \frac{\bar x - \bar y}{s_p\sqrt{1/n + 1/m}}$. To test $H_0: \mu_X = \mu_Y$ at the $\alpha$ level of significance:

  • Accept $H_1: \mu_X > \mu_Y$ if $t \ge t_{\alpha,\,n+m-2}$
  • Accept $H_1: \mu_X < \mu_Y$ if $t \le -t_{\alpha,\,n+m-2}$
  • Accept $H_1: \mu_X \ne \mu_Y$ if either $t \ge t_{\alpha/2,\,n+m-2}$ or $t \le -t_{\alpha/2,\,n+m-2}$

An equivalent $100(1-\alpha)\%$ confidence interval for $\mu_X - \mu_Y$ is $\bar x - \bar y \pm t_{\alpha/2,\,n+m-2}\, s_p\sqrt{1/n + 1/m}$.
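The pooled statistic can be sketched in pure Python (the sample data below is made up for illustration; the critical value $t_{\alpha/2,\,n+m-2}$ would still come from a $t$ table):

```python
import math

def pooled_t(x, y):
    """Two-sample t statistic under the equal-variance model.
    Returns (t, df); compare t against a critical value from a t table."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    sx2 = sum((v - xbar) ** 2 for v in x) / (n - 1)  # sample variance of x
    sy2 = sum((v - ybar) ** 2 for v in y) / (m - 1)  # sample variance of y
    sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)  # pooled variance
    t = (xbar - ybar) / math.sqrt(sp2 * (1 / n + 1 / m))
    return t, n + m - 2

# hypothetical data, for illustration only
t, df = pooled_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```
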

When the standard deviations of the samples are not equal, things are more complicated. This is the Behrens-Fisher problem, and no exact solution is known.

Theorem: Let $x_1, \dots, x_n$ and $y_1, \dots, y_m$ be independent random samples from normal distributions with separate standard deviations $\sigma_X$ and $\sigma_Y$. Let $w = \frac{\bar x - \bar y - (\mu_X - \mu_Y)}{\sqrt{s_X^2/n + s_Y^2/m}}$. $w$ has approximately a Student $t$ distribution with $\hat\nu$ degrees of freedom, where $\hat\nu = \frac{\left(s_X^2/n + s_Y^2/m\right)^2}{\frac{(s_X^2/n)^2}{n-1} + \frac{(s_Y^2/m)^2}{m-1}}$ (rounded to the nearest integer).

The justification is a bit complicated, but essentially the weighted average of the sample variances can be matched to a chi-square distribution, which makes the approximation valid.
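A minimal sketch of the statistic and the Satterthwaite degrees of freedom (sample data is hypothetical; note $\hat\nu$ is generally not an integer before rounding):

```python
import math

def welch_t(x, y):
    """Welch's approximate t statistic and Satterthwaite df
    for two samples with unequal variances."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    sx2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
    sy2 = sum((v - ybar) ** 2 for v in y) / (m - 1)
    se2 = sx2 / n + sy2 / m                   # squared standard error
    w = (xbar - ybar) / math.sqrt(se2)
    # Satterthwaite approximation for the degrees of freedom
    nu = se2 ** 2 / ((sx2 / n) ** 2 / (n - 1) + (sy2 / m) ** 2 / (m - 1))
    return w, nu

# hypothetical data, for illustration only
w, nu = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0])
```
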

Testing $H_0: \sigma_X^2 = \sigma_Y^2$:

Testing equality of variances can be useful. Sometimes variance is the direct measure of interest. Sometimes it is used to justify the exact two-sample test used above.

We use the $F$ distribution to test this. The ratio of the sample variances $S_Y^2/S_X^2$ follows an $F$ distribution with $m-1$ and $n-1$ degrees of freedom, assuming the true variances are equal.

Theorem: Let $s_X^2$ and $s_Y^2$ be the sample variances of the two samples, and let $f = s_Y^2/s_X^2$. To test $H_0: \sigma_X^2 = \sigma_Y^2$ at the $\alpha$ level of significance (with $F_{p,\,m-1,\,n-1}$ denoting the upper-$p$ critical value):

  • Accept $H_1: \sigma_Y^2 > \sigma_X^2$ if $f \ge F_{\alpha,\,m-1,\,n-1}$
  • Accept $H_1: \sigma_Y^2 < \sigma_X^2$ if $f \le F_{1-\alpha,\,m-1,\,n-1}$
  • Accept $H_1: \sigma_Y^2 \ne \sigma_X^2$ if either $f \ge F_{\alpha/2,\,m-1,\,n-1}$ or $f \le F_{1-\alpha/2,\,m-1,\,n-1}$

A $100(1-\alpha)\%$ confidence interval for $\sigma_Y^2/\sigma_X^2$ is $\left( \frac{s_Y^2/s_X^2}{F_{\alpha/2,\,m-1,\,n-1}},\ \frac{s_Y^2/s_X^2}{F_{1-\alpha/2,\,m-1,\,n-1}} \right)$.
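Computing the observed ratio and its degrees of freedom is straightforward; the critical values must still come from an $F$ table (data below is hypothetical):

```python
def variance_ratio(x, y):
    """Observed F ratio s_Y^2 / s_X^2 and its degrees of freedom (m-1, n-1).
    Compare against F critical values from a table to run the test."""
    n, m = len(x), len(y)
    xbar, ybar = sum(x) / n, sum(y) / m
    sx2 = sum((v - xbar) ** 2 for v in x) / (n - 1)
    sy2 = sum((v - ybar) ** 2 for v in y) / (m - 1)
    return sy2 / sx2, (m - 1, n - 1)

# hypothetical data, for illustration only
f, (df1, df2) = variance_ratio([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0])
```
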

Non-Normal Data

It’s also important to consider non-normal distributions, which can be either continuous or discrete.

Suppose $n$ Bernoulli trials in treatment $X$ yield $x$ successes and $m$ Bernoulli trials in treatment $Y$ yield $y$ successes. We’d like to check $H_0: p_X = p_Y$ versus $H_1: p_X \ne p_Y$.

We use the GLRT to come up with a test. The generalized likelihood ratio is $\lambda = \frac{\max_p\, p^{x+y}(1-p)^{n+m-x-y}}{\max_{p_X,\,p_Y}\, p_X^{x}(1-p_X)^{n-x}\, p_Y^{y}(1-p_Y)^{m-y}}$.

MLE under $H_0$ (a single common $p$) gives $p_e = \frac{x+y}{n+m}$.

MLE with individual $p_X$ and $p_Y$ gives $\hat p_X = x/n$ and $\hat p_Y = y/m$.

Substituting the estimates back into $\lambda$ gives $\lambda = \frac{p_e^{\,x+y}(1-p_e)^{n+m-x-y}}{\hat p_X^{\,x}(1-\hat p_X)^{n-x}\, \hat p_Y^{\,y}(1-\hat p_Y)^{m-y}}$.

This is hard to work with. It can be shown that $-2\ln\lambda$ has an asymptotic $\chi^2$ distribution with $1$ df (so reject $H_0$ if $-2\ln\lambda \ge \chi^2_{1-\alpha,\,1}$).
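Computing $-2\ln\lambda$ directly is easy in log form (counts below are hypothetical; $\chi^2_{0.95,\,1} \approx 3.841$):

```python
import math

def glrt_two_proportions(x, n, y, m):
    """-2 ln(lambda) for H0: pX = pY; asymptotically chi-square with 1 df.
    Assumes 0 < x < n and 0 < y < m so every log is defined."""
    pe = (x + y) / (n + m)     # pooled MLE under H0
    px, py = x / n, y / m      # unrestricted MLEs
    log_lam = (x * math.log(pe / px) + (n - x) * math.log((1 - pe) / (1 - px))
               + y * math.log(pe / py) + (m - y) * math.log((1 - pe) / (1 - py)))
    return -2 * log_lam

# hypothetical counts: 30/100 successes vs 45/100 successes
g = glrt_two_proportions(30, 100, 45, 100)
# reject H0 at the 5% level when g >= 3.841 (chi-square, 1 df)
```
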

Another more common approach is to appeal to the CLT, noting that $\hat p_X - \hat p_Y$ is approximately normal.

Under $H_0$, $\frac{\hat p_X - \hat p_Y}{\sqrt{p(1-p)\left(1/n + 1/m\right)}}$ is approximately standard normal (estimating the common $p$ by maximizing the likelihood under $H_0$, which gives the pooled estimate $p_e = \frac{x+y}{n+m}$).

Theorem: Let $x$ and $y$ denote the numbers of successes observed in two independent sets of $n$ and $m$ Bernoulli trials, where $p_X$ and $p_Y$ are the true success probabilities.

Let $z = \frac{\hat p_X - \hat p_Y}{\sqrt{p_e(1-p_e)\left(1/n + 1/m\right)}}$ where $p_e = \frac{x+y}{n+m}$, $\hat p_X = x/n$, and $\hat p_Y = y/m$. To test $H_0: p_X = p_Y$ at the $\alpha$ level of significance:

  • Accept $H_1: p_X > p_Y$ if $z \ge z_\alpha$
  • Accept $H_1: p_X < p_Y$ if $z \le -z_\alpha$
  • Accept $H_1: p_X \ne p_Y$ if either $z \ge z_{\alpha/2}$ or $z \le -z_{\alpha/2}$

A $100(1-\alpha)\%$ confidence interval for $p_X - p_Y$ is $\hat p_X - \hat p_Y \pm z_{\alpha/2}\sqrt{\frac{\hat p_X(1-\hat p_X)}{n} + \frac{\hat p_Y(1-\hat p_Y)}{m}}$, where $\hat p_X = x/n$ and $\hat p_Y = y/m$.
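A sketch of the pooled $z$ statistic and the (unpooled) interval, using the same hypothetical counts as above and the familiar $z_{0.025} \approx 1.96$:

```python
import math

def two_proportion_z(x, n, y, m):
    """Approximate z statistic for H0: pX = pY using the pooled estimate."""
    px, py = x / n, y / m
    pe = (x + y) / (n + m)
    return (px - py) / math.sqrt(pe * (1 - pe) * (1 / n + 1 / m))

def diff_ci(x, n, y, m, zcrit=1.96):
    """Approximate 95% CI for pX - pY; note the unpooled standard error."""
    px, py = x / n, y / m
    se = math.sqrt(px * (1 - px) / n + py * (1 - py) / m)
    return px - py - zcrit * se, px - py + zcrit * se

# hypothetical counts: 30/100 successes vs 45/100 successes
z = two_proportion_z(30, 100, 45, 100)
lo, hi = diff_ci(30, 100, 45, 100)
```

Note the test uses the pooled variance estimate (valid under $H_0$), while the confidence interval uses the separate estimates $\hat p_X$ and $\hat p_Y$, since no null hypothesis constrains them to be equal.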