Given a set of $n$ points $(x_1, y_1), \ldots, (x_n, y_n)$ and a positive integer $m$, which polynomial of degree $m$ is “closest” to the given points?

Definition: The method of least squares finds the coefficients $a_0, a_1, \ldots, a_m$ of $p(x) = a_0 + a_1 x + \cdots + a_m x^m$ minimizing $L = \sum_{i=1}^n (y_i - p(x_i))^2$

Theorem: The line $y = a + bx$ minimizing $L$ has slope $b = \dfrac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{n\sum x_i^2 - (\sum x_i)^2}$ and y-intercept $a = \dfrac{\sum y_i - b\sum x_i}{n} = \bar{y} - b\bar{x}$

This is solved by setting the partial derivatives $\frac{\partial L}{\partial a}$ and $\frac{\partial L}{\partial b}$ to zero and solving the resulting system
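For concreteness, a minimal Python sketch of the closed-form solution; the data values are invented for illustration:

```python
import numpy as np

# Invented sample data, roughly on the line y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
n = len(x)

# Closed-form least squares slope and intercept
b = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
a = (np.sum(y) - b * np.sum(x)) / n

print(f"least squares line: y = {a:.3f} + {b:.3f}x")
```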

Definition: $\hat{y}_i = a + bx_i$ is known as the predicted value of $y_i$
Definition: $y_i - \hat{y}_i$ is the $i$th residual

Residual plots graph the residuals $y_i - \hat{y}_i$ over $x_i$

If a residual plot shows no patterns or trends, the relationship is taken to be linear
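A sketch of computing and plotting residuals; the data is invented, and matplotlib plus numpy's polynomial fit are just convenient choices here:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Fit the line; this polyfit returns coefficients as (intercept, slope)
a, b = np.polynomial.polynomial.polyfit(x, y, 1)

y_hat = a + b * x        # predicted values
residuals = y - y_hat    # i-th residual is y_i - y_hat_i

# Residual plot: residuals over x; no visible pattern suggests a linear fit is adequate
plt.scatter(x, residuals)
plt.axhline(0.0, linestyle="--")
plt.xlabel("x")
plt.ylabel("residual")
plt.show()
```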

Our theorem can also work on more complicated relationships, as long as they can be linearized

  • Exponential regression: $y = ae^{bx}$, linearized by $\ln y = \ln a + bx$
  • Logarithmic regression: $y = a + b\ln x$, already linear in $\ln x$
  • Logistic regression: $y = \dfrac{L}{1 + e^{a + bx}}$, linearized by $\ln\left(\dfrac{L - y}{y}\right) = a + bx$

There are many other curvilinear models
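As one example of linearization, a sketch fitting the exponential model $y = ae^{bx}$ by regressing $\ln y$ on $x$; the data is made up for illustration:

```python
import numpy as np

# Invented data roughly following y = 2 * exp(0.5x)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.2, 5.6, 8.9, 14.7])

# Linearize: ln y = ln a + b x, then fit a straight line to (x, ln y)
ln_a, b = np.polynomial.polynomial.polyfit(x, np.log(y), 1)
a = np.exp(ln_a)

print(f"y ~= {a:.3f} * exp({b:.3f} x)")
```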

The Linear Model

It’s more realistic to think of the $y$ values as samples drawn from a random variable $Y$ whose distribution depends on $x$

Definition: Let $f_{Y|x}(y)$ denote the pdf of the random variable $Y$ for a given value $x$. $E[Y \mid x]$ is the regression curve of $Y$ on $x$

Definition: The simple linear model makes four assumptions:

  1. $f_{Y|x}(y)$ is a normal pdf for all $x$
  2. $\sigma_{Y|x}$ is the same for all $x$
  3. Each conditional distribution is independent of the others
  4. $E[Y \mid x]$ is a linear function of $x$: $E[Y \mid x] = \beta_0 + \beta_1 x$

Theorem: Let $(x_1, Y_1), \ldots, (x_n, Y_n)$ be a set of points satisfying the simple linear model, $E[Y \mid x] = \beta_0 + \beta_1 x$. The MLE estimators for $\beta_0$, $\beta_1$, and $\sigma^2$ are given by $\hat{\beta}_1 = \dfrac{n\sum x_i Y_i - (\sum x_i)(\sum Y_i)}{n\sum x_i^2 - (\sum x_i)^2}$, $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{x}$, and $\hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^n (Y_i - \hat{Y}_i)^2$, where $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$.
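A sketch of these MLE computations on invented data, mainly to make the divisor-of-$n$ convention in $\hat{\sigma}^2$ explicit:

```python
import numpy as np

# Invented data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 3.8, 6.1, 8.2, 9.9])
n = len(x)

# MLEs of the slope and intercept (identical to the least squares estimates)
beta1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
beta0 = (np.sum(y) - beta1 * np.sum(x)) / n

# MLE of sigma^2: mean squared residual (divisor n, not n - 2, so it is biased)
y_hat = beta0 + beta1 * x
sigma2_mle = np.sum((y - y_hat)**2) / n
```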

Theorem: Let $\hat{\beta}_0$ and $\hat{\beta}_1$ be the MLE estimators for $\beta_0$ and $\beta_1$, respectively.

  • $\hat{\beta}_0$ and $\hat{\beta}_1$ are normally distributed
  • $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased, $\operatorname{Var}(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum (x_i - \bar{x})^2}$ and $\operatorname{Var}(\hat{\beta}_0) = \dfrac{\sigma^2 \sum x_i^2}{n\sum (x_i - \bar{x})^2}$

Theorem:

  • $\hat{\beta}_1$, $\bar{Y}$, and $\hat{\sigma}^2$ are mutually independent
  • $\dfrac{n\hat{\sigma}^2}{\sigma^2}$ has a chi square distribution with $n - 2$ degrees of freedom

The unbiased estimator for $\sigma^2$ is $s^2 = \dfrac{n\hat{\sigma}^2}{n - 2} = \dfrac{1}{n-2}\sum_{i=1}^n (Y_i - \hat{Y}_i)^2$

Theorem: $T = \dfrac{\hat{\beta}_1 - \beta_1}{s / \sqrt{\sum (x_i - \bar{x})^2}}$ has a Student $t$ distribution with $n - 2$ degrees of freedom. Let $t = \dfrac{\hat{\beta}_1 - \beta_1'}{s / \sqrt{\sum (x_i - \bar{x})^2}}$; to test $H_0\colon \beta_1 = \beta_1'$ at the $\alpha$ level of significance,

  1. $H_1\colon \beta_1 > \beta_1'$: reject $H_0$ if $t \ge t_{\alpha,\,n-2}$
  2. $H_1\colon \beta_1 < \beta_1'$: reject $H_0$ if $t \le -t_{\alpha,\,n-2}$
  3. $H_1\colon \beta_1 \ne \beta_1'$: reject $H_0$ if $|t| \ge t_{\alpha/2,\,n-2}$

Let $w = t_{\alpha/2,\,n-2} \cdot \dfrac{s}{\sqrt{\sum (x_i - \bar{x})^2}}$;
$(\hat{\beta}_1 - w,\ \hat{\beta}_1 + w)$ is a $100(1 - \alpha)\%$ confidence interval for $\beta_1$

One useful test is checking for $\beta_1 = 0$, which says whether $Y$ changes with $x$ or not
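Putting these pieces together, a sketch of the two-sided test of $H_0\colon \beta_1 = 0$ and the accompanying confidence interval, using scipy for the $t$ quantile (data invented for illustration):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 3.8, 6.1, 8.2, 9.9])
n = len(x)

beta1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
beta0 = (np.sum(y) - beta1 * np.sum(x)) / n
s2 = np.sum((y - (beta0 + beta1 * x))**2) / (n - 2)   # unbiased estimate of sigma^2
se = np.sqrt(s2 / np.sum((x - x.mean())**2))          # standard error of beta1-hat

# Two-sided test of H0: beta1 = 0 at significance level alpha
alpha = 0.05
t_stat = beta1 / se
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
reject = abs(t_stat) >= t_crit

# 100(1 - alpha)% confidence interval for beta1
ci = (beta1 - t_crit * se, beta1 + t_crit * se)
```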

$\beta_0$ is generally less interesting, but also has similar well-defined tests. Let $w = t_{\alpha/2,\,n-2} \cdot s\sqrt{\dfrac{\sum x_i^2}{n\sum (x_i - \bar{x})^2}}$;
$(\hat{\beta}_0 - w,\ \hat{\beta}_0 + w)$ is a $100(1 - \alpha)\%$ confidence interval for $\beta_0$

Likewise, $\left(\dfrac{n\hat{\sigma}^2}{\chi^2_{\alpha/2,\,n-2}},\ \dfrac{n\hat{\sigma}^2}{\chi^2_{1-\alpha/2,\,n-2}}\right)$ is a $100(1 - \alpha)\%$ confidence interval for $\sigma^2$, where $\chi^2_{p,\,n-2}$ cuts off area $p$ in the upper tail
This is almost exactly the same as the regular test for variance, but with one less degree of freedom, since each estimated parameter essentially consumes a degree of freedom.
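A sketch of both intervals, again on invented data; note that everything uses $n - 2$ degrees of freedom:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 3.8, 6.1, 8.2, 9.9])
n = len(x)

beta1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
beta0 = (np.sum(y) - beta1 * np.sum(x)) / n
resid = y - (beta0 + beta1 * x)
sxx = np.sum((x - x.mean())**2)
s2 = np.sum(resid**2) / (n - 2)
alpha = 0.05

# Confidence interval for beta0
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 2)
w = t_crit * np.sqrt(s2 * np.sum(x**2) / (n * sxx))
ci_beta0 = (beta0 - w, beta0 + w)

# Confidence interval for sigma^2; sum(resid**2) equals n * sigma_hat^2
ss = np.sum(resid**2)
ci_sigma2 = (ss / stats.chi2.ppf(1 - alpha / 2, df=n - 2),
             ss / stats.chi2.ppf(alpha / 2, df=n - 2))
```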

Theorem: A $100(1 - \alpha)\%$ confidence interval for $E[Y \mid x]$ is given by $(\hat{y} - w,\ \hat{y} + w)$, where $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and $w = t_{\alpha/2,\,n-2} \cdot s\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$.

This theorem is nifty, but we might want an interval for the actual value $Y$, not the expected value $E[Y \mid x]$. We can derive this with the random variable $Y - \hat{Y}$.

Theorem: A $100(1 - \alpha)\%$ prediction interval for $Y$ at the fixed value $x$ is given by $(\hat{y} - w,\ \hat{y} + w)$, where $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ and $w = t_{\alpha/2,\,n-2} \cdot s\sqrt{1 + \dfrac{1}{n} + \dfrac{(x - \bar{x})^2}{\sum (x_i - \bar{x})^2}}$.
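A sketch computing both intervals at a hypothetical value $x_0$; the only difference between them is the extra $1 +$ inside the square root, which makes the prediction interval wider:

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 3.8, 6.1, 8.2, 9.9])
n = len(x)

beta1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
beta0 = (np.sum(y) - beta1 * np.sum(x)) / n
s = np.sqrt(np.sum((y - (beta0 + beta1 * x))**2) / (n - 2))
sxx = np.sum((x - x.mean())**2)

x0 = 2.5                       # hypothetical fixed value of x
y0 = beta0 + beta1 * x0        # point estimate of E[Y | x0]
t_crit = stats.t.ppf(0.975, df=n - 2)

# 95% confidence interval for E[Y | x0]
w_conf = t_crit * s * np.sqrt(1 / n + (x0 - x.mean())**2 / sxx)
ci = (y0 - w_conf, y0 + w_conf)

# 95% prediction interval for Y at x0
w_pred = t_crit * s * np.sqrt(1 + 1 / n + (x0 - x.mean())**2 / sxx)
pi = (y0 - w_pred, y0 + w_pred)
```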

We can also devise a test to check the relationship between two independent regressions, where $H_0\colon \beta_1 = \beta_1^*$ asserts that the two slopes are equal

Theorem: $T = \dfrac{\hat{\beta}_1 - \hat{\beta}_1^*}{s\sqrt{\dfrac{1}{\sum (x_i - \bar{x})^2} + \dfrac{1}{\sum (x_i^* - \bar{x}^*)^2}}}$ has a Student $t$ distribution with $n + m - 4$ degrees of freedom, where $s^2 = \dfrac{(n-2)s_1^2 + (m-2)s_2^2}{n + m - 4}$ pools the variance estimates of the two regressions. Let $t$ be the observed value and define decision rules in the usual way (e.g. reject $H_0$ if $|t| \ge t_{\alpha/2,\,n+m-4}$)
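A sketch of this two-sample slope test with the pooled variance estimate; both data sets are invented for illustration:

```python
import numpy as np
from scipy import stats

def fit(x, y):
    """Return slope, residual sum of squares, and sum of squared deviations of x."""
    n = len(x)
    b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
    b0 = (np.sum(y) - b1 * np.sum(x)) / n
    ss_resid = np.sum((y - (b0 + b1 * x))**2)
    sxx = np.sum((x - x.mean())**2)
    return b1, ss_resid, sxx

# Two hypothetical independent samples
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y1 = np.array([2.3, 3.8, 6.1, 8.2, 9.9])
x2 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y2 = np.array([1.9, 4.2, 5.8, 8.4, 9.6, 12.1])
n, m = len(x1), len(x2)

b1, ss1, sxx1 = fit(x1, y1)
b1_star, ss2, sxx2 = fit(x2, y2)

# Pooled estimate of sigma^2 across both regressions
s2 = (ss1 + ss2) / (n + m - 4)

# Test H0: beta1 = beta1* with n + m - 4 degrees of freedom
t_stat = (b1 - b1_star) / np.sqrt(s2 * (1 / sxx1 + 1 / sxx2))
p_value = 2 * stats.t.sf(abs(t_stat), df=n + m - 4)
```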

Covariance and Correlation

The discussion so far has been concerned with regression data, where the $x$ values are fixed. A more complicated situation is when measurements are of the form $(X, Y)$, where both $X$ and $Y$ are random but correlated variables.

Covariance is an important tool for measuring this kind of relation, but it reflects the units of the individual variables

Definition: The correlation coefficient of $X$ and $Y$ is $\rho(X, Y) = \dfrac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$, where $\sigma_X = \sqrt{\operatorname{Var}(X)}$ and $\sigma_Y = \sqrt{\operatorname{Var}(Y)}$

Theorem: For any $X$ and $Y$,

  • $-1 \le \rho(X, Y) \le 1$
  • $|\rho(X, Y)| = 1$ if and only if $Y = aX + b$ for some constants $a$ and $b$

Can we estimate $\rho(X, Y)$? One useful identity is $\rho(X, Y) = \dfrac{E[XY] - E[X]E[Y]}{\sigma_X \sigma_Y}$

We can use this to define the sample correlation coefficient, $r = \dfrac{n\sum x_i y_i - (\sum x_i)(\sum y_i)}{\sqrt{n\sum x_i^2 - (\sum x_i)^2}\sqrt{n\sum y_i^2 - (\sum y_i)^2}}$

This is also known as the Pearson product-moment correlation coefficient (in honor of Karl Pearson)
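A sketch verifying the closed-form expression for $r$ against numpy's built-in np.corrcoef, on invented data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 3.8, 6.1, 8.2, 9.9])
n = len(x)

# Sample correlation coefficient from the closed-form expression
r = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (
    np.sqrt(n * np.sum(x**2) - np.sum(x)**2) *
    np.sqrt(n * np.sum(y**2) - np.sum(y)**2))

# Should agree with numpy's built-in Pearson correlation
assert np.isclose(r, np.corrcoef(x, y)[0, 1])
```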

We can use the square of the sample correlation coefficient and derive $r^2 = 1 - \dfrac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$

  1. $\sum (y_i - \bar{y})^2$ represents the total variability in the dependent variable
  2. $\sum (y_i - \hat{y}_i)^2$ represents the variation in the $y_i$'s not accounted for by the linear regression with $x$

Therefore, $r^2$ is the proportion of the total variation in the $y_i$'s that can be attributed to the linear relationship with $x$, sometimes called the coefficient of determination
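A sketch of this decomposition on invented data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.3, 3.8, 6.1, 8.2, 9.9])
n = len(x)

b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x)**2)
b0 = (np.sum(y) - b1 * np.sum(x)) / n
y_hat = b0 + b1 * x

ss_total = np.sum((y - y.mean())**2)   # total variability in y
ss_resid = np.sum((y - y_hat)**2)      # variation unexplained by the regression
r_squared = 1 - ss_resid / ss_total    # coefficient of determination
```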

The Bivariate Normal Distribution

After looking at relationships between two random variables, we start to wonder: can we generalize the normal distribution to higher dimensions?

Jumping to the definition…

Definition: $f_{X,Y}(x, y) = \dfrac{1}{2\pi\sigma_X\sigma_Y\sqrt{1 - \rho^2}} \exp\left(-\dfrac{1}{2(1 - \rho^2)}\left[\left(\dfrac{x - \mu_X}{\sigma_X}\right)^2 - 2\rho\left(\dfrac{x - \mu_X}{\sigma_X}\right)\left(\dfrac{y - \mu_Y}{\sigma_Y}\right) + \left(\dfrac{y - \mu_Y}{\sigma_Y}\right)^2\right]\right)$ is a bivariate normal distribution with parameters $\mu_X$, $\mu_Y$, $\sigma_X$, $\sigma_Y$, and $\rho$
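The density can be evaluated numerically; a sketch using scipy's multivariate_normal, building the covariance matrix from the five parameters (the parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters
mu_x, mu_y = 0.0, 1.0
sigma_x, sigma_y = 1.0, 2.0
rho = 0.6

# Covariance matrix: variances on the diagonal, rho * sigma_x * sigma_y off-diagonal
cov = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                [rho * sigma_x * sigma_y, sigma_y**2]])
dist = multivariate_normal(mean=[mu_x, mu_y], cov=cov)

# Evaluate the joint density at a point
print(dist.pdf([0.5, 1.5]))
```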

Theorem: Suppose $X$ and $Y$ have a bivariate normal distribution,

  1. $X$ is normal with mean $\mu_X$ and variance $\sigma_X^2$, and $Y$ is normal with mean $\mu_Y$ and variance $\sigma_Y^2$
  2. $\rho$ is the correlation coefficient between $X$ and $Y$

Theorem: The maximum likelihood estimators for $\mu_X$, $\mu_Y$, $\sigma_X^2$, $\sigma_Y^2$, and $\rho$ are respectively $\bar{x}$, $\bar{y}$, $\frac{1}{n}\sum (x_i - \bar{x})^2$, $\frac{1}{n}\sum (y_i - \bar{y})^2$, and the sample correlation coefficient $r$

How do we test whether the two variables are independent? Since bivariate normal variables are independent exactly when $\rho = 0$, we test $H_0\colon \rho = 0$

Theorem: Under the null hypothesis $H_0\colon \rho = 0$, $T = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$ has a Student $t$ distribution with $n - 2$ degrees of freedom
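A sketch of this test on invented data; scipy's stats.pearsonr performs the same $t$ test, so it serves as a check:

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 2.4, 3.1, 4.8, 5.5, 6.9, 7.2])
y = np.array([2.0, 2.9, 3.8, 5.1, 5.4, 7.2, 6.8])
n = len(x)

r = np.corrcoef(x, y)[0, 1]

# Test H0: rho = 0 using T = r * sqrt(n - 2) / sqrt(1 - r^2)
t_stat = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)

# Cross-check against scipy's built-in version of the same test
r_check, p_check = stats.pearsonr(x, y)
```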