Recall, the expected value of a random variable $X$ is $E[X] = \sum_x x\,p(x)$ if $X$ is discrete and $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$ if $X$ is continuous.
If $Y = aX + b$ for some constants $a$ and $b$, then $E[Y] = aE[X] + b$.
If $X$ and $Y$ have a joint probability mass function $p(x, y)$, then $E[g(X, Y)] = \sum_{y}\sum_{x} g(x, y)\,p(x, y)$.
The continuous analog to this is $E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$.
We can apply this (with $g(x, y) = x + y$) to find $E[X + Y] = E[X] + E[Y]$.
An inductive argument shows $E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$. This property is extremely useful, and is often referred to as linearity of expectation.
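For example, if $X_1$ and $X_2$ are the values showing on two fair dice, linearity gives the expected sum without ever touching the joint distribution:

$$E[X_1 + X_2] = E[X_1] + E[X_2] = \frac{7}{2} + \frac{7}{2} = 7.$$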
We can break down a binomial random variable $X$ with parameters $(n, p)$ into $n$ Bernoulli random variables to show that $E[X] = np$.
We can break down a negative binomial random variable $X$ with parameters $(r, p)$ into $r$ geometric random variables with parameter $p$ (and expectation $1/p$) to show that $E[X] = r/p$.
We can also break down a hypergeometric random variable $X$ (the number of white balls when $n$ balls are selected from $N$ balls, $m$ of them white) into indicator variables $X_i = 1$ if the $i$th ball selected is white and $X_i = 0$ otherwise. Any ball is equally likely to be the $i$th ball selected, so $E[X_i] = m/N$ and $E[X] = \sum_{i=1}^{n} E[X_i] = nm/N$.
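Writing the binomial case out explicitly, with $X_i = 1$ if trial $i$ is a success and $X_i = 0$ otherwise:

$$E[X_i] = 1 \cdot p + 0 \cdot (1 - p) = p, \qquad E[X] = \sum_{i=1}^{n} E[X_i] = np.$$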
Example: Analyzing Quick-Sort
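A sketch of the standard argument (assuming the randomized version where the pivot is chosen uniformly at random from the current subarray): the elements of ranks $i < j$ are compared exactly when one of them is the first of $\{i, i+1, \ldots, j\}$ to be chosen as a pivot, which happens with probability $2/(j - i + 1)$. Summing over all pairs,

$$E[\#\text{comparisons}] = \sum_{i < j} \frac{2}{j - i + 1} = \sum_{k=1}^{n-1} \frac{2(n - k)}{k + 1} \approx 2n\ln n.$$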
Let $f$ be a function on a finite set $A$ and suppose we want to find $\max_{s \in A} f(s)$. We can find a lower bound probabilistically, since for any random element $S$ of $A$, $\max_{s \in A} f(s) \geq E[f(S)]$.
The textbook uses this in an example to find bounds on the number of Hamiltonian paths.
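The standard version of that bound (assuming the tournament setting, where every pair of $n$ players plays one game) goes as follows: let each game's outcome be decided by a fair coin; a fixed ordering of the players is then a Hamiltonian path with probability $1/2^{n-1}$, so by linearity of expectation

$$E[\#\text{Hamiltonian paths}] = \frac{n!}{2^{n-1}}, \qquad \text{so some tournament has at least } \frac{n!}{2^{n-1}} \text{ Hamiltonian paths}.$$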
Observe: If $X$ is the number of events that occur from $A_1, \ldots, A_n$, then $\binom{X}{2}$ is the number of pairs of events that occur. We can see that $\binom{X}{2} = \sum_{i<j} I_i I_j$, where $I_i$ is the indicator variable for $A_i$.
We get $E\!\left[\binom{X}{2}\right] = \sum_{i<j} P(A_i A_j)$.
An extension of this is $E\!\left[\binom{X}{k}\right] = \sum_{i_1 < i_2 < \cdots < i_k} P(A_{i_1} A_{i_2} \cdots A_{i_k})$.
These can help us calculate $E[X^2]$ and $\text{Var}(X)$.
We can also write $E[X^2] = 2E\!\left[\binom{X}{2}\right] + E[X]$.
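As a sanity check, apply this to a binomial $X$ with $A_i$ the event that trial $i$ succeeds:

$$E\!\left[\binom{X}{2}\right] = \sum_{i<j} P(A_i A_j) = \binom{n}{2}p^2, \qquad E[X^2] = n(n-1)p^2 + np, \qquad \text{Var}(X) = np(1 - p).$$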
Definition: The covariance between $X$ and $Y$ is defined as $\text{Cov}(X, Y) = E[(X - E[X])(Y - E[Y])]$. Expanding this yields $\text{Cov}(X, Y) = E[XY] - E[X]E[Y]$.
If $X$ and $Y$ are independent, then $\text{Cov}(X, Y) = 0$; however, the converse is not true.
We have some simple properties:
1. $\text{Cov}(X, Y) = \text{Cov}(Y, X)$
2. $\text{Cov}(X, X) = \text{Var}(X)$
3. $\text{Cov}(aX, Y) = a\,\text{Cov}(X, Y)$
4. $\text{Cov}\!\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} \text{Cov}(X_i, Y_j)$
With 4., we can derive $\text{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i) + 2\sum_{i < j} \text{Cov}(X_i, X_j)$.
If $X_1, \ldots, X_n$ are pairwise independent, then $\text{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{Var}(X_i)$.
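For example, if $X_1, \ldots, X_n$ are independent with common variance $\sigma^2$, properties 3 and 4 give the variance of the sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$:

$$\text{Var}(\bar{X}) = \frac{1}{n^2}\sum_{i=1}^{n}\text{Var}(X_i) = \frac{\sigma^2}{n}.$$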
The correlation of two random variables $X$ and $Y$, denoted by $\rho(X, Y)$, is defined as $\rho(X, Y) = \dfrac{\text{Cov}(X, Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}$.
We can prove that $-1 \leq \rho(X, Y) \leq 1$ pretty simply. Say $X$ and $Y$ have variances given by $\sigma_x^2$ and $\sigma_y^2$. Then $0 \leq \text{Var}\!\left(\frac{X}{\sigma_x} + \frac{Y}{\sigma_y}\right) = 2[1 + \rho(X, Y)]$ and $0 \leq \text{Var}\!\left(\frac{X}{\sigma_x} - \frac{Y}{\sigma_y}\right) = 2[1 - \rho(X, Y)]$, which gives both bounds.
$\rho(X, Y)$ behaves similarly to covariance but is nicely bounded. It can also be viewed as a measure of the degree of linearity between $X$ and $Y$, since $\rho(X, Y) = \pm 1$ implies $\text{Var}\!\left(\frac{X}{\sigma_x} \mp \frac{Y}{\sigma_y}\right) = 0$, which means $\frac{X}{\sigma_x} \mp \frac{Y}{\sigma_y}$ is constant, i.e. $Y$ is a linear function of $X$.
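For example, if $X$ and $Y$ are independent with the same variance $\sigma^2$, then

$$\rho(X, X + Y) = \frac{\text{Cov}(X, X + Y)}{\sqrt{\text{Var}(X)\,\text{Var}(X + Y)}} = \frac{\sigma^2}{\sqrt{\sigma^2 \cdot 2\sigma^2}} = \frac{1}{\sqrt{2}}.$$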
Conditional Expectation
Remember that we have the conditional probability mass function $p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \dfrac{p(x, y)}{p_Y(y)}$.
It makes sense to also define $E[X \mid Y = y] = \sum_{x} x\,p_{X \mid Y}(x \mid y)$.
We denote $E[X \mid Y]$ as a function of $Y$ (whose value at $Y = y$ is $E[X \mid Y = y]$). This means $E[X \mid Y]$ is also a random variable.
Theorem: $E[E[X \mid Y]] = E[X]$.
This implies $E[X] = \sum_{y} E[X \mid Y = y]\,P(Y = y)$ (or $E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f_Y(y)\,dy$ for a continuous variable), which is just the law of total probability extended to expectations.
We can also interpret this theorem as taking a weighted average of the conditional expected value of $X$ given $Y = y$, weighted by the probability of each value $y$.
This theorem can also help us calculate probabilities by conditioning: if $X$ is the indicator variable for an event $A$, then $P(A) = E[X] = \sum_{y} P(A \mid Y = y)\,P(Y = y)$.
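For example, if $N$ is a nonnegative integer-valued random variable independent of an i.i.d. sequence $X_1, X_2, \ldots$, conditioning on $N$ gives the expected value of the random sum:

$$E\!\left[\sum_{i=1}^{N} X_i\right] = E\!\left[E\!\left[\sum_{i=1}^{N} X_i \,\middle|\, N\right]\right] = E\big[N\,E[X_1]\big] = E[N]\,E[X_1].$$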
Conditional variance is also well-defined, as $\text{Var}(X \mid Y) = E\!\left[(X - E[X \mid Y])^2 \,\middle|\, Y\right]$.
To go with it, there's a useful identity, the conditional variance formula: $\text{Var}(X) = E[\text{Var}(X \mid Y)] + \text{Var}(E[X \mid Y])$.
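Continuing the random-sum example above, the conditional variance formula gives

$$\text{Var}\!\left(\sum_{i=1}^{N} X_i\right) = E\big[N\,\text{Var}(X_1)\big] + \text{Var}\big(N\,E[X_1]\big) = E[N]\,\text{Var}(X_1) + (E[X_1])^2\,\text{Var}(N).$$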
Note: If we want to predict $Y$ given $X$ with a function $g(X)$, our best predictor, in the sense of minimizing $E[(Y - g(X))^2]$, is $g(X) = E[Y \mid X]$.
Definition: The moment generating function of $X$ is defined for all real values $t$ as $M(t) = E[e^{tX}]$. It's called the moment generating function because all moments of $X$ can be obtained by successively differentiating $M(t)$ and evaluating at $t = 0$: $M^{(n)}(0) = E[X^n]$.
This assumes that $\frac{d}{dt}E[e^{tX}] = E\!\left[\frac{d}{dt}e^{tX}\right]$, i.e. that we can interchange differentiation and expectation, which is generally true.
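For example, for a binomial random variable with parameters $(n, p)$:

$$M(t) = E[e^{tX}] = (pe^{t} + 1 - p)^{n}, \qquad M'(t) = n(pe^{t} + 1 - p)^{n-1}pe^{t}, \qquad M'(0) = np = E[X].$$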
The moment generating function of joint variables $X_1, \ldots, X_n$ is a multivariable function $M(t_1, \ldots, t_n) = E\!\left[e^{t_1 X_1 + \cdots + t_n X_n}\right]$. It can be proven that this function uniquely determines the joint distribution of $X_1, \ldots, X_n$. This function can also be used to find the individual moment generating functions of $X_1, \ldots, X_n$, since $M_{X_i}(t) = M(0, \ldots, 0, t, 0, \ldots, 0)$ with $t$ in the $i$th position.
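For two jointly distributed variables $X$ and $Y$, for instance, setting one argument to zero recovers a marginal moment generating function, and independence factors the joint one:

$$M_X(t) = M(t, 0), \qquad M_Y(t) = M(0, t), \qquad M(t_1, t_2) = M_X(t_1)\,M_Y(t_2) \text{ if } X \text{ and } Y \text{ are independent}.$$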