Recall, the expected value of a random variable $X$ is $E[X] = \sum_{x} x\,p(x)$ if $X$ is discrete and $E[X] = \int_{-\infty}^{\infty} x f(x)\,dx$ if $X$ is continuous
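
As a quick concrete illustration (my own example, not from the notes): for a roll of a fair six-sided die, $E[X] = \sum_{k=1}^{6} k \cdot \tfrac{1}{6} = 3.5$.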

If $Y = aX + b$ for some constants $a$ and $b$, then $E[aX + b] = aE[X] + b$

If $X$ and $Y$ have a joint probability mass function $p(x, y)$, then $E[g(X, Y)] = \sum_{y}\sum_{x} g(x, y)\,p(x, y)$

The continuous analog to this is $E[g(X, Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y)\,f(x, y)\,dx\,dy$

We can apply this to find $E[X + Y] = E[X] + E[Y]$

An inductive argument shows $E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$. This property is extremely useful, and is often referred to as linearity of expectation.
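
A minimal simulation sketch (my own, not from the notes) showing that linearity holds even for dependent variables; the die-roll setup is assumed purely for illustration:

```python
# Linearity of expectation holds even when X and Y are dependent.
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(1, 7, size=100_000)  # rolls of a fair die
y = 7 - x                             # completely determined by x

# E[X + Y] should match E[X] + E[Y] = 3.5 + 3.5 = 7 despite the dependence.
print((x + y).mean(), x.mean() + y.mean())
```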

We can break down a binomial random variable $X \sim \mathrm{Binomial}(n, p)$ into Bernoulli random variables $X = X_1 + \cdots + X_n$ to show that $E[X] = np$

We can break down a negative binomial random variable $X$ (the number of trials needed to get $r$ successes) into geometric random variables with parameter $p$ (and expectation $1/p$) to show that $E[X] = r/p$

We can also break down a hypergeometric random variable $X$ (the number of white balls when $n$ balls are selected from $N$ balls, of which $m$ are white) into indicator variables: $X_i = 1$ if the $i$th ball selected is white and $X_i = 0$ otherwise. Any ball is equally likely to be the $i$th ball selected, so $E[X_i] = m/N$ and $E[X] = nm/N$
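
A rough numerical sketch (parameters chosen arbitrarily for illustration) comparing simulated means with the three formulas above:

```python
# Compare simulated means with E[Bin(n, p)] = np, E[NegBin(r, p)] = r/p,
# and E[Hypergeometric] = k*m/N (k balls drawn from N, of which m are white).
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 10, 0.3, 5
N, m, k = 20, 8, 6

binom = rng.binomial(n, p, size=200_000)
# numpy's negative_binomial counts failures before the r-th success,
# so add r to get the total number of trials.
negbin = rng.negative_binomial(r, p, size=200_000) + r
hyper = rng.hypergeometric(m, N - m, k, size=200_000)

print(binom.mean(), n * p)       # ~3.0
print(negbin.mean(), r / p)      # ~16.67
print(hyper.mean(), k * m / N)   # ~2.4
```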

Example: Analyzing Quick-Sort
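
The notes don't include the details of this example here; the following is a standard indicator-variable sketch (in my own notation), not necessarily the exact argument given in lecture. Let $X_{ij} = 1$ if the $i$th and $j$th smallest elements are ever compared, and $0$ otherwise. Those two elements are compared exactly when the first pivot chosen from $\{i, i+1, \ldots, j\}$ (in rank order) is either $i$ or $j$, which happens with probability $\frac{2}{j-i+1}$. By linearity of expectation, the expected number of comparisons is
$$E\Big[\sum_{i<j} X_{ij}\Big] = \sum_{i<j} \frac{2}{j-i+1} \approx 2n\ln n$$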

Let $f$ be a function on a finite set $A$ and suppose we want to find $\max_{s \in A} f(s)$. We can find a lower bound probabilistically, since $\max_{s \in A} f(s) \ge E[f(S)]$ for any random element $S$ of $A$

The textbook uses this in an example to find a lower bound on the number of Hamiltonian paths
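
For reference (stated from memory, so it may differ from the textbook's version): if each game of a round-robin tournament on $n$ players is decided by a fair coin flip and $X$ counts the Hamiltonian paths, i.e. orderings $i_1, \ldots, i_n$ in which each player beats the next, then linearity of expectation gives $E[X] = n!/2^{n-1}$, so some tournament has at least $n!/2^{n-1}$ Hamiltonian paths.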

Observe: If $X$ is the number of events that occur from $A_1, A_2, \ldots, A_n$, then $\binom{X}{2}$ is the number of pairs of events that occur. Writing $I_i$ for the indicator of $A_i$, we can see that $\binom{X}{2} = \sum_{i<j} I_i I_j$

We get $E\!\left[\binom{X}{2}\right] = \sum_{i<j} P(A_i A_j)$, or equivalently $E[X^2] - E[X] = 2\sum_{i<j} P(A_i A_j)$

An extension of this is $E\!\left[\binom{X}{k}\right] = \sum_{i_1 < i_2 < \cdots < i_k} P(A_{i_1} A_{i_2} \cdots A_{i_k})$

These identities can help us calculate $E[X^2]$ and $\mathrm{Var}(X)$

We can also write $\mathrm{Var}(X) = E[X^2] - (E[X])^2 = 2\sum_{i<j} P(A_i A_j) + E[X] - (E[X])^2$
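
As a quick worked check (my own example): for a binomial random variable with $A_i = \{\text{trial } i \text{ is a success}\}$, $\sum_{i<j} P(A_i A_j) = \binom{n}{2} p^2$, so
$$\mathrm{Var}(X) = n(n-1)p^2 + np - (np)^2 = np(1-p)$$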

Definition: The covariance between $X$ and $Y$ is defined as $\mathrm{Cov}(X, Y) = E\big[(X - E[X])(Y - E[Y])\big]$. Expanding this yields $\mathrm{Cov}(X, Y) = E[XY] - E[X]E[Y]$.

If $X$ and $Y$ are independent, then $\mathrm{Cov}(X, Y) = 0$; however, the converse is not true. For example, if $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$, then $\mathrm{Cov}(X, Y) = E[X^3] - E[X]E[X^2] = 0$ even though $X$ and $Y$ are clearly dependent

We have some simple properties:

1. $\mathrm{Cov}(X, Y) = \mathrm{Cov}(Y, X)$
2. $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$
3. $\mathrm{Cov}(aX, Y) = a\,\mathrm{Cov}(X, Y)$
4. $\mathrm{Cov}\!\left(\sum_{i=1}^{n} X_i, \sum_{j=1}^{m} Y_j\right) = \sum_{i=1}^{n}\sum_{j=1}^{m} \mathrm{Cov}(X_i, Y_j)$

With 4., we can derive $\mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i) + 2\sum_{i<j} \mathrm{Cov}(X_i, X_j)$

If $X_1, \ldots, X_n$ are pairwise independent, then $\mathrm{Var}\!\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \mathrm{Var}(X_i)$
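
A small numerical sketch (my own, with an arbitrary correlated pair) of the variance-of-a-sum identity above:

```python
# Check Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) on correlated samples.
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500_000)
y = 0.5 * x + rng.normal(size=500_000)   # deliberately correlated with x

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov_xy
print(lhs, rhs)   # the two agree up to sampling noise
```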

The correlation of two random variables $X$ and $Y$, denoted by $\rho(X, Y)$, is defined as $\rho(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}$.

We can prove that $-1 \le \rho(X, Y) \le 1$ pretty simply. Say $X$ and $Y$ have variances given by $\sigma_X^2$ and $\sigma_Y^2$. Then

$$0 \le \mathrm{Var}\!\left(\frac{X}{\sigma_X} + \frac{Y}{\sigma_Y}\right) = 2\big[1 + \rho(X, Y)\big] \qquad\text{and}\qquad 0 \le \mathrm{Var}\!\left(\frac{X}{\sigma_X} - \frac{Y}{\sigma_Y}\right) = 2\big[1 - \rho(X, Y)\big],$$

which together give $-1 \le \rho(X, Y) \le 1$.

$\rho(X, Y)$ behaves similarly to covariance but is nicely bounded. It can also be viewed as a measure of the degree of linearity between $X$ and $Y$, since $\rho(X, Y) = 1$ implies $\mathrm{Var}\!\left(\frac{X}{\sigma_X} - \frac{Y}{\sigma_Y}\right) = 0$, which means $\frac{X}{\sigma_X} - \frac{Y}{\sigma_Y}$ is constant (and similarly $\rho(X, Y) = -1$ forces $\frac{X}{\sigma_X} + \frac{Y}{\sigma_Y}$ to be constant).

Conditional Expectation

Remember that we have the conditional probability mass function $p_{X \mid Y}(x \mid y) = P(X = x \mid Y = y) = \dfrac{p(x, y)}{p_Y(y)}$.
It makes sense to also define $E[X \mid Y = y] = \sum_x x\,P(X = x \mid Y = y) = \sum_x x\,p_{X \mid Y}(x \mid y)$

We denote $E[X \mid Y]$ as a function of $Y$ (whose value at $Y = y$ is $E[X \mid Y = y]$). This means $E[X \mid Y]$ is also a random variable.

Theorem: $E[X] = E\big[E[X \mid Y]\big]$
This implies $E[X] = \sum_y E[X \mid Y = y]\,P(Y = y)$ (or $E[X] = \int_{-\infty}^{\infty} E[X \mid Y = y]\,f_Y(y)\,dy$ for a continuous $Y$), which is the expectation analog of the law of total probability

We can also interpret this theorem as taking a weighted average of the conditional expected value of $X$ given $Y = y$, weighted by the probability of each value $y$
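
A standard application (my own example, not necessarily the one from lecture): if $N$ is a nonnegative integer-valued random variable independent of the i.i.d. sequence $X_1, X_2, \ldots$, then conditioning on $N$ gives
$$E\Big[\sum_{i=1}^{N} X_i\Big] = E\Big[E\Big[\sum_{i=1}^{N} X_i \,\Big|\, N\Big]\Big] = E\big[N\,E[X_1]\big] = E[N]\,E[X_1]$$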

This theorem can also help us calculate probabilities: taking $X$ to be the indicator variable of an event $A$ gives $E[X] = P(A)$, so $P(A) = \sum_y P(A \mid Y = y)\,P(Y = y)$
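
For instance (my example): for independent continuous random variables $X$ and $Y$, conditioning on $Y$ gives
$$P(X < Y) = \int_{-\infty}^{\infty} P(X < Y \mid Y = y)\,f_Y(y)\,dy = \int_{-\infty}^{\infty} F_X(y)\,f_Y(y)\,dy$$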

Conditional variance is also well-defined, as $\mathrm{Var}(X \mid Y) = E\big[(X - E[X \mid Y])^2 \mid Y\big]$

To go with it, there's a useful identity: $\mathrm{Var}(X) = E\big[\mathrm{Var}(X \mid Y)\big] + \mathrm{Var}\big(E[X \mid Y]\big)$
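
Applied to the random-sum setup above (again my example), this identity gives
$$\mathrm{Var}\Big(\sum_{i=1}^{N} X_i\Big) = E[N]\,\mathrm{Var}(X_1) + \big(E[X_1]\big)^2\,\mathrm{Var}(N),$$
since $\mathrm{Var}\big(\sum_{i=1}^{N} X_i \mid N\big) = N\,\mathrm{Var}(X_1)$ and $E\big[\sum_{i=1}^{N} X_i \mid N\big] = N\,E[X_1]$.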

Note: If we want to predict $Y$ given $X$ with a function $g(X)$, our best predictor (in the sense of minimizing $E\big[(Y - g(X))^2\big]$) is $g(X) = E[Y \mid X]$

Definition: The moment generating function of $X$ is defined for all real values of $t$ as $M(t) = E\big[e^{tX}\big]$. It's called the moment generating function because all moments of $X$ can be obtained by differentiating $M(t)$ and evaluating at $t = 0$: $M^{(n)}(0) = E[X^n]$.

This assumes that $\frac{d}{dt}E\big[e^{tX}\big] = E\big[\frac{d}{dt}e^{tX}\big]$, i.e. that differentiation and expectation can be interchanged, which is generally true
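
As a quick worked example (mine, not from the notes): for $X \sim \mathrm{Binomial}(n, p)$,
$$M(t) = \sum_{k=0}^{n} e^{tk}\binom{n}{k}p^k(1-p)^{n-k} = \big(pe^t + 1 - p\big)^n,$$
so $M'(0) = n\big(pe^t + 1 - p\big)^{n-1} p e^t\,\big|_{t=0} = np = E[X]$, matching the earlier result.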

The moment generating function of jointly distributed random variables $X_1, \ldots, X_n$ is the multivariable function $M(t_1, \ldots, t_n) = E\big[e^{t_1 X_1 + \cdots + t_n X_n}\big]$. It can be proven that this function uniquely determines the joint distribution of $X_1, \ldots, X_n$. It can also be used to find the individual moment generating functions, since $M_{X_i}(t) = M(0, \ldots, 0, t, 0, \ldots, 0)$ with $t$ in the $i$th position.