A discrete-time stochastic process is made up of random variables $X_n$ for $n = 0, 1, 2, \dots$, where each $X_n$ takes on values in a finite set $S$
The possible values for $X_n$ are the states of the system
We can describe the probability of a process as $\mathbb{P}(X_0 = x_0, X_1 = x_1, \dots, X_n = x_n)$ or, more expressively, as an initial probability distribution $\mathbb{P}(X_0 = x_0)$ and the transition probabilities $\mathbb{P}(X_{n+1} = x_{n+1} \mid X_0 = x_0, \dots, X_n = x_n)$
The Markov property states that transitions depend only on the current state, not on previous states: $\mathbb{P}(X_{n+1} = y \mid X_0 = x_0, \dots, X_n = x) = \mathbb{P}(X_{n+1} = y \mid X_n = x)$
A process is time-homogeneous if the transition probabilities are not functions of $n$
A time-homogeneous Markov chain has each transition probability defined as $\mathbb{P}(X_{n+1} = y \mid X_n = x) = p(x, y)$ for some function $p$. These are of special interest.
We can define this kind of process with an initial distribution $\phi$ and transition probabilities $p(x, y)$. $p$ is nicely expressed as a transition matrix $P$ with entries $P_{xy} = p(x, y)$.
We call $P$ a stochastic matrix, since $p(x, y) \geq 0$ and $\sum_{y} p(x, y) = 1$
If we express the distribution of $X_n$ as a row vector $\phi_n$, we can write $\phi_{n+1} = \phi_n P$, so $\phi_n = \phi_0 P^n$
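As a quick numerical sketch (the two-state matrix here is my own toy example, not from the notes), we can iterate $\phi_{n+1} = \phi_n P$ and watch the distribution settle:

```python
import numpy as np

# Hypothetical two-state chain: stay in state 0 w.p. 0.9, leave state 1 w.p. 0.5.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

phi = np.array([1.0, 0.0])   # phi_0: start in state 0
for n in range(30):
    phi = phi @ P            # phi_{n+1} = phi_n P
print(phi)                   # -> [0.8333..., 0.1666...], the same for any phi_0
```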
Large-Time Behavior
We are interested in seeing what happens with $\phi_0 P^n$ as $n \to \infty$. We first observe that for many systems, $\phi_0 P^n \to \pi$ for any starting $\phi_0$.
We call $\pi$ an invariant probability distribution for $P$ if $\pi P = \pi$ (also called a stationary, equilibrium, or steady-state distribution)
Definitionally, $\pi$ is a left eigenvector of $P$ with eigenvalue $1$
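This gives a direct way to compute a candidate $\pi$ numerically, assuming one exists: left eigenvectors of $P$ are right eigenvectors of $P^T$. A sketch reusing the toy matrix above:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# Left eigenvectors of P are right eigenvectors of P.T.
eigvals, eigvecs = np.linalg.eig(P.T)
i = np.argmin(np.abs(eigvals - 1.0))  # locate the eigenvalue 1
pi = np.real(eigvecs[:, i])
pi /= pi.sum()                        # normalize to a probability vector
print(pi)                             # [0.8333..., 0.1666...]
print(np.allclose(pi @ P, pi))        # True: pi P = pi
```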
Question: How do we know $\pi$ exists?
The key in this process is that diagonalizing $P$ gives us an eigenvalue of $1$ and other eigenvalues with absolute value $< 1$. This means that taking repeated powers $P^n$ leaves us with only the eigenvector corresponding to $1$.
Now, for any stochastic matrix, we have $\mathbf{1} = (1, \dots, 1)^T$ as a right eigenvector with eigenvalue $1$ (the rows sum to $1$), which means there must be a left eigenvector with eigenvalue $1$. So we must show,
- The left eigenvector can be chosen to have nonnegative entries (and therefore can be normalized to be a distribution)
- The eigenvalue $1$ is simple (no multiplicity), and all other eigenvalues have absolute value $< 1$
We cannot always diagonalize $P$, but the Jordan decomposition is good enough and has the same effect
Perron-Frobenius Theorem: If $P$ has all positive entries, then $1$ is a simple eigenvalue, its left eigenvector can be chosen to be positive, and all other eigenvalues have absolute value $< 1$
This does not cover certain stochastic matrices with $0$ entries that nevertheless satisfy the limit behavior. If $P^k$ satisfies Perron-Frobenius for some $k$, then $P$ satisfies our needed conditions
Now our question is which kinds of stochastic matrices have some power $P^k$ with all positive entries
Cases where this is not true:
- Simple random walk with reflecting boundary: In this case, every step alternates between even and odd states. The powers $P^n$ will basically “oscillate” instead of converging (see the numerical sketch after this list).
- Simple random walk with absorbing boundary: In this case, the boundaries absorb all processes, while the probability of being in the middle (transient) states will go to $0$.
- Two non-interacting chains: We call such a chain reducible
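A minimal sketch of the first failure mode (the 4-state reflecting walk is my own example): the distribution never settles, and $P$ has $-1$ as an eigenvalue alongside $+1$:

```python
import numpy as np

# Reflecting-boundary random walk on {0, 1, 2, 3}.
P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])

phi = np.array([1.0, 0.0, 0.0, 0.0])
for n in range(6):
    print(n, np.round(phi, 3))   # mass alternates between even and odd states
    phi = phi @ P

print(np.round(np.linalg.eigvals(P), 3))  # 1, -1, 0.5, -0.5: note the -1
```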
Definition: Two states $x$ and $y$ communicate (written $x \leftrightarrow y$) if there exist $m$ and $n$ with $p_m(x, y) > 0$ and $p_n(y, x) > 0$, meaning a process starting in either state can reach the other
$\leftrightarrow$ is an equivalence relation
- Reflexive: $x \leftrightarrow x$ (take $m = n = 0$)
- Symmetric: $x \leftrightarrow y \implies y \leftrightarrow x$
- Transitive: $x \leftrightarrow y$ and $y \leftrightarrow z \implies x \leftrightarrow z$
This equivalence relation partitions our state space into disjoint communication classes
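Communication classes can be found mechanically from the pattern of nonzero entries of $P$; a sketch (the block-diagonal matrix below is a made-up example of two non-interacting chains):

```python
import numpy as np

# Two non-interacting chains: states {0, 1} and {2, 3} never mix.
P = np.array([[0.7, 0.3, 0.0, 0.0],
              [0.4, 0.6, 0.0, 0.0],
              [0.0, 0.0, 0.2, 0.8],
              [0.0, 0.0, 0.5, 0.5]])

n = len(P)
reach = ((P > 0) | np.eye(n, dtype=bool)).astype(int)
for _ in range(n):                    # repeated squaring: transitive closure
    reach = (reach @ reach > 0).astype(int)

communicate = (reach * reach.T) > 0   # x <-> y iff reachable both ways
classes = {tuple(np.flatnonzero(row)) for row in communicate}
print(classes)                        # {(0, 1), (2, 3)}: two classes, reducible
```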
If there is only one communication class, then the chain is irreducible. Matrices satisfying our limit behavior must be irreducible. But we also see irreducible chains for which $\phi P^n$ does not converge (our reflecting-boundary chain).
Some communication classes are transient, meaning a process will leave them and never return with probability $1$; the remaining classes are recurrent
For each recurrent class $R$, the submatrix of $P$ obtained from only considering the rows and columns for states in $R$ is also a stochastic matrix. We can analyze the large-time behavior of the chain on $R$ by only considering this submatrix.
Definition: The period of a state $x$ is the greatest common divisor of $\{n \geq 1 : p_n(x, x) > 0\}$
Theorem: If $x \leftrightarrow y$ then $x$ and $y$ have the same period
Definition: An irreducible matrix is aperiodic if every state has period $1$, and therefore $P^n$ has all entries positive for some $n$. According to Perron-Frobenius, there exists a unique invariant probability vector $\pi$, such that $\phi P^n \to \pi$ for any $\phi$.
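The period can be computed straight from the definition; a sketch (the helper `period` and the cutoff `max_n` are my own, not from the notes):

```python
import numpy as np
from functools import reduce
from math import gcd

def period(P, x, max_n=50):
    """gcd of all n <= max_n with P^n(x, x) > 0."""
    return_times = []
    Pn = np.eye(len(P))
    for n in range(1, max_n + 1):
        Pn = Pn @ P
        if Pn[x, x] > 1e-12:
            return_times.append(n)
    return reduce(gcd, return_times)

# Reflecting walk on {0, 1, 2} has period 2; a lazy version is aperiodic.
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 1.0, 0.0]])
print(period(P, 0))                          # 2
print(period(0.5 * np.eye(3) + 0.5 * P, 0))  # 1: holding in place kills the period
```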
Reducible or Periodic Markov Chains
If $P$ is reducible with recurrent classes $R_1, \dots, R_r$ and transient classes $T_1, \dots, T_s$, then each recurrent class acts as a smaller Markov chain. Therefore, there are $r$ invariant probability vectors $\pi_1, \dots, \pi_r$, and the eigenvalue $1$ has multiplicity $r$.
If $x \in R_i$, then $\pi_i(x)$ is given by the invariant distribution of the chain restricted to $R_i$, or $\pi_i(x) = 0$ otherwise
Therefore, $\pi(x) = 0$ for each transient state $x$, for any invariant distribution $\pi$
Let $\alpha_i(x)$ be the probability that the chain starting in state $x$ ends up in recurrent class $R_i$. We see that $\lim_{n \to \infty} \mathbb{P}(X_n = y \mid X_0 = x) = \sum_i \alpha_i(x)\, \pi_i(y)$. The limit exists but depends on $x$.
How do we understand the powers of $P$ for a periodic chain?
If $P$ is irreducible but has period $d > 1$, then the state space splits into $d$ sets $S_1, \dots, S_d$ that the chain visits cyclically. In general, $P$ will have $d$ simple eigenvalues with absolute value $1$: the solutions to $\lambda^d = 1$, i.e. the $d$th roots of unity. $\phi P^n$ will cycle through $d$ distributions, which average to $\pi$. In this case, $\pi(x)$ does not represent the limit of $\mathbb{P}(X_n = x)$, but the long-run average fraction of time spent in $x$.
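A sketch using the reflecting-boundary walk again, whose invariant vector is $\pi = (1/6, 1/3, 1/3, 1/6)$: the running time-average of $\phi P^n$ converges to $\pi$ even though $\phi P^n$ itself oscillates:

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0, 0.0],   # period-2 reflecting walk
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])

phi = np.array([1.0, 0.0, 0.0, 0.0])
avg = np.zeros(4)
N = 10000
for n in range(N):
    avg += phi
    phi = phi @ P
print(np.round(avg / N, 4))   # ~ [0.1667, 0.3333, 0.3333, 0.1667] = pi
```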
Return Times
Let $X_n$ be an irreducible Markov chain with transition matrix $P$. Consider the amount of time spent in state $y$ up to and including time $n$, $Y_n = \sum_{m=0}^{n} \mathbf{1}\{X_m = y\}$
We can relate this quantity to $\pi$, since $\mathbb{E}[Y_n] = \sum_{m=0}^{n} p_m(x, y) \approx n\,\pi(y)$ for large $n$
Assume $X_0 = y$ and let $T$ be the first time after $0$ that the Markov chain is in $y$, $T = \min\{n \geq 1 : X_n = y\}$
The $k$th return to state $y$ is given as the sum of $k$ independent and identically distributed random variables $T_1, \dots, T_k$, each a copy of $T$. Therefore, for large $k$, we have $T_1 + \cdots + T_k \approx k\,\mathbb{E}[T]$ by the law of large numbers
In other words, in $n$ steps we find about $n / \mathbb{E}[T]$ visits. And since we know that in $n$ steps we expect about $n\,\pi(y)$ visits, we have $\mathbb{E}[T] = 1 / \pi(y)$
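A quick Monte Carlo check of $\mathbb{E}[T] = 1/\pi(y)$ on the toy two-state chain from earlier, where $\pi = (5/6, 1/6)$, so the expected return time to state $1$ should be $6$ (the simulation setup is my own):

```python
import numpy as np

rng = np.random.default_rng(0)
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

y = 1                       # measure return times to state 1
times = []
for _ in range(10000):
    x, n = y, 0
    while True:
        x = rng.choice(2, p=P[x])   # take one step of the chain
        n += 1
        if x == y:
            break
    times.append(n)
print(np.mean(times))       # ~ 6.0 = 1 / pi(1)
```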
Suppose $P$ has some transient states and let $Q$ be the submatrix of $P$ which includes only the rows and columns for the transient states
$Q$ is called a substochastic matrix, since it has row sums $\leq 1$. We already know $Q^n \to 0$, implying its eigenvalues have absolute value strictly less than $1$. Therefore, $I - Q$ has no eigenvalue $0$ and we can define $M = (I - Q)^{-1}$, which is called the fundamental matrix.
For a transient state $y$, let $Y = \sum_{n=0}^{\infty} \mathbf{1}\{X_n = y\}$ be the total number of visits to $y$. Suppose $x$ is another transient state; then $\mathbb{E}[Y \mid X_0 = x] = \sum_{n=0}^{\infty} (Q^n)_{xy}$, which is also the $(x, y)$ entry of the matrix $I + Q + Q^2 + \cdots$
We can show that $(I - Q)(I + Q + Q^2 + \cdots) = I$
So $I + Q + Q^2 + \cdots = (I - Q)^{-1} = M$, and $\mathbb{E}[Y \mid X_0 = x] = M_{xy}$
We can do a lot with this. Observe how the expected number of steps until the chain enters a recurrent class, assuming $X_0 = x$, is the sum of the $x$th row of $M$ (each step before absorption is a visit to some transient state)
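A sketch using fair gambler's ruin on $\{0, \dots, 4\}$ (the absorbing-boundary walk from the list above), whose transient states are $\{1, 2, 3\}$:

```python
import numpy as np

# Transient-to-transient block Q for states {1, 2, 3} of gambler's ruin.
Q = np.array([[0.0, 0.5, 0.0],
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])

M = np.linalg.inv(np.eye(3) - Q)   # fundamental matrix (I - Q)^{-1}
print(M)                           # M[x, y] = expected visits to y from x
print(M.sum(axis=1))               # [3, 4, 3]: expected steps to absorption
```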
To find the expected number of steps the irreducible Markov chain takes to go from any state $x$ to another state $y$, we rearrange $P$ such that $y$ is the first site, and modify it to be an absorbing site; the row sums of the fundamental matrix for the remaining states then give the expected hitting times of $y$.
Note that the above procedure is only going to make sense on an irreducible chain (I don’t think this quantity would always be defined otherwise).
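A sketch of this procedure on the toy two-state chain: making state $1$ absorbing leaves $0$ as the only transient state, and the fundamental matrix of that $1 \times 1$ block gives the expected hitting time:

```python
import numpy as np

# After making state 1 absorbing, the transient block is just p(0, 0) = 0.9.
Q = np.array([[0.9]])
M = np.linalg.inv(np.eye(1) - Q)
print(M.sum(axis=1))   # [10.]: from state 0 it takes 10 steps on average
```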
What if there are multiple recurrent classes? Starting at a given transient state $x$, what is the probability that the Markov chain ends up in a particular recurrent class?
To simplify this problem, we treat recurrent classes as single points $r$ with $p(r, r) = 1$. This lets us order our transition matrix as,
$$P = \begin{pmatrix} I & 0 \\ S & Q \end{pmatrix}$$
with the (now absorbing) recurrent states listed first.
Denote by $\alpha(x, r)$ the probability that the chain starting at transient state $x$ ends up in recurrent state $r$. With two recurrent states $r_1$ and $r_2$, $\alpha(r_i, r_i) = 1$ and $\alpha(r_i, r_j) = 0$ if $i \neq j$. For any transient state $x$,
$$\alpha(x, r) = p(x, r) + \sum_{y \text{ transient}} p(x, y)\, \alpha(y, r)$$
We can write this as $A = S + QA$, so $A = (I - Q)^{-1} S = MS$, where $A$ is the $t \times r$ matrix of absorption probabilities $A_{xr} = \alpha(x, r)$
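A sketch for the same gambler's-ruin chain, with the absorbing states $0$ and $4$ as the columns of $S$:

```python
import numpy as np

Q = np.array([[0.0, 0.5, 0.0],   # transient block: states 1, 2, 3
              [0.5, 0.0, 0.5],
              [0.0, 0.5, 0.0]])
S = np.array([[0.5, 0.0],        # transient -> absorbing: columns are 0 and 4
              [0.0, 0.0],
              [0.0, 0.5]])

A = np.linalg.inv(np.eye(3) - Q) @ S   # A = M S
print(A)   # [[0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
```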
Reversibility
^202837
Markov chains define a direction for time. Can we reverse the direction of a chain?
Let’s label this “flipped” probability $\hat{p}(x, y) = \mathbb{P}(X_n = y \mid X_{n+1} = x)$
We see that conditioning on the entire future leads to the same result, $\mathbb{P}(X_n = y \mid X_{n+1} = x, X_{n+2} = x_{n+2}, \dots) = \mathbb{P}(X_n = y \mid X_{n+1} = x)$, so the reversed process is also Markov
However, this probability is still a function of $n$. We can take the initial condition to be the invariant measure $\pi$, and now the Markov chain is time-homogeneous. So we see $\hat{p}(x, y)$ must be equal to $\dfrac{\pi(y)\, p(y, x)}{\pi(x)}$
Definition: If $\pi(x)\, p(x, y) = \pi(y)\, p(y, x)$ for some probability vector $\pi$, then $\hat{p} = p$ and the chain is reversible (and $\pi$ is the invariant measure)
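As a quick check (a sketch; the reflecting walk from earlier is a birth-death chain, and birth-death chains satisfy detailed balance):

```python
import numpy as np

P = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.5, 0.0, 0.5, 0.0],
              [0.0, 0.5, 0.0, 0.5],
              [0.0, 0.0, 1.0, 0.0]])
pi = np.array([1, 2, 2, 1]) / 6

flux = pi[:, None] * P            # flux[x, y] = pi(x) p(x, y)
print(np.allclose(pi @ P, pi))    # True: pi is invariant
print(np.allclose(flux, flux.T))  # True: detailed balance, so P is reversible
```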