Example:
Imagine a player keeps rolling a fair die until they either stop or roll a 6. Their score is the last value they've rolled, or 0 if they rolled a 6.
What’s the optimal strategy if the player wants to maximize their expected payoff?
Define the payoff $f(x) = x$ if $x \in \{1, 2, 3, 4, 5\}$, and $f(6) = 0$.
Let $v(x)$ be the expected winnings of the player given that the first roll is $x$, if the player takes the optimal strategy.
Let $u$ be the expected payoff from continuing after the first roll (since the rolls are independent and identically distributed, $u$ does not depend on what the first roll was).
If $f(x) \ge u$, the player should take the money; otherwise the player should roll again.
Since $v(x) \ge f(x)$ for every $x$, we have $u = \frac{1}{6}\sum_{x=1}^{6} v(x) \ge \frac{1}{6}(1 + 2 + 3 + 4 + 5 + 0) = \frac{5}{2}$.
We now know a bit more about this game: since $u \ge \frac{5}{2} > 2$, if the first roll is a 1 or a 2 then the player should roll again.
Suppose the first roll is a 4. If the optimal strategy were to continue, then (since $u$ is the same no matter what the current roll is, continuing would also be optimal on anything smaller) the game would look like rolling until a 5 or 6 is rolled, meaning the expected winnings would be $\frac{1}{2}(5) + \frac{1}{2}(0) = \frac{5}{2}$. This is less than 4, so clearly the optimal strategy is stopping on 4.
Let $u$ be the expected winnings after continuing, and assume we continue after rolling a 3: then $u = \frac{1}{6}(u + u + u + 4 + 5 + 0)$, which gives $u = 3$.
So we can stop or continue after rolling a 3 with the same expected result; either way the value function is $v = (3, 3, 3, 4, 5, 0)$.
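As a quick sanity check on this arithmetic, here is a small simulation (not part of the original notes; the function name and threshold parameter are just for illustration) of the rule "roll again on 1-3, stop on 4 or 5":

```python
import random

# Simulate the strategy "stop on a roll of 4 or 5, roll again on 1-3";
# rolling a 6 ends the game with a score of 0.
def play(rng, threshold=4):
    while True:
        roll = rng.randint(1, 6)
        if roll == 6:
            return 0          # bust: the game ends with nothing
        if roll >= threshold:
            return roll       # stop and take the money

rng = random.Random(0)
n = 200_000
print(sum(play(rng) for _ in range(n)) / n)   # close to 3, the value computed above
```

Changing the threshold to 3 gives the same average, matching the indifference noted above.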
Optimal Stopping of Markov Chains
Suppose $P$ is the transition matrix for a discrete-time Markov chain $(X_n)$ with finite state space $S$, and define a payoff function $f(x) \ge 0$ for all $x \in S$.
We are interested in cases where $P$ is not irreducible, since for an irreducible chain the player can simply keep going until the chain reaches a state where $f$ is maximal.
A stopping rule or stopping time is a random variable $T \in \{0, 1, 2, \dots\}$ that gives the time at which the chain is stopped, based only on what has happened up to step $n$: the event $\{T = n\}$ is determined by $X_0, \dots, X_n$.
All reasonable rules divide the state space into two sets, a continuation set and a stopping set $\mathcal{E}$, and stop when the chain first reaches a state in $\mathcal{E}$.
Our aim is to maximize the expected payoff $\mathbb{E}[f(X_T)]$ over all legal stopping rules $T$.
We generalize the quantities defined in the initial example: let $v(x) = \sup_T \mathbb{E}[f(X_T) \mid X_0 = x]$ be the value function.
Assuming the optimal strategy, $v(x) \ge f(x)$ (the player can always stop immediately) and $v(x) \ge \sum_y p(x,y)\, v(y)$ (the player can always take one more step and then play optimally).
We call a function $u$ superharmonic with respect to $P$ if $u(x) \ge \sum_y p(x,y)\, u(y)$ for all $x \in S$.
Let $T$ be any stopping rule and let $u$ be any superharmonic function.
We can prove by induction on $n$ that $\mathbb{E}[u(X_{T \wedge n}) \mid X_0 = x] \le u(x)$: the case $n = 0$ is immediate, and applying the superharmonic inequality at time $n$ on the event $\{T > n\}$ gives the step from $n$ to $n+1$.
Since $u$ is a bounded function, we can say $\mathbb{E}[u(X_T) \mid X_0 = x] = \lim_{n \to \infty} \mathbb{E}[u(X_{T \wedge n}) \mid X_0 = x] \le u(x)$.
Suppose $u(x) \ge f(x)$ for all $x$. Then $\mathbb{E}[f(X_T) \mid X_0 = x] \le \mathbb{E}[u(X_T) \mid X_0 = x] \le u(x)$.
Hence every superharmonic function larger than $f$ is greater than or equal to the value function: taking the supremum over all rules $T$ gives $u(x) \ge v(x)$.
$v(x) = \inf\{u(x) : u \text{ superharmonic and } u \ge f\}$, the infimum over all superharmonic functions dominating $f$; i.e. $v$ is the smallest superharmonic function with respect to $P$ that is greater than or equal to $f$.
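As a small numerical illustration (the chain construction and the helper name is_superharmonic below are my own, not from the notes), we can check that the die-game value function found earlier dominates $f$ and is superharmonic for the die chain with 6 made absorbing:

```python
import numpy as np

# u is superharmonic w.r.t. P if u(x) >= sum_y p(x,y) u(y) at every state.
def is_superharmonic(P, u, tol=1e-9):
    return bool(np.all(u >= P @ u - tol))

# Die game as a Markov chain: from any live state the next roll is uniform on 1-6,
# and rolling a 6 (last state) is absorbing with payoff 0.
P = np.full((6, 6), 1 / 6)
P[5] = 0.0
P[5, 5] = 1.0
f = np.array([1, 2, 3, 4, 5, 0], dtype=float)
v = np.array([3, 3, 3, 4, 5, 0], dtype=float)   # value function found in the example

print(is_superharmonic(P, v), np.all(v >= f))   # True True
```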
We use this fact to determine $v$. Start with $u_1$, which equals $f(x)$ if $x$ is absorbing and otherwise equals the maximum value of $f$.
Let $u_{n+1}(x) = \max\{f(x),\ \sum_y p(x,y)\, u_n(y)\}$.
We see $u_2 \le u_1$ and therefore, by induction, $u_{n+1} \le u_n$ for every $n$; the decreasing limit $u(x) = \lim_{n \to \infty} u_n(x)$ exists and satisfies $u(x) = \max\{f(x),\ \sum_y p(x,y)\, u(y)\}$, meaning $u$ is also superharmonic and greater than or equal to $f$.
It's already clear from this equation that the chain should be stopped exactly at those states where $u(x) = f(x)$, i.e. where stopping is at least as good as taking one more step.
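Here is a sketch of the recursion above in code, assuming the die chain from the first example (the array names are mine); the iterates decrease to the value function:

```python
import numpy as np

# Die game as a chain: states are the last roll 1-6, with 6 absorbing and f(6) = 0.
P = np.full((6, 6), 1 / 6)
P[5] = 0.0
P[5, 5] = 1.0
f = np.array([1, 2, 3, 4, 5, 0], dtype=float)

absorbing = np.isclose(np.diag(P), 1.0)
u = np.where(absorbing, f, f.max())      # u_1: f on absorbing states, max f elsewhere

for _ in range(200):
    u_next = np.maximum(f, P @ u)        # u_{n+1}(x) = max{ f(x), sum_y p(x,y) u_n(y) }
    if np.allclose(u_next, u):
        break
    u = u_next

print(u)   # converges from above to v = (3, 3, 3, 4, 5, 0)
```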
Suppose we know the optimal strategy, i.e. its stopping set $\mathcal{E}$.
Then $v(x) = f(x)$ if $x \in \mathcal{E}$, otherwise $v(x) = \sum_y p(x,y)\, v(y)$.
For a finite-state Markov chain, the solution can be found directly from this system of linear equations
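For instance, once a candidate stopping set is fixed, the system can be solved in a few lines (a sketch; the function name and the boolean-mask representation of the stopping set are my own):

```python
import numpy as np

def value_given_stopping_set(P, f, stop):
    """Solve v(x) = f(x) on the stopping set, v(x) = sum_y p(x,y) v(y) elsewhere."""
    n = len(f)
    A = np.eye(n)
    b = np.zeros(n)
    for x in range(n):
        if stop[x]:
            b[x] = f[x]        # equation v(x) = f(x)
        else:
            A[x] -= P[x]       # equation v(x) - sum_y p(x,y) v(y) = 0
    return np.linalg.solve(A, b)

# Die chain again: stopping on 4, 5 (and the absorbing 6) reproduces v = (3, 3, 3, 4, 5, 0).
P = np.full((6, 6), 1 / 6); P[5] = 0.0; P[5, 5] = 1.0
f = np.array([1, 2, 3, 4, 5, 0], dtype=float)
stop = np.array([False, False, False, True, True, True])
print(value_given_stopping_set(P, f, stop))
```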
Now to prove the above, we see that the limit $u$ is a superharmonic function and $u(x) \ge f(x)$ for all $x$, so $u \ge v$.
Define $\mathcal{E} = \{x \in S : u(x) = f(x)\}$ and $T = \min\{n \ge 0 : X_n \in \mathcal{E}\}$.
On $S \setminus \mathcal{E}$, $u(x) = \sum_y p(x,y)\, u(y)$; that is, $u$ is harmonic on the continuation set.
Therefore, $u(x) = \mathbb{E}[u(X_T) \mid X_0 = x] = \mathbb{E}[f(X_T) \mid X_0 = x]$.
Since $v(x)$ is the largest expected payoff over all possible stopping rules, we get $v(x) \ge \mathbb{E}[f(X_T) \mid X_0 = x] = u(x)$.
So $v = u$: the algorithm computes the value function, and stopping when the chain first enters $\mathcal{E}$ is an optimal rule.
Optimal Stopping with Cost
Assume continuing from state $x$ has a cost of $g(x) \ge 0$, so the quantity to maximize is $\mathbb{E}\bigl[f(X_T) - \sum_{j=0}^{T-1} g(X_j)\bigr]$.
The optimal stopping rule is again to stop when the chain enters $\mathcal{E} = \{x : v(x) = f(x)\}$.
$v$ is the smallest function greater than or equal to $f$ satisfying $v(x) \ge \sum_y p(x,y)\, v(y) - g(x)$.
Our algorithm is similar: $u_1$ is defined as before, and $u_{n+1}(x) = \max\{f(x),\ \sum_y p(x,y)\, u_n(y) - g(x)\}$.
Including a cost function means irreducible Markov chains become interesting too: the chain will eventually reach the state with the largest payoff, but the accumulated cost of waiting for it may outweigh that payoff.
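A sketch of the modified recursion, again on the die chain, with a hypothetical cost of 1 per extra roll (the cost value is only an illustration, not from the notes):

```python
import numpy as np

P = np.full((6, 6), 1 / 6); P[5] = 0.0; P[5, 5] = 1.0
f = np.array([1, 2, 3, 4, 5, 0], dtype=float)
absorbing = np.isclose(np.diag(P), 1.0)
g = np.where(absorbing, 0.0, 1.0)        # pay 1 for each roll taken from a live state

u = np.where(absorbing, f, f.max())
for _ in range(500):
    u_next = np.maximum(f, P @ u - g)    # u_{n+1}(x) = max{ f(x), sum_y p(x,y) u_n(y) - g(x) }
    if np.allclose(u_next, u):
        break
    u = u_next

print(u)                      # about (1.6, 2, 3, 4, 5, 0)
print(np.isclose(u, f))       # the stopping set grows: it is now optimal to stop on a 2 as well
```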
Optimal Stopping with Discounting
In financial matters, we often assume the value of money decreases over time: a payoff received $n$ steps in the future is worth only $\alpha^n$ times its face value for some discount factor $\alpha \in (0, 1)$, so we now maximize $\mathbb{E}[\alpha^T f(X_T)]$.
$v$ is the smallest function with $v(x) \ge f(x)$ and $v(x) \ge \alpha \sum_y p(x,y)\, v(y)$.
We can use the same recursive algorithm, now with $u_{n+1}(x) = \max\{f(x),\ \alpha \sum_y p(x,y)\, u_n(y)\}$.
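The same sketch with a discount factor instead of a cost ($\alpha = 0.9$ is just an example value chosen for this illustration):

```python
import numpy as np

alpha = 0.9
P = np.full((6, 6), 1 / 6); P[5] = 0.0; P[5, 5] = 1.0
f = np.array([1, 2, 3, 4, 5, 0], dtype=float)
absorbing = np.isclose(np.diag(P), 1.0)

u = np.where(absorbing, f, f.max())
for _ in range(500):
    u_next = np.maximum(f, alpha * (P @ u))   # u_{n+1}(x) = max{ f(x), alpha * sum_y p(x,y) u_n(y) }
    if np.allclose(u_next, u):
        break
    u = u_next

print(u)   # about (2.57, 2.57, 3, 4, 5, 0): with discounting it becomes strictly better to stop on a 3
```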