### Optimal Parking Strategy with Markov Chains and Dynamic Programming

Dynamic programming is an interesting topic with many useful applications in the real world. However, the magnitude of the state-space of dynamic programs can quickly become unfathomable. Warren B. Powell has published a book where various approximations can be applied to make otherwise intractable problems possible to solve. There is an interesting problem in this book which does not demonstrate the methods of approximation, but rather how an optimal control policy can be chosen given a Markov Chain.

Dynamic programming is an interesting topic with many useful applications in the real world. However, the magnitude of the state-space of dynamic programs can quickly become unfathomable. Warren B. Powell has published a book where various approximations can be applied to make otherwise intractable problems possible to solve. There is an interesting problem in this book which does not demonstrate the methods of approximation, but rather how an optimal control policy can be chosen given a Markov Chain.

We are driving a car and we want to park it in such a way that the total time to get to a restaurant is minimized. The parking lot contains 50 spaces which are not occupied given a probability of \(p = 0.6\). For some reason, we cannot see whether the next space is occupied or not until we are approaching it. Driving between the spaces takes 2 seconds and walking from there to the restaurant takes \(8(50-n)\) seconds, where n is the parking lot number.

If the last space is reached we can always get an opportunity to park, but that would take 30 seconds. The final set of states is given in the following figure:

The action space for the free states 46-50 is either ”continue” or ”park”. For the taken states 46-50 we can only choose to ”continue”. We are using 0 = continue and 1 = park. For these final epochs we have the following 32 policies to choose from:

0 0 0 0 0 | 0 0 1 0 0 | 0 1 0 0 0 | 0 1 1 0 0 |

0 0 0 0 1 | 0 0 1 0 1 | 0 1 0 0 1 | 0 1 1 0 1 |

0 0 0 1 0 | 0 0 1 1 0 | 0 1 0 1 0 | 0 1 1 1 0 |

0 0 0 1 1 | 0 0 1 1 1 | 0 1 0 1 1 | 0 1 1 1 1 |

0 0 1 0 0 | 0 1 0 0 0 | 0 1 1 0 0 | 1 0 0 0 0 |

0 0 1 0 1 | 0 1 0 0 1 | 0 1 1 0 1 | 1 0 0 0 1 |

0 0 1 1 0 | 0 1 0 1 0 | 0 1 1 1 0 | 1 0 0 1 0 |

0 0 1 1 1 | 0 1 0 1 1 | 0 1 1 1 1 | 1 0 0 1 1 |

In each state we can calculate a value function which gives us the potential value of being in a given state. The future value is calculated as $$p V^{\text{free}}_{n} + (1-p) V^{\text{taken}}_{n}$$. This means that we have to calculate the value of each node for every possible action. Solving dynamic programming problems explicitly this way is doomed to fail due to the curse of dimensionality. Approximative dynamic programming can help to cope with large problems, but in this case it is not necessary.

To calculate the values we have to go backward in the graph. First, we start with parking space 50: $$V_{50}^{\text{free}}(0) = 30, \qquad V_{50}^{\text{free}}(1) = 0, \qquad V_{50}^{\text{occupied}}(0) = 30$$ For parking lot 49 we have: $$V_{49}^{\text{free}}(0) = 2 + p V_{50}^{\text{free}} + (1-p) V_{50}^{\text{occupied}}, \qquad V_{49}^{\text{free}}(1) = 8, \qquad V_{49}^{\text{occupied}}(0) = V_{49}^{\text{free}}(0) = 12$$ and so on.

By doing these calculations we find that the optimal policy is to ignore all parking lots except for the last two ones where you should park if the opportunity is given.