In this series of articles, we will talk about a useful technique called Markov Chain Monte Carlo (MCMC). As a warm-up, today we will guide you through some important but not-so-obvious definitions to help you build a conceptual framework for Markov chains.
A Markov chain is a random process in which the state variable moves in a "memoryless" way: the next step depends only on the current state. To be specific, if we denote the state at step $n$ as $X_n$, then $$ P(X_{n+1}|X_{n}, X_{n-1}, ..., X_0)=P(X_{n+1}|X_{n}). $$ Equivalently, we can say $$ P(X_0, X_1, ..., X_n)=P(X_n|X_{n-1})P(X_{n-1}|X_{n-2})...P(X_1|X_0)P(X_0). $$ In this article, we will focus our attention on discrete-time, finite-state-space Markov chains.
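To make this concrete, here is a minimal simulation sketch of the memoryless property (the 3-state transition matrix below is a made-up example, not taken from the book): each step samples the next state from the row of the transition matrix indexed by the current state alone.

```python
import numpy as np

# Hypothetical 3-state chain: row i gives P(X_{n+1} = j | X_n = i).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.4, 0.4],
])

rng = np.random.default_rng(0)

def simulate(P, x0, n_steps, rng):
    """Sample a path X_0, ..., X_n; each step looks only at the current state."""
    path = [x0]
    for _ in range(n_steps):
        # Memoryless transition: only path[-1] (the current state) matters.
        path.append(int(rng.choice(len(P), p=P[path[-1]])))
    return path

path = simulate(P, x0=0, n_steps=10, rng=rng)
print(path)
```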
The stationary distribution is an important concept in Markov chain theory.
Definition. Stationary distribution $\pi$: a distribution $\pi$ over the state space is stationary for a chain with transition matrix $P$ if $\pi P = \pi$, i.e. $\sum_i \pi_i P_{i,j} = \pi_j$ for every state $j$.
For a discrete time and finite state space Markov chain, being "irreducible" and "aperiodic" implies it has one and only one stationary distribution.
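As a quick numerical illustration (using a hypothetical irreducible, aperiodic chain with all entries positive, not an example from the book), we can approximate the unique stationary distribution by power iteration, i.e. repeatedly applying $\pi \leftarrow \pi P$:

```python
import numpy as np

# Hypothetical irreducible, aperiodic chain (every entry positive).
P = np.array([
    [0.5, 0.3, 0.2],
    [0.1, 0.6, 0.3],
    [0.2, 0.4, 0.4],
])

# Power iteration: start from any distribution and repeatedly apply pi <- pi P.
pi = np.array([1.0, 0.0, 0.0])
for _ in range(1000):
    pi = pi @ P

print(pi)  # the limit satisfies pi P = pi: the unique stationary distribution
```

Because the chain is irreducible and aperiodic, the same limit is reached from any starting distribution.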
Compared with irreducibility, aperiodicity is a less intuitive concept.
Definition. Aperiodic Markov chain: the period of state $s_i$ is $\gcd\{n\ge1: (P^n)_{i,i}>0\}$. A Markov chain is aperiodic if every state has period 1.
For state $s_i$, the definition of aperiodicity tells us that $\gcd(N_i) := \gcd\{n\ge1: (P^n)_{i,i}>0\}=1$, but it does not necessarily mean that 1 is an element of $N_i$. In other words, we do not yet have $P_{i,i}>0$.
However, aperiodicity leads us to Thm 4.1, which says that there exists an $N<\infty$ such that $(P^n)_{i,i}>0$ for all $n>N$. In other words, once $n>N$, the chain can go from state $i$ back to state $i$ in $n$ steps with positive probability. This is very nice, as it gives hope for convergence as $n$ goes to infinity. The proof of Thm 4.1 uses a lemma from number theory which guarantees that every $n$ beyond $N$ belongs to the set $N_i$.
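We can check Thm 4.1 numerically on a small hand-made chain (an assumed example, not from the book) where state 0 has no self-loop ($P_{0,0}=0$) but can return to itself in both 2 and 3 steps, so its period is $\gcd\{2,3,\dots\}=1$:

```python
import numpy as np

# State 0 returns to itself via 0 -> 1 -> 0 (2 steps) or 0 -> 1 -> 2 -> 0
# (3 steps), so gcd{2, 3, ...} = 1, even though P[0, 0] = 0.
P = np.array([
    [0.0, 1.0, 0.0],
    [0.5, 0.0, 0.5],
    [1.0, 0.0, 0.0],
])

# Record whether (P^n)_{0,0} > 0 for n = 1, ..., 19.
positive = []
Pn = np.eye(3)
for n in range(1, 20):
    Pn = Pn @ P
    positive.append(Pn[0, 0] > 0)

print(positive)  # False for n = 1, then True for every n >= 2, so N = 1 works
```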
To determine whether a Markov chain is aperiodic, there are some criteria that can be applied.
Step 1. We can draw a transition diagram from the transition matrix if the number of states is not too large. With the diagram, we can easily tell whether the Markov chain is irreducible.
Step 2. It can be shown that if a Markov chain is irreducible, all of its states have the same period.
Take a look at the two MCs in the figure above. From the transition diagram, both chains are irreducible (as the states can all be reached from each other).
Therefore, examining a single state of each chain tells us the period of the whole chain.
For the chain on the left, state 1 can return to itself in 2, 4, ... steps, so its period is 2. State 1 is periodic, and hence the whole chain is not aperiodic. Similarly, for the other chain, state 1 can return to itself in 3, 6, ... steps, so its period is 3; that chain is not aperiodic either.
A useful tip: if a Markov chain is irreducible and has a state $i$ such that $P_{i,i}>0$, then it is aperiodic. (adapted from Problem 4.2 in the book)
Proof: $P_{i,i} > 0$ means the chain can go from state $i$ back to state $i$ in one step, so the period of $i$ ($\gcd\{n\ge1: (P^n)_{i,i}>0\}$) is 1. As mentioned in Step 2, all states of an irreducible Markov chain share the same period, and thus the chain is aperiodic.
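The period computation from Step 2 can be sketched as a small helper. The two transition matrices below are made-up examples, not the article's chains: the first alternates between two states (period 2, like the left chain in the figure), and the second adds a self-loop, illustrating the tip above.

```python
import math
from functools import reduce

import numpy as np

def period(P, i, max_n=50):
    """Period of state i: gcd of all n <= max_n with (P^n)_{i,i} > 0."""
    return_times = []
    Pn = np.eye(len(P))
    for n in range(1, max_n + 1):
        Pn = Pn @ P
        if Pn[i, i] > 0:
            return_times.append(n)
    return reduce(math.gcd, return_times) if return_times else 0

# Two states alternating deterministically: returns at n = 2, 4, ... -> period 2.
P_alternate = np.array([[0.0, 1.0],
                        [1.0, 0.0]])

# Same chain plus a self-loop at state 0: n = 1 is a return time -> period 1.
P_selfloop = np.array([[0.5, 0.5],
                       [1.0, 0.0]])

print(period(P_alternate, 0))  # 2
print(period(P_selfloop, 0))   # 1
```

Note that truncating at `max_n` is a heuristic for illustration; the true period is defined over all $n \ge 1$.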
Definition. Reversible distribution: a distribution $\pi$ is reversible for a chain with transition matrix $P$ if it satisfies the detailed balance condition $\pi_i P_{i,j} = \pi_j P_{j,i}$ for all states $i, j$.
We probably know that a reversible distribution can be stationary under certain conditions, but it is not obvious why a reversible distribution is also a stationary one for the Markov chain. Let's prove it here.
A distribution $\pi$ being stationary means the distribution remains the same after one transition: $\pi P=\pi$, or element-wise, $\sum_i \pi_{i} P_{i,j} = \pi_j$. Note that if $\pi$ were not stationary, the right-hand side would be some other distribution, not $\pi$.
If $\pi$ is reversible, detailed balance gives $$\pi_i P_{i,j} = \pi_j P_{j,i} \quad \text{for all } i, j.$$ Summing both sides over $i$, $$\sum_i \pi_i P_{i,j} = \sum_i \pi_j P_{j,i} = \pi_j \sum_i P_{j,i} = \pi_j,$$ and therefore $\pi$ is also a stationary distribution.
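The derivation above can be verified numerically. The pair $(\pi, P)$ below is a made-up example (a lazy random walk on three states), chosen so that detailed balance holds:

```python
import numpy as np

# Hypothetical reversible pair (pi, P): a lazy random walk on {0, 1, 2}.
P = np.array([
    [0.5,  0.5,  0.0],
    [0.25, 0.5,  0.25],
    [0.0,  0.5,  0.5],
])
pi = np.array([0.25, 0.5, 0.25])

# Detailed balance: flows[i, j] = pi_i P_{i,j} must equal pi_j P_{j,i}.
flows = pi[:, None] * P
print(np.allclose(flows, flows.T))  # True: pi is reversible for P

# Summing detailed balance over i yields stationarity: pi P == pi.
print(np.allclose(pi @ P, pi))      # True: pi is also stationary
```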
So far, we have introduced what a Markov chain is, along with some nice-to-have properties such as irreducibility and aperiodicity. With those concepts in mind, we moved on to the stationary and reversible distributions associated with a Markov chain. We hope you now have a clear picture of discrete-time, finite-state-space Markov chains; let's get ready for MCMC!
References:
[1] Olle Häggström, "Finite Markov Chains and Algorithmic Applications", Cambridge University Press, 2002.
[2] Book section "Generating Samples from Probability Distributions": https://web.mit.edu/urban_or_book/www/book/chapter7/7.1.3.html
[3] Notes "Intro to Sampling Methods" (from PSU course "Math Tools for Computer Visions"): http://www.cse.psu.edu/~rtc12/CSE586/lectures/cse586samplingPreMCMC.pdf
[4] Videos about sampling methods (Machine Learning section 16 & 17), by "mathematicalmonk": https://www.youtube.com/watch?v=3Mw6ivkDVZc&list=PLD0F06AA0D2E8FFBA&index=132