There are many potential applications of MCMC in machine learning. In this article, we will implement the Metropolis algorithm for the Ising model and apply it to an image segmentation problem. Specifically, we will:

  • Show how to formulate image segmentation as a sampling problem and solve it using MCMC;
  • Demonstrate a Python implementation of the Metropolis algorithm for the Ising model, and sketch the closely related Gibbs sampler;
  • Analyze the segmentation result and its relationship to the choice of parameters.

1. Data

Since it is the middle of a cold winter in New York, we will use an image of ski jumping as an example. Let's first read the image and convert it to grayscale.

In [11]:
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [0]:
import numpy as np
import random
import cv2
from google.colab.patches import cv2_imshow
In [13]:
# Read an image and convert it to a grey-value matrix
img_file = '/content/drive/My Drive/jump.jpg'
img = cv2.imread(img_file)

img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
print(img_gray.shape)
(480, 319)
In [0]:
cv2_imshow(img_gray)

2. Metropolis Algorithm

The Metropolis algorithm is a classical Markov chain Monte Carlo method. Since it is a must-have in any textbook covering this topic, we do not restate it here; the detailed implementation can be found in the function "Metropolis4Ising_2d()" below.

In the Ising model, we assign the following energy to a configuration $\xi$:

$$ J(\xi) = -(1/2) \sum_{i\ne j}c_{i,j}\xi_i \xi_j - \theta \sum_i b_i \xi_i $$

where

  • $\xi_i$ is the configuration function $\xi$ evaluated at pixel $i$,
  • $b_i$ is the gray value at pixel $i$, and
  • $c_{i,j}$ is 1 if pixels $i$ and $j$ are connected (neighbours) and 0 otherwise.

The Gibbs law with this energy at temperature $T$ is $$ \pi_T(\xi)=\frac{1}{Z_T} e^{-\frac{1}{T} J(\xi)}, $$ where $Z_T$ is the normalizing constant chosen so that $\pi_T$ is a probability mass function.
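
For the single-site updates used in the implementation below, it is useful to write down the energy change caused by flipping a single spin. If $\xi_k$ is flipped to $-\xi_k$ while all other spins stay fixed, only the terms involving site $k$ change, and a short calculation gives $$ \Delta J_k = 2\,\xi_k \Big( \sum_{j:\, c_{k,j}=1} \xi_j + \theta\, b_k \Big). $$ The Metropolis rule accepts such a flip with probability $\min\big(1, e^{-\Delta J_k / T}\big)$; the quantity $-\Delta J_k$ is what the code computes as "energy_diff".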

In [0]:
# Metropolis algorithm for the Ising model
# Key points:
#   How to represent a configuration in {-1, 1}^N: just a 2D array with entries in {-1, 1}
#   How to evaluate the energy change caused by a single-spin flip

def selection_function(u):
  return min(1, u)

def Metropolis4Ising_2d(data_2d, X0, n_steps, n_samples, T, theta):
  """
  A Metropolis algorithm for Ising model.  
  
  [INPUT]:
  X0: A 2D numpy array. 
    Its value at index i must be with {-1, 1}.
  n_steps: integer. 
    The number of simulation steps. 
  n_samples: integer.
    The number of samples to return.  
  """

  M, N = X0.shape
  X = np.copy(X0)

  random.seed(0)
  i_step = 0
  samples = []
  while i_step < n_steps:
    ix, iy = random.randint(0, M-1), random.randint(0, N-1)
    U = random.random()

    # Sum of spins over the 4-connected neighbours of (ix, iy), clipped at the border
    neighbors = [(ix+dx, iy+dy) for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                 if 0 <= ix+dx < M and 0 <= iy+dy < N]
    affinity_of_neighbors = sum(X[ixp, iyp] for ixp, iyp in neighbors)

    # Energy change of flipping the spin at (ix, iy):
    #   delta_J = 2 * X[ix, iy] * (affinity_of_neighbors + theta * data_2d[ix, iy])
    # The acceptance ratio of the flip is exp(-delta_J / T).
    energy_diff = -2. * X[ix, iy] * (affinity_of_neighbors + theta * data_2d[ix, iy])
    pdf_ratio = np.exp(energy_diff / T)
    h_val = selection_function(pdf_ratio)

    # Flip the spin at (ix, iy) with probability h_val:
    if U < h_val:
      X[ix, iy] = -X[ix, iy]

    # Whether or not the spin was flipped, save a copy if we are within the
    # last n_samples steps
    if i_step >= n_steps - n_samples:
      samples.append(np.copy(X))
    i_step += 1

  return samples
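
For comparison with the Gibbs sampler mentioned at the beginning, the following is a minimal sketch of the corresponding heat-bath (Gibbs) update for the same energy: instead of proposing a flip and accepting it with probability $\min(1,\cdot)$, each visited site is resampled directly from its conditional distribution, which for this model is a sigmoid of the local field $\sum_{j:\,c_{k,j}=1}\xi_j + \theta b_k$. The function name "Gibbs4Ising_2d" and its interface are our own choice, mirroring "Metropolis4Ising_2d"; this cell is a sketch and not part of the original experiment.
In [0]:
# Heat-bath (Gibbs) sampler for the same Ising energy (sketch)
def Gibbs4Ising_2d(data_2d, X0, n_steps, n_samples, T, theta):
  M, N = X0.shape
  X = np.copy(X0)

  random.seed(0)
  samples = []
  for i_step in range(n_steps):
    ix, iy = random.randint(0, M-1), random.randint(0, N-1)

    # Local field: sum over the 4-connected neighbours plus the data term
    local_field = sum(X[ix+dx, iy+dy]
                      for dx, dy in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                      if 0 <= ix+dx < M and 0 <= iy+dy < N)
    local_field += theta * data_2d[ix, iy]

    # Conditional probability of spin +1 given all the other spins
    p_plus = 1. / (1. + np.exp(-2. * local_field / T))
    X[ix, iy] = 1 if random.random() < p_plus else -1

    if i_step >= n_steps - n_samples:
      samples.append(np.copy(X))

  return samples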
In [0]:
# A simple test function: 
def test_Metropolis():
  M, N = 3, 3
  data_2d = np.random.random((M,N)) * 2. - 1.

  X0 = np.zeros_like(data_2d)
  for i in range(M):
    for j in range(N):
      if data_2d[i,j] > 0:
        X0[i,j] = 1
      else:
        X0[i,j] = -1

  n_steps = 20
  n_samples = 2
  T = 100
  theta = 1.0
  
  samples = Metropolis4Ising_2d(data_2d, X0, n_steps, n_samples, T, theta)
  for i, s in enumerate(samples):
    print(f"The {i}-th sample:") 
    print(s)
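
The test can be invoked directly as a quick sanity check (this call is not in the original notebook, and its output is omitted):
In [0]:
test_Metropolis()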

There are a few hyper-parameter choices we need to make in the model. The important ones include:

  • Initial mask: We first rescale the gray values in the image to $[-1, 1]$ and use 0 as the threshold for the initial segmentation. In principle, the threshold can be any arbitrary number; it does not matter much as long as we run enough iterations.
  • $\theta$: The Ising energy has two terms. The first measures the connectivity of the segmentation (through $\xi_i \xi_j$), and the second quantifies the role of the brightness of each pixel. $\theta$ is the weight of the second term: the larger $\theta$, the more the algorithm favors grouping bright pixels together. The quick test above uses $\theta=1$; the segmentation run below uses $\theta=0.001$.
  • $T$: the temperature in the Gibbs law. The quick test above uses $T=100$; the segmentation run below uses $T=1$.
In [0]:
# Rescale the gray values to [-1, 1]
data_2d = np.copy(img_gray).astype(float)
cv2.normalize(img_gray.astype(float), data_2d, alpha=-1, beta=1, norm_type=cv2.NORM_MINMAX)
print("The min/max gray value:", np.amin(data_2d), np.amax(data_2d))

# Initial mask: threshold at 0; spins must be in {-1, 1}
X0 = np.copy(data_2d)
X0[X0 > 0] = 1
X0[X0 <= 0] = -1
The min/max gray value: -1.0 1.0
In [0]:
n_steps = 10000
n_samples = 2
T = 1
theta = 0.001
samples = Metropolis4Ising_2d(data_2d, X0, n_steps, n_samples, T, theta)
In [0]:
# Rescale the {-1, 1} configuration to {0, 255} for display
Xtoshow = np.copy(X0)
cv2.normalize(samples[-2], Xtoshow, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX)
cv2_imshow(Xtoshow)

3. Results Analysis

After 10K steps, the algorithm produces the segmentation shown in the figure above. It captures most of the athlete except the pants, whose brightness is close to that of the background.

We also observe some noisy pixels in the result. As $\theta$ or $T$ increases, the noise in the segmentation becomes more pronounced.
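
One way to make this observation more quantitative is to count the pixels whose spin disagrees with the majority of their 4-connected neighbours, and to rerun the sampler for a few values of $\theta$ and $T$. The sketch below reuses data_2d, X0, and Metropolis4Ising_2d from the cells above; the noise measure noise_fraction and the small parameter grid are illustrative choices, not part of the original experiment.
In [0]:
# Fraction of pixels whose spin disagrees with the majority of their
# 4-connected neighbours -- a rough proxy for "noise" in the segmentation
def noise_fraction(X):
  M, N = X.shape
  n_noisy = 0
  for i in range(M):
    for j in range(N):
      neighbor_sum = sum(X[i+di, j+dj]
                         for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]
                         if 0 <= i+di < M and 0 <= j+dj < N)
      if X[i, j] * neighbor_sum < 0:
        n_noisy += 1
  return n_noisy / (M * N)

# Rerun the sampler on a small grid of (theta, T) and compare the noise level
for theta_try, T_try in [(0.001, 1), (0.001, 10), (0.01, 1)]:
  s = Metropolis4Ising_2d(data_2d, X0, 10000, 1, T_try, theta_try)
  print("theta =", theta_try, ", T =", T_try, ", noise =", noise_fraction(s[-1]))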

4. Summary

In this article, we implemented the Metropolis algorithm for the Ising model and applied it to image segmentation. As with other machine learning models, there is no magic that can replace a proper objective function (the Ising energy in this case) in a successful application of MCMC.

Extended readings on MCMC for optimization:

Fetaya, "Stochastic optimization". Available at: http://www.wisdom.weizmann.ac.il/~ethanf/. (NOTE: A nice introduction from the Metropolis algorithm to the simulated annealing method.)

Ma et al. (2019), "Sampling can be faster than optimization", PNAS. Available at: https://www.pnas.org/content/116/42/20881. (NOTE: An interesting comparison of efficiency between stochastic methods and convex optimization methods.)

Hannah (2014), "Stochastic optimization" (part of the course "Computational Stochastics" at Columbia University). Available at: http://www.stat.columbia.edu/~liam/teaching/compstat-spr14/lauren-notes.pdf.