In [1]:
import albumentations as albu
import matplotlib.pyplot as plt

import numpy as np
In [2]:
from google.colab import drive
drive.mount('/content/drive', force_remount=True)
Mounted at /content/drive
In [3]:
%cd /content/drive/My\ Drive/kaggle_cloud/src_cloudflower2/CloudFlower2
# Pull the latest code if needed:
!git pull
/content/drive/My Drive/kaggle_cloud/src_cloudflower2/CloudFlower2
Already up to date.

Background

Many computer vision applications incorporate an augmentation step into the machine learning pipeline, on the grounds that image augmentation enhances the performance of the deep learning model ([1][2]). An intuitive explanation is that augmentation introduces extra variance into the training dataset and thus decreases the distance between the training data and possible unseen data.

An interesting analogy in [1] mentions that "data augmentation is similar to imagination or dreaming. Humans imagine different scenarios based on past experience. Imagination helps us gain a better understanding of our world. Similarly, augmentation can imagine alterations of images such that the model has a better understanding of the data."

In this post, we will show how to conduct image augmentation using the Python library "albumentations". To be specific, we will demonstrate

    1. how to define some common augmentations in albumentations;
    2. what the effects of those augmentations are.

We will also discuss how to incorporate the augmentation into a deep learning workflow, using data from the "Understanding the Clouds" competition on Kaggle.

Define an Augmentation Function

Image augmentation is the process of creating new images (usually for training purposes) by slightly "modifying" the original images. The modifications include, but are not limited to, rotation, flipping, cropping, and blurring. There are many image processing libraries available to conduct those transformations, but albumentations has its own advantages.

According to albumentations' documentation ([3]), it is convenient to use as it can easily

  • "apply the same transformation to an image and for lables for segmentation, object detection, and keypoint detection tasks".
  • "work with probabilities", and
  • "define a sequence of augmentations in a unified pipeline".

Regarding specifying a probability, reference [3] has the following explanation, "During training, you usually want to apply augmentations with a probability of less than 100% since you also need to have the original images in your training pipeline. Also, it is beneficial to be able to control the magnitude of image augmentation, how much does the augmentation change the original image".

Further, if the original dataset is large, you could apply only the basic augmentations, with a probability around 10-30% and a small magnitude of changes. If the dataset is small, you need to act more aggressively with augmentations to prevent overfitting of the neural network, so you usually need to increase the probability of applying each augmentation to 40-50% and increase the magnitude of the changes the augmentation makes to the image. Image augmentation libraries allow you to set the required probabilities and magnitudes for each transformation.
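
Translating that advice into albumentations might look like the following sketch. The specific transforms, probabilities, and magnitudes here are purely illustrative choices, not recommendations from [3]:

# A light pipeline for a large dataset: few transforms, low probabilities,
# small magnitudes.
light_transform = albu.Compose([
    albu.HorizontalFlip(p=0.2),
    albu.RandomBrightnessContrast(brightness_limit=0.1, contrast_limit=0.1, p=0.2),
    albu.Resize(320, 640),
])

# A more aggressive pipeline for a small dataset: higher probabilities and
# larger magnitudes, to fight overfitting.
aggressive_transform = albu.Compose([
    albu.HorizontalFlip(p=0.5),
    albu.VerticalFlip(p=0.5),
    albu.ShiftScaleRotate(shift_limit=0.1, scale_limit=0.2, rotate_limit=30, p=0.5),
    albu.RandomBrightnessContrast(brightness_limit=0.3, contrast_limit=0.3, p=0.5),
    albu.Resize(320, 640),
])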

We will illustrate these points through the example below.

a. A function interface to work with the PyTorch framework

We need a callable object, transform, that defines the augmentation operations and is invoked as transform(image=image, mask=mask). The albumentations.core.composition.Compose class implements __call__ and can therefore be used as the input to the training framework directly.

In [4]:
def get_training_augmentation():
    """
    Define the preprocessing for the training data.
    """
    train_transform = [
        albu.HorizontalFlip(p=0.5),  # flip left-right with 50% probability
        albu.Resize(320, 640)        # always resize to height 320, width 640
    ]
    return albu.Compose(train_transform)

In the code above we defined two operations: a horizontal flip that is applied with probability 0.5, and a resize to 320 x 640 pixels that is applied to every image. The flip is the random augmentation; the resize is a deterministic preprocessing step.
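
To see how such a Compose object plugs into a training framework, here is a minimal sketch of a PyTorch-style dataset that calls the transform inside __getitem__. The class and its loading logic are hypothetical; the actual CloudDataset used below has its own implementation:

import cv2
from torch.utils.data import Dataset

class AugmentedCloudDataset(Dataset):
    """Hypothetical example: pair an image loader with an albumentations transform."""

    def __init__(self, image_paths, mask_arrays, transform=None):
        self.image_paths = image_paths   # list of file paths
        self.mask_arrays = mask_arrays   # one (M, N, 4) numpy array per image
        self.transform = transform       # e.g. get_training_augmentation()

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = cv2.cvtColor(cv2.imread(self.image_paths[idx]), cv2.COLOR_BGR2RGB)
        mask = self.mask_arrays[idx]
        if self.transform is not None:
            augmented = self.transform(image=image, mask=mask)
            image, mask = augmented['image'], augmented['mask']
        return image, mask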

b. The format of the input image

Next, we will load some helper functions to create a dataset and retrieve samples from it. A function to visualize the satellite images and their masks is also included.

We can retrieve the image and the associated masks and pass them to the transform function as transform(image=image, mask=masks). The shape of the image is M x N x 3 and the shape of the masks is M x N x 4 (channels last). This is consistent with the way we use the augmentation function in the deep learning pipeline.
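
As a quick sanity check, with dummy arrays standing in for a real sample, the transform accepts these channels-last inputs and returns a dictionary:

# Dummy arrays standing in for a real image/mask pair (values are all zeros).
dummy_image = np.zeros((1400, 2100, 3), dtype=np.uint8)  # height x width x 3 channels
dummy_masks = np.zeros((1400, 2100, 4), dtype=np.uint8)  # height x width x 4 class masks

transform = get_training_augmentation()
out = transform(image=dummy_image, mask=dummy_masks)
print(out['image'].shape, out['mask'].shape)  # (320, 640, 3) (320, 640, 4)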

In [5]:
# A function to get an image sample from the dataset
from utils.cloud_dataset import CloudDataset
from utils.dataset_helper import viz_image_mask_arrays
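
For reference, a minimal sketch of what a visualization helper like viz_image_mask_arrays could look like is shown below; the actual implementation in utils.dataset_helper may differ:

def viz_image_mask_arrays_sketch(img, masks, titles=('Fish', 'Flower', 'Gravel', 'Sugar')):
    """Plot the image followed by its four class masks."""
    fig, axes = plt.subplots(1, 5, figsize=(20, 4))
    axes[0].imshow(img)
    axes[0].set_title('image')
    for i in range(masks.shape[-1]):
        axes[i + 1].imshow(masks[..., i], cmap='gray')
        axes[i + 1].set_title(titles[i])
    for ax in axes:
        ax.axis('off')
    plt.show()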

A Simple Experiment

a. Load the dataset and pick a random sample

In [6]:
from utils.dataset_helper import read_train_df, split_image_dataset

PATH = '/content/drive/My Drive/kaggle_cloud/data'
FOLDER = 'train_images'
image_folder = f'{PATH}/{FOLDER}'

csvfile = f'{PATH}/train.csv'
data_df = read_train_df(csvfile)
df_train, df_valid = split_image_dataset(data_df, train_ratio=0.75, max_n_images=50)

# Initialize a dataset object without any preprocessing or augmentation.
train_dataset = CloudDataset(df_train, image_folder)
In [7]:
# You can pick a random file:
# image_name = 'dede987.jpg'
image_name = '0011165.jpg'

data_df.loc[data_df['image_name'] == image_name].head()
Out[7]:
Image_Label EncodedPixels image_name label
0 0011165.jpg_Fish 264918 937 266318 937 267718 937 269118 937 27... 0011165.jpg Fish
1 0011165.jpg_Flower 1355565 1002 1356965 1002 1358365 1002 1359765... 0011165.jpg Flower
2 0011165.jpg_Gravel NaN 0011165.jpg Gravel
3 0011165.jpg_Sugar NaN 0011165.jpg Sugar
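
As the output shows, the EncodedPixels column stores each mask as a run-length encoding: alternating start positions and run lengths over the flattened image, with NaN meaning the label is absent. A minimal decoder sketch, assuming the competition's usual 1-indexed, column-major convention (the project's utilities handle this internally), is:

def rle_decode(rle, shape=(1400, 2100)):
    """Turn an RLE string such as '264918 937 266318 937 ...' into a binary mask."""
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    if isinstance(rle, str):  # NaN means no mask for this label
        tokens = [int(t) for t in rle.split()]
        starts, lengths = tokens[0::2], tokens[1::2]
        for start, length in zip(starts, lengths):
            mask[start - 1 : start - 1 + length] = 1  # RLE positions are 1-indexed
    return mask.reshape(shape, order='F')  # column-major order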
In [8]:
img, masks = train_dataset._get_original_item(image_name)

print("Before augmentation:")
print("The shape of image data: ", img.shape)
print("The shape of the masks:", masks.shape)
Before augmentation:
The shape of image data:  (1400, 2100, 3)
The shape of the masks: (1400, 2100, 4)
In [9]:
viz_image_mask_arrays(img, masks)

b. Apply the augmentation

In [10]:
aug = get_training_augmentation()
rst = aug(image=img, mask=masks)
In [11]:
print("After augmentation:")
print("The shape of the image:", rst['image'].shape)
print("The shape of the masks:", rst['mask'].shape)
After augmentation:
The shape of the image: (320, 640, 3)
The shape of the masks: (320, 640, 4)
In [12]:
viz_image_mask_arrays(rst['image'], rst['mask'])
In [16]:
# Apply the same augmentation object again; the randomness produces a different result.
rst2 = aug(image=img, mask=masks)
viz_image_mask_arrays(rst2['image'], rst2['mask'])

Here are some observations from the above experiments on one sample image:

Image and Masks: The augmentation treats the original image and the masks in a consistent way. To be specific, the masks are still correctly associated with the corresponding pixels after augmentation.

The size of the image dataset: If we simply apply the augmentation class to an image dataset, it will not change the size of the dataset (one image in, one image out). In other words, the augmentation itself does not upsample the dataset automatically.
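
If we do want each original image to contribute several randomly different augmented variants per epoch, the oversampling has to be done explicitly. One hypothetical way is a thin wrapper that repeats the indices of an underlying dataset:

from torch.utils.data import Dataset

class RepeatedDataset(Dataset):
    """Hypothetical wrapper: present each sample `times` times per epoch so that
    random augmentations yield several distinct variants of every image."""

    def __init__(self, base_dataset, times=3):
        self.base_dataset = base_dataset
        self.times = times

    def __len__(self):
        return len(self.base_dataset) * self.times

    def __getitem__(self, idx):
        return self.base_dataset[idx % len(self.base_dataset)]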

Randomness in augmentation: Some randomness is introduced by the augmentation functions. Given the same image, the augmentation library will produce different results according to some random distribution. The effect of this randomness needs to be considered when evaluating the accuracy of the deep learning model.
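
One way to control this randomness, for example when comparing runs, is to seed the random number generators that albumentations typically draws from (Python's random module and NumPy). A sketch, reusing the aug, img, and masks objects from above:

import random

def seeded_augment(aug, image, mask, seed=42):
    """Fix the RNG state so that the same call produces the same augmented pair."""
    random.seed(seed)
    np.random.seed(seed)
    return aug(image=image, mask=mask)

rst_a = seeded_augment(aug, img, masks)
rst_b = seeded_augment(aug, img, masks)
assert np.array_equal(rst_a['image'], rst_b['image'])  # identical outputs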

References:

[1] Shorten and Khoshgoftaar (2019), "A survey on Image Data Augmentation for Deep Learning", available at https://link.springer.com/article/10.1186/s40537-019-0197-0#Fig2.

[2] Perez and Wang (2017), "The Effectiveness of Data Augmentation in Image Classification using Deep Learning".

[3] Documentation for the image augmentation library "albumentations", available at https://albumentations.ai/docs/.