Convolutional neural networks have had breakthrough success in image recognition, natural language processing, and even board games like Chess and Go. But what’s really going on during convolution? Well, I think the easiest way to explain is with an interactive demo. Feel free to play around with the parameters below to see for yourself!

number:
padding:
kernel size:
stride:
speed:

You can use the settings above to control the hyperparameters of the convolutional layer.

As you’ve probably figured out, the left side shows the input, a 20x20 handwritten digit, and the right side shows the output as it is being drawn. A convolutional layer will pass a filter over the input, selecting only a small group of pixels at a time.

Not shown: once the filter selects a window of pixels, it will compute a dot product with the pixel values and its own weights. In this demo the filter weights are assumed to be 1 for simplicity.

More on the hyperparameters

Sometimes the settings will result in an invalid convolution. This happens if the kernel extends beyond the activation map. We can write a python function to check whether it will be valid:

def valid_convolution(input_size, kernel_size, padding, stride):
  if (input_size + 2*padding - kernel_size) % stride == 0:
    return True
  else:
    return False

Padding

Increasing the padding will add extra pixels, typically zeros, around the edges of the input. This is done in order to ensure that the layer parameters will be valid. Additionally, it causes pixels which would have been on the edges to be more to the middle of the activation map.

Kernel Size

This refers to the “sliding window” which passes over the input. Choosing a larger kernel tends to increase the “smudging” which happens in the output, while a smaller kernel preserves more information for deeper layers. Sometimes kernel size is referred to as filter size.

Stride

Stride refers to the distance the sliding window will move between steps. A stride of 1 moves the window one pixel to the right, while a stride of 2 moves the window 2 pixels. A larger stride can reduce output layer size at the expense of granularity.