How Convolutional Neural Networks Work
Understanding convolutions is the foundation for understanding how CNNs work. Unfortunately, much of the existing literature jumps directly into the math without first explaining intuitively what convolutions are and how they can be used in Deep Learning applications.
Brandon Rohrer from Microsoft put out an amazing video (and an associated blog post) that provides exactly this kind of intuitive explanation.
Some quick notes:
When presented with a new image, the CNN doesn’t know exactly where these features will match so it tries them everywhere, in every possible position. In calculating the match to a feature across the whole image, we make it a filter. The math we use to do this is called convolution, from which Convolutional Neural Networks take their name.
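This "try the feature everywhere" idea can be sketched in a few lines of plain Python: slide a small filter over an image and score the match at each position by averaging the elementwise products, as in Rohrer's filtering step. The function name and the tiny example below are illustrative, not from the original post.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image`, scoring the match at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the window by the filter elementwise and average:
            # +1.0 means a perfect match, -1.0 a perfect mismatch.
            total = sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh)
                for dj in range(kw)
            )
            row.append(total / (kh * kw))
        out.append(row)
    return out

# A 3x3 "diagonal line" feature matched against a 4x4 image of +/-1 pixels.
image = [
    [ 1, -1, -1, -1],
    [-1,  1, -1, -1],
    [-1, -1,  1, -1],
    [-1, -1, -1,  1],
]
diagonal = [
    [ 1, -1, -1],
    [-1,  1, -1],
    [-1, -1,  1],
]
scores = convolve2d(image, diagonal)
# scores[0][0] == 1.0: the feature lines up perfectly in the top-left window.
```

The 2×2 grid of scores is the filtered image: one match score per position the feature was tried in.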
Pooling is a way to take large images and shrink them down while preserving the most important information in them. [...]. It consists of stepping a small window across an image and taking the maximum value from the window at each step. In practice, a window 2 or 3 pixels on a side and steps of 2 pixels work well.
[on Backpropagation:] Each image the CNN processes results in a vote. The amount of wrongness in the vote, the error, tells us how good our features and weights are. The features and weights can then be adjusted to make the error less. Each value is adjusted a little higher and a little lower, and the new error computed each time. Whichever adjustment makes the error less is kept. After doing this for every feature pixel in every convolutional layer and every weight in every fully connected layer, the new weights give an answer that works slightly better for that image.
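The nudge-each-value-and-keep-the-best procedure described in that excerpt can be sketched directly. To be clear, this is the intuition from the quote, not how backpropagation is actually implemented (real backpropagation computes the adjustments analytically via the chain rule); the function names and the toy error function below are my own illustration.

```python
def nudge_weights(weights, error_fn, step=0.01):
    """Try each weight a little higher and lower; keep whichever
    of the three options (unchanged, up, down) gives the least error.
    This mirrors the quote's intuition; real backprop uses gradients."""
    for i in range(len(weights)):
        original = weights[i]
        base = error_fn(weights)
        weights[i] = original + step
        up = error_fn(weights)
        weights[i] = original - step
        down = error_fn(weights)
        # Keep the candidate with the smallest error.
        _, weights[i] = min(
            (base, original),
            (up, original + step),
            (down, original - step),
        )
    return weights

# Toy error: squared distance from some "ideal" weights.
target = [0.5, -0.25]
def error(w):
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

w = [0.0, 0.0]
before = error(w)
for _ in range(100):   # one nudge pass per "image"
    nudge_weights(w, error)
# After enough passes, w has crept toward the target and the error shrinks.
```

Each pass improves the answer only slightly, which is why CNNs need many images and many passes; gradient-based backpropagation makes the same kind of improvement far more efficiently.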
And finally, something that really shows how we’re just at the beginning of the deep learning revolution, especially when we have hundreds of hidden layers in a network:
With so many combinations and permutations, only a small fraction of the possible CNN configurations have been tested. CNN designs tend to be driven by accumulated community knowledge, with occasional deviations showing surprising jumps in performance.
I’d highly recommend you take the time out to read the blog post and/or watch the accompanying video.