Backpropagation in a CNN works very much like it does in fully connected layers, just with different operations. As usual, we start by computing the derivative of the loss with respect to the weights (the filter weights here, instead of the weights of a linear layer): dLoss/dW = dLoss/dz * dz/dW, where z is the output produced by the layer.

To keep things simple, think of the conv layer as a function that takes x and produces z = x * w, where * denotes cross-correlation (ignoring the bias). From z we eventually get the raw predictions, and from those the gradient of the loss (for softmax + cross-entropy this is just the softmax of the logits minus the one-hot true label).

In the chain rule above, dLoss/dz is the gradient of the loss with respect to the layer's output (its activations); this is the gradient being propagated backwards. We multiply it by dz/dW, the derivative of z with respect to the weights. Since z = x * w, a change in w affects z in proportion to x, so dz/dW is given by the input to the conv layer; concretely, dLoss/dW works out to be the cross-correlation of the input with dLoss/dz.

The last thing we need in order to implement the backward pass programmatically is the derivative of the loss with respect to the input. Again, since z = x * w, a change in x affects z in proportion to w, so dLoss/dx = dLoss/dz * dz/dx, where dLoss/dz is the incoming gradient and dz/dx is given by the weights; concretely, dLoss/dx is the "full" convolution of dLoss/dz with the flipped kernel. We compute this not to update anything (we can't update the input) but to propagate the gradient backwards to the earlier layers.

I hope my answer is useful; a small sketch of these steps in code follows below.
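Since this is easier to see in code than in prose, here is a minimal NumPy sketch of the three steps above. It assumes a toy single-channel 2-D layer with stride 1 and no padding, and the input/kernel sizes and random data are made up purely for illustration, so treat it as a sketch rather than a drop-in implementation:

```python
import numpy as np

def cross_correlate2d(x, k):
    """Valid cross-correlation of a 2-D input x with a 2-D kernel k (stride 1, no padding)."""
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

# Toy data (hypothetical sizes): a 5x5 input and a 3x3 filter, no bias.
x = np.random.randn(5, 5)
w = np.random.randn(3, 3)

# Forward pass: z = x * w (cross-correlation).
z = cross_correlate2d(x, w)            # shape (3, 3)

# Pretend dLoss/dz arrives from the layers above
# (e.g. softmax(logits) - one-hot label, already propagated back to this layer).
dL_dz = np.random.randn(*z.shape)

# dLoss/dW: a change in w is "seen through" x, so the weight gradient is
# the cross-correlation of the input with dLoss/dz.
dL_dw = cross_correlate2d(x, dL_dz)    # same shape as w

# dLoss/dx: a change in x is "seen through" w, so the input gradient is the
# "full" convolution of dLoss/dz with the 180-degree-flipped kernel.
pad = np.pad(dL_dz, ((w.shape[0] - 1,) * 2, (w.shape[1] - 1,) * 2))
dL_dx = cross_correlate2d(pad, np.flip(w))   # same shape as x

print(dL_dw.shape, dL_dx.shape)        # (3, 3) (5, 5)
```

dL_dw would be used in the weight update (e.g. w -= lr * dL_dw), while dL_dx is what gets handed to the previous layer as its own dLoss/dz.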