I think I understand your problem with implementing backpropagation yourself; I'm doing it too. To backpropagate the error, you start from the derivative of the loss, which is given in closed form (e.g. for cross-entropy the derivative with respect to the logits is the softmax of the predictions minus the true labels), so you have that gradient, call it Dloss. Then you apply the same chain-rule formula as for linear layers (it's the same thing, except the operations change): Dloss/Dw = Dloss/Dz * Dz/Dw. The derivative of the loss with respect to the output z is the given Dloss, and the derivative of the output z with respect to w is the input to that layer (since z = w * x, a change in w is proportional to x). So to get the so-called gradient of the weights (or kernels, or filters, same thing), you multiply the input that was fed to that conv layer by Dloss (for a conv layer that "multiplication" is really a correlation of the input with Dloss) and boom, you get it!

In my implementation I created a class that holds the layers and stores their intermediate outputs so I can propagate the loss backwards. To propagate backwards, you take the derivative of the loss with respect to the input (yes, you read that right): in the formula it is the given Dloss * w, so you multiply the loss gradient by the kernels (for a conv layer that means a full convolution of Dloss with the flipped kernel), and that result is the gradient you propagate to the next layer of the convolutional net, i.e. it becomes the Dloss of that layer when reading the network from end to beginning.
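To make the chain of operations concrete, here is a minimal NumPy/SciPy sketch of what I described, assuming a single-channel 2D conv with no stride, padding, or bias; the names (`Conv2D`, `softmax_ce_grad`) are just mine for illustration, not how you have to structure your code:

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

def softmax_ce_grad(logits, one_hot_labels):
    # Dloss w.r.t. the logits for softmax + cross-entropy:
    # softmax(predictions) minus the true labels.
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    return probs - one_hot_labels

class Conv2D:
    """One conv layer that remembers its input so it can backprop."""
    def __init__(self, kernel):
        self.kernel = kernel          # the weights w
        self.last_input = None        # stored on forward, needed for Dloss/Dw

    def forward(self, x):
        self.last_input = x
        # z = w * x, where "*" here is a valid cross-correlation
        return correlate2d(x, self.kernel, mode='valid')

    def backward(self, dloss_dz):
        # Dloss/Dw = Dloss/Dz "times" Dz/Dw:
        # correlate the stored input with the incoming gradient,
        # which gives an array the same shape as the kernel.
        dloss_dkernel = correlate2d(self.last_input, dloss_dz, mode='valid')
        # Dloss/Dx = Dloss/Dz "times" w:
        # full convolution with the kernel (convolve2d flips it for us);
        # this is what the previous layer receives as its Dloss.
        dloss_dinput = convolve2d(dloss_dz, self.kernel, mode='full')
        return dloss_dkernel, dloss_dinput

# Tiny usage example: 4x4 input, 3x3 kernel.
layer = Conv2D(np.random.randn(3, 3))
x = np.random.randn(4, 4)
z = layer.forward(x)               # 2x2 output
dz = np.ones_like(z)               # pretend this arrived from the loss
dk, dx = layer.backward(dz)
print(dk.shape, dx.shape)          # (3, 3) (4, 4)
```

The key point the sketch shows is that `backward` returns two things: the kernel gradient you use to update that layer's weights, and the input gradient you hand to the layer before it as its own Dloss.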