I don't think this part is correct:
output_shape = ((A.shape[0] - kernel_size) // stride + 1,
(A.shape[1] - kernel_size) // stride + 1)
In case A.shape = [5, 5], kernel_size = 3, stride = 2, it gives output_shape = (2, 2), whereas I would expect output_shape = (3, 3). In my opinion the correct expression should be:
output_shape = (ceil((A.shape[0] - kernel_size) / stride) + 1,
                ceil((A.shape[1] - kernel_size) / stride) + 1)
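To make the comparison concrete, here is a small sketch that evaluates both expressions for one dimension. The helper names floor_output_size and ceil_output_size are hypothetical, introduced only for this check, and no padding is assumed. One thing to note when running it: for size 5 with kernel_size 3 and stride 2, (5 - 3) / 2 is exactly 1, so both expressions evaluate to 2. The two only differ when (size - kernel_size) is not a multiple of stride, for example size 6.

```python
from math import ceil

# Hypothetical helpers for illustration; no padding is assumed.

def floor_output_size(size, kernel_size, stride):
    # Original expression: only windows that fit entirely inside the input.
    return (size - kernel_size) // stride + 1

def ceil_output_size(size, kernel_size, stride):
    # Proposed expression: also counts a last window that overhangs the edge.
    return ceil((size - kernel_size) / stride) + 1

for size in (5, 6):
    print(size,
          floor_output_size(size, 3, 2),
          ceil_output_size(size, 3, 2))
# prints:
# 5 2 2
# 6 2 3
```

With the ceil variant, the last window can extend past the edge of A, so the input would presumably have to be padded before the windows are extracted.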
Regards.