79618985

Date: 2025-05-13 05:24:03
Score: 0.5
Natty:
Report link

The issue lies in the incorrect computation of the output-layer gradient during backpropagation. When softmax activation is followed by cross-entropy loss, the gradient with respect to the logits simplifies to the predicted probabilities (self.output) minus the one-hot encoded ground-truth labels. Your current implementation manually iterates over each class and sample, reapplying softmax and computing differences, which is both inefficient and prone to numerical instability. Instead, subtract 1 from the softmax outputs at the target class indices (self.output[range(batch_size), desired_outputs] -= 1) and divide the result by the batch size. This yields the correct gradient for backpropagation. Also make sure the weights and biases are updated with this gradient, scaled by the learning rate. Correcting this will allow the model to learn properly and reduce the loss during training.
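
A minimal sketch of this simplification, assuming a NumPy-based implementation where self.output holds the softmax probabilities with shape (batch_size, num_classes) and desired_outputs is an integer array of class labels; the helper name and the usage shown in comments are illustrative, not taken from your code:

```python
import numpy as np

def softmax_crossentropy_grad(softmax_output, desired_outputs):
    """Gradient of cross-entropy loss w.r.t. the pre-softmax logits.

    softmax_output:   (batch_size, num_classes) predicted probabilities
    desired_outputs:  (batch_size,) integer class labels
    """
    batch_size = softmax_output.shape[0]
    grad = softmax_output.copy()
    # p - one_hot(y), done in place at the target indices
    grad[np.arange(batch_size), desired_outputs] -= 1
    return grad / batch_size

# Quick check against the explicit one-hot formulation:
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])
labels = np.array([0, 2])
one_hot = np.eye(3)[labels]
assert np.allclose(softmax_crossentropy_grad(probs, labels),
                   (probs - one_hot) / len(labels))

# Hypothetical use inside a layer's backward pass (names are assumptions):
# grad_logits = softmax_crossentropy_grad(self.output, desired_outputs)
# self.weights -= learning_rate * self.inputs.T @ grad_logits
# self.biases  -= learning_rate * grad_logits.sum(axis=0)
```

The assertion at the end confirms that the in-place subtraction is equivalent to the explicit (probabilities minus one-hot) difference averaged over the batch.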

Reasons:
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Single line (0.5):
  • Low reputation (1):
Posted by: Manas Parashar