Okay, so I managed to fix my own issues with the help of ChatGPT:
I had a few typos in there, notably in the update_params() function, where I assigned the updates to the derivative variables instead of to the actual parameters.
Bias Update Issue: There's a potential issue in your update_params function for updating the biases:
db1 = b1 - alpha * db1
This line should be:
b1 = b1 - alpha * db1
Similarly, check the update for b2 to ensure:
b2 = b2 - alpha * db2
Since the wrong variables are being updated, the biases remain unchanged during training, preventing effective learning.
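For anyone else hitting this, here's a minimal sketch of what the corrected update step ends up looking like. The names (W1, b1, W2, b2, their gradients, and alpha) follow the snippets above; the exact function signature is an assumption about how the rest of the code is structured, not a verbatim copy of mine.

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # Plain gradient descent: assign the result back to the parameters,
    # not to the gradient variables (that was the bug).
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1   # previously: db1 = b1 - alpha * db1, so b1 never changed
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2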
What really changed the game, though, were these next two points:
Weight Initialization: Ensure that your weight initialization does not produce values that are too large or too small. A standard approach is to scale the weights by sqrt(1/n), where n is the number of inputs to the given layer.
W1 = np.random.randn(10, 784) * np.sqrt(1 / 784)
W2 = np.random.randn(10, 10) * np.sqrt(1 / 10)
This helps prevent vanishing or exploding gradients.
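To put that in context, here's roughly what the full initialization looks like with that scaling. The 784-10-10 layer sizes come from the two lines above; initializing the biases to zero is my own choice here (a common one), so treat it as an assumption rather than part of the fix.

import numpy as np

def init_params():
    # Scale weights by sqrt(1 / n_inputs) so early activations stay in a sane range
    W1 = np.random.randn(10, 784) * np.sqrt(1 / 784)
    b1 = np.zeros((10, 1))
    W2 = np.random.randn(10, 10) * np.sqrt(1 / 10)
    b2 = np.zeros((10, 1))
    return W1, b1, W2, b2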
This was a game changer, along with the following:
Data Normalization: Make sure your input data X (pixel values in this case) is normalized. Pixel values typically range from 0 to 255, so divide the input data by 255 to keep values between 0 and 1.
X_train = X_train / 255.0
This normalization often helps stabilize learning.
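For completeness, this is roughly how the data prep fits together. I'm assuming a Kaggle-style MNIST CSV (label in the first column, 784 pixel columns after it) and a network that expects one column per example; adapt the file name and the transpose to your own setup.

import numpy as np

# Assumption: "train.csv" is the Kaggle Digit Recognizer file,
# one row per image, label first, then 784 pixel values in [0, 255].
data = np.loadtxt("train.csv", delimiter=",", skiprows=1)
Y_train = data[:, 0].astype(int)
X_train = data[:, 1:].T        # shape (784, m), one column per example
X_train = X_train / 255.0      # scale pixels from [0, 255] down to [0, 1]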
And there you have it. I'm able to get 90% accuracy within 100 iterations. I'm now going to test different activation functions and find the most suitable one. Thank you, ChatGPT.
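In case anyone wants to follow along with that experiment, these are the standard candidates (and the derivatives backprop needs) I plan to swap in; the definitions are textbook ones, not something specific to my network.

import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def relu_deriv(Z):
    return (Z > 0).astype(float)

def leaky_relu(Z, slope=0.01):
    return np.where(Z > 0, Z, slope * Z)

def leaky_relu_deriv(Z, slope=0.01):
    return np.where(Z > 0, 1.0, slope)

def tanh(Z):
    return np.tanh(Z)

def tanh_deriv(Z):
    return 1.0 - np.tanh(Z) ** 2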