Okay, so I managed to fix my own issues with the help of ChatGPT:
I had a few typos in there, notably in the update_params() function, where I assigned the updates to the derivative variables instead of to the actual parameters.
Bias Update Issue: There's a potential issue in your update_params function for updating the biases:
db1 = b1 - alpha * db1
This line should be:
b1 = b1 - alpha * db1
Similarly, check the update for b2 to ensure:
b2 = b2 - alpha * db2
Since the wrong variables are being updated, the biases remain unchanged during training, preventing effective learning.
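For anyone else hitting this, here's a minimal sketch of what the corrected update step ends up looking like. The names (W1, b1, W2, b2, their gradients, and alpha) follow the snippets above; the exact function signature is an assumption about how the rest of the code is structured, not a verbatim copy of mine.

def update_params(W1, b1, W2, b2, dW1, db1, dW2, db2, alpha):
    # Plain gradient descent: assign the result back to the parameters,
    # not to the gradient variables (that was the bug).
    W1 = W1 - alpha * dW1
    b1 = b1 - alpha * db1   # previously: db1 = b1 - alpha * db1, so b1 never changed
    W2 = W2 - alpha * dW2
    b2 = b2 - alpha * db2
    return W1, b1, W2, b2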
What really changed the game, though, were these next two points:
Weight Initialization: Ensure that your weight initialization does not produce values that are too large or too small. A standard approach is to scale the weights by sqrt(1/n), where n is the number of inputs to the given layer.
W1 = np.random.randn(10, 784) * np.sqrt(1 / 784)
W2 = np.random.randn(10, 10) * np.sqrt(1 / 10)
This helps prevent vanishing or exploding gradients.
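To put that in context, here's roughly what the full initialization looks like with that scaling. The 784-10-10 layer sizes come from the two lines above; initializing the biases to zero is my own choice here (a common one), so treat it as an assumption rather than part of the fix.

import numpy as np

def init_params():
    # Scale weights by sqrt(1 / n_inputs) so early activations stay in a sane range
    W1 = np.random.randn(10, 784) * np.sqrt(1 / 784)
    b1 = np.zeros((10, 1))
    W2 = np.random.randn(10, 10) * np.sqrt(1 / 10)
    b2 = np.zeros((10, 1))
    return W1, b1, W2, b2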
This was a game changer, along with the following:
Data Normalization: Make sure your input data X (pixel values in this case) is normalized. Pixel values typically range from 0 to 255, so divide the input data by 255 to keep values between 0 and 1.
X_train = X_train / 255.0
This normalization often helps stabilize learning.
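For completeness, this is roughly how the data prep fits together. I'm assuming a Kaggle-style MNIST CSV (label in the first column, 784 pixel columns after it) and a network that expects one column per example; adapt the file name and the transpose to your own setup.

import numpy as np

# Assumption: "train.csv" is the Kaggle Digit Recognizer file,
# one row per image, label first, then 784 pixel values in [0, 255].
data = np.loadtxt("train.csv", delimiter=",", skiprows=1)
Y_train = data[:, 0].astype(int)
X_train = data[:, 1:].T        # shape (784, m), one column per example
X_train = X_train / 255.0      # scale pixels from [0, 255] down to [0, 1]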
And there you have it. I'm able to get 90% accuracy within 100 iterations. I'm now going to test different activation functions and find the most suitable one. Thank you, ChatGPT.
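In case anyone wants to follow along with that experiment, these are the standard candidates (and the derivatives backprop needs) I plan to swap in; the definitions are textbook ones, not something specific to my network.

import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def relu_deriv(Z):
    return (Z > 0).astype(float)

def leaky_relu(Z, slope=0.01):
    return np.where(Z > 0, Z, slope * Z)

def leaky_relu_deriv(Z, slope=0.01):
    return np.where(Z > 0, 1.0, slope)

def tanh(Z):
    return np.tanh(Z)

def tanh_deriv(Z):
    return 1.0 - np.tanh(Z) ** 2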