The model reaches 90% accuracy on the validation set but drops below 50% on the test set, which is really frustrating. The dataset has 10,000 images, split 80% for training (8,000 images) and 20% for validation (2,000 images).
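For reference, here is a minimal sketch of an 80/20 split over 10,000 images. The post doesn't say how the split was actually made, so the function name `split_indices`, the shuffle, and the fixed `seed` are all assumptions for illustration. It works on image indices rather than files and only checks that the two subsets have the expected sizes and don't overlap. The separate test set is not part of this split, and the post doesn't say how it was collected.

```python
import random

def split_indices(n_images, train_frac=0.8, seed=0):
    """Hypothetical helper: shuffle image indices, then split into train/validation."""
    indices = list(range(n_images))
    # Fixed seed so the same split is reproduced on every run (an assumption).
    random.Random(seed).shuffle(indices)
    n_train = int(n_images * train_frac)
    return indices[:n_train], indices[n_train:]

train_idx, val_idx = split_indices(10_000)
print(len(train_idx), len(val_idx))  # 8000 2000

# The two subsets should be disjoint and together cover every image.
assert not set(train_idx) & set(val_idx)
assert sorted(train_idx + val_idx) == list(range(10_000))
```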