I think you are confused about the 32 part. In Conv2D(32, (3, 3)), the 32 is the number of filters the layer learns, not the size of the image in pixels. The (3, 3) part is the height and width of each filter (or kernel), as seen in the answer above.
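
To make the distinction concrete, here is a rough sketch in plain Python of how the two arguments affect the layer. The helper names `conv2d_output_shape` and `conv2d_param_count` and the 28x28 grayscale input are my own assumptions for illustration, not part of Keras, and the arithmetic assumes the Keras defaults of stride 1, 'valid' padding, and channels-last input.

```python
def conv2d_output_shape(input_shape, filters, kernel_size):
    """Output shape (height, width, channels) of a Conv2D layer.

    Hypothetical helper: assumes channels-last input, stride 1 and
    'valid' padding, which are the Keras defaults.
    """
    height, width, _ = input_shape
    kernel_h, kernel_w = kernel_size
    # Each filter produces one output channel, so channels == filters.
    return (height - kernel_h + 1, width - kernel_w + 1, filters)


def conv2d_param_count(input_shape, filters, kernel_size):
    """Trainable parameters: one kernel per filter plus one bias per filter."""
    in_channels = input_shape[2]
    kernel_h, kernel_w = kernel_size
    return (kernel_h * kernel_w * in_channels + 1) * filters


# Assumed example input: a 28x28 image with a single channel.
input_shape = (28, 28, 1)

print(conv2d_output_shape(input_shape, 32, (3, 3)))  # (26, 26, 32)
print(conv2d_param_count(input_shape, 32, (3, 3)))   # 320
```

Note that the 32 only shows up as the last dimension of the output (one feature map per filter), while the image size comes entirely from the input shape.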