I think your dataset is running out because you're batching before repeating. Try replacing that line with dataset = dataset.repeat(epochs).batch(batch_size): repeat first, then batch, not the other way around. Give this a try. Also, do you have a Git repo for this?
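To make the ordering difference concrete, here is a rough plain-Python sketch. It is not tf.data itself: the repeat and batch helpers below are hypothetical stand-ins that I'm assuming behave like Dataset.repeat(count) and Dataset.batch(batch_size) with the default drop_remainder=False, and the 10-element source is made up for illustration.

```python
from itertools import chain, islice

def repeat(make_iter, count):
    """Stand-in for Dataset.repeat(count): replay a fresh iterator count times."""
    return chain.from_iterable(make_iter() for _ in range(count))

def batch(iterable, batch_size):
    """Stand-in for Dataset.batch(batch_size): lists of up to batch_size items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        yield chunk

def source():
    # Hypothetical 10-element dataset, chosen so it does not divide evenly by 4.
    return iter(range(10))

epochs, batch_size = 2, 4

# Repeat first, then batch: batches can straddle the epoch boundary,
# so only the very last batch can ever come up short.
repeat_then_batch = list(batch(repeat(source, epochs), batch_size))

# Batch first, then repeat: every epoch ends with its own short batch.
batch_then_repeat = list(repeat(lambda: batch(source(), batch_size), epochs))

print(repeat_then_batch)
print(batch_then_repeat)
```

Either way the data stops after epochs passes, so if the training loop asks for more steps than the pipeline can supply it will still run out. The order mainly changes where the short batches land and how many batches you get in total, which is why it's worth checking against your step count.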