I can't comment, so I'm leaving this as an answer.
"If I use a single GPU, then its fine. Below shows a dummy script that results in nan's after a few steps."
I think this might be due to your batch size. Try increasing it, since a larger batch should make your loss more stable. Also, what batch size did you use for the single-GPU training?
https://www.tensorflow.org/tutorials/distribute/keras#set_up_the_input_pipeline
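If you check the link above, you can see the line of code below. The batch size you pass to the dataset is the global one and gets split across the GPUs, so the tutorial multiplies the per-replica batch size by the number of replicas: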
BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync
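To make the arithmetic concrete, here is a minimal sketch in plain Python. The helper name `global_batch_size` and the value 64 are my own assumptions for illustration; only the formula itself comes from the tutorial. In real code the replica count would come from `strategy.num_replicas_in_sync` rather than a hard-coded number.

```python
# Assumed per-GPU batch size; pick whatever worked for you on a single GPU.
BATCH_SIZE_PER_REPLICA = 64


def global_batch_size(per_replica: int, num_replicas: int) -> int:
    """Hypothetical helper: batch size to give the dataset so that each
    replica still receives `per_replica` examples per step."""
    if per_replica <= 0 or num_replicas <= 0:
        raise ValueError("both arguments must be positive")
    return per_replica * num_replicas


# With TensorFlow, num_replicas would be strategy.num_replicas_in_sync.
for num_replicas in (1, 2, 4):
    batch_size = global_batch_size(BATCH_SIZE_PER_REPLICA, num_replicas)
    print(f"{num_replicas} replica(s): global batch size {batch_size}")
```

If you keep the single-GPU batch size unchanged when moving to several GPUs, each GPU sees only a fraction of it, which may be part of why the loss becomes unstable.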
Hope this helps.