Shuffling the data and then splitting it into train, dev and test sets means all three come from the same distribution, and in my opinion that would be the better option.
Reasons:
If the model is only expected to work in the same environments it was trained in, evaluating on a shuffled split reflects its real-world performance.
Each subset (train/dev/test) benefits from the full diversity of all 5 locations. This helps because deep learning models can be really data-hungry.
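As a rough illustration of the shuffle-then-split idea, here is a minimal sketch using only the Python standard library. The sample names, the 200 samples per location, the 80/10/10 ratio and the `shuffle_split` helper are all assumptions made up for the example, not something from your setup.

```python
import random
from collections import Counter


def shuffle_split(samples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of samples, then cut it into train/dev/test lists."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(samples)       # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_dev = int(len(shuffled) * dev_frac)
    n_test = int(len(shuffled) * test_frac)
    dev = shuffled[:n_dev]
    test = shuffled[n_dev:n_dev + n_test]
    train = shuffled[n_dev + n_test:]
    return train, dev, test


# Hypothetical dataset: 200 samples from each of 5 locations.
samples = [(f"loc{loc}_sample{i}", loc) for loc in range(5) for i in range(200)]

train, dev, test = shuffle_split(samples)

# Print the size of each subset and how many samples it has per location.
for name, subset in [("train", train), ("dev", dev), ("test", test)]:
    per_location = sorted(Counter(loc for _, loc in subset).items())
    print(name, len(subset), per_location)
```

The per-location counts printed at the end let you check that every subset contains samples from all 5 locations, which is the point of shuffling before splitting.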
Hope it helps!