
Shuffling the data and then distributing it between the train, dev and test sets would make all three come from the same distribution, and in my opinion that would be better.
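
A minimal sketch of "shuffle first, then split" in plain Python. It assumes the data from all locations has already been pooled into one list. The function name `shuffle_split`, the 80/10/10 proportions and the `(location_id, sample_id)` toy data are illustrative assumptions, not anything from the question:

```python
import random

def shuffle_split(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle pooled examples once, then cut them into train/dev/test.

    Because all three subsets are sliced from one shuffled pool,
    they are drawn from the same distribution.
    """
    items = list(examples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(items)   # fixed seed -> reproducible split
    n = len(items)
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test = items[:n_test]
    dev = items[n_test:n_test + n_dev]
    train = items[n_test + n_dev:]
    return train, dev, test

# Hypothetical pooled data: 200 samples from each of 5 locations.
data = [(loc, i) for loc in range(5) for i in range(200)]
train, dev, test = shuffle_split(data)
print(len(train), len(dev), len(test))  # 800 100 100
```

The fixed seed is only there so the split can be reproduced; any seed gives subsets from the same pooled distribution.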

Reasons:

If the model is only expected to work in the same environments (the 5 locations) it was trained on, a test set drawn from that same distribution gives a realistic estimate of its real-world performance.

Each subset (train/dev/test) benefits from the full diversity of all 5 locations. This helps because deep learning models can be very data-hungry, so the training set should see as much variety as possible.
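
A plain shuffle gives every subset data from all locations in expectation, but with a small or unbalanced location it can miss one by chance. One optional variant (my own addition, not something the question requires) is to shuffle and split each location separately, which guarantees every subset contains all 5 locations as long as each location has at least 3 samples. The function name `stratified_split` and the per-location counts below are hypothetical:

```python
import random
from collections import defaultdict

def stratified_split(examples, key, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle and split each group (e.g. location) separately, so every
    subset gets examples from every group that has at least 3 items."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for ex in examples:
        groups[key(ex)].append(ex)
    train, dev, test = [], [], []
    for g in sorted(groups):             # deterministic group order
        items = groups[g]
        rng.shuffle(items)
        n = len(items)
        n_test = max(1, int(n * test_frac))  # at least one test item per group
        n_dev = max(1, int(n * dev_frac))    # at least one dev item per group
        test += items[:n_test]
        dev += items[n_test:n_test + n_dev]
        train += items[n_test + n_dev:]
    # Shuffle again so no subset is ordered by location.
    for subset in (train, dev, test):
        rng.shuffle(subset)
    return train, dev, test

# Hypothetical, unequal sample counts per location.
sizes = {0: 300, 1: 150, 2: 50, 3: 400, 4: 100}
data = [(loc, i) for loc, n in sizes.items() for i in range(n)]
train, dev, test = stratified_split(data, key=lambda ex: ex[0])
for name, subset in (("train", train), ("dev", dev), ("test", test)):
    print(name, len(subset), sorted({loc for loc, _ in subset}))
# train 800 [0, 1, 2, 3, 4]
# dev 100 [0, 1, 2, 3, 4]
# test 100 [0, 1, 2, 3, 4]
```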

Hope it helps!
