This is an old question, but I ran into the same problem recently.
What worked for me was to set this inside the job:
export CUDA_VISIBLE_DEVICES=0,1,2,3
and keep the Trainer configuration at
devices: 4
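In case it helps, here is a minimal sketch of a sanity check I would run at the top of the training script to confirm that the number of GPUs exposed through CUDA_VISIBLE_DEVICES matches the devices value. It only uses the standard library. The helper names count_visible_gpus and check_devices are my own (hypothetical), not part of any library, and the check simply counts the comma-separated entries, which is a simplification of how CUDA actually interprets the variable.

```python
import os


def count_visible_gpus(env=None):
    """Count the entries in CUDA_VISIBLE_DEVICES, or return None if it is unset.

    Hypothetical helper: it only counts comma-separated entries and does not
    verify that each entry refers to a GPU that really exists on the node.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None  # variable not set, so no restriction is applied
    return len([entry for entry in value.split(",") if entry.strip()])


def check_devices(requested, env=None):
    """Raise if the requested device count differs from the visible GPU count."""
    visible = count_visible_gpus(env)
    if visible is not None and visible != requested:
        raise RuntimeError(
            f"devices={requested}, but CUDA_VISIBLE_DEVICES lists {visible} GPU(s)"
        )
    return visible
```

Calling check_devices(4) before building the Trainer would then fail early if the job exported fewer (or more) than four GPUs, instead of failing later inside the training run.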
If anyone has found a different solution, please share it.