Check whether the loss requires grad before calling loss.backward() by printing loss.requires_grad. If it doesn't, check inside the loss calculation function whether pred_conf[i] requires grad. From what I see, your function in detect.py converts tensors to NumPy arrays and plain Python values, which breaks the gradient chain. That is most likely why your loss doesn't require grad.
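
Here is a minimal sketch of the difference (the names pred_conf and target are just placeholders, not your actual detect.py code):

```python
import torch

pred_conf = torch.rand(4, requires_grad=True)
target = torch.ones(4)

# Broken: leaving the tensor world (.detach().numpy(), .item(), float(), ...)
# cuts the autograd graph, so a loss rebuilt from those values has no grad.
conf_np = pred_conf.detach().numpy()                      # gradient chain ends here
loss_broken = torch.tensor(((conf_np - target.numpy()) ** 2).mean())
print(loss_broken.requires_grad)                          # False -> backward() fails

# Working: stay in tensor operations end to end.
loss_ok = ((pred_conf - target) ** 2).mean()
print(loss_ok.requires_grad)                              # True
loss_ok.backward()                                        # gradients reach pred_conf
```

So the fix is to compute everything that feeds the loss with tensor ops and only convert to NumPy/Python for logging or visualization after the loss is computed.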