Check whether the loss requires grad before calling loss.backward() by printing loss.requires_grad. If it doesn't, check inside the loss calculation function whether pred_conf[i] requires grad. From what I see, your function in detect.py converts tensors to NumPy arrays and plain Python values, which breaks the gradient chain. That is most likely why your loss doesn't require grad.
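
Here is a minimal sketch of the difference (the names pred_conf and target are just placeholders, not your actual detect.py code):

```python
import torch

pred_conf = torch.rand(4, requires_grad=True)
target = torch.ones(4)

# Broken: leaving the tensor world (.detach().numpy(), .item(), float(), ...)
# cuts the autograd graph, so a loss rebuilt from those values has no grad.
conf_np = pred_conf.detach().numpy()                      # gradient chain ends here
loss_broken = torch.tensor(((conf_np - target.numpy()) ** 2).mean())
print(loss_broken.requires_grad)                          # False -> backward() fails

# Working: stay in tensor operations end to end.
loss_ok = ((pred_conf - target) ** 2).mean()
print(loss_ok.requires_grad)                              # True
loss_ok.backward()                                        # gradients reach pred_conf
```

So the fix is to compute everything that feeds the loss with tensor ops and only convert to NumPy/Python for logging or visualization after the loss is computed.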