The issue had to do with the operator versioning. I exported with opset=20, which caused GridSample to be exported at version=20, for example. However, the CUDA provider has only implemented it for version=16+. Re-exporting at opset=17 fixed the issue.