The issue was the default timeout for the real-time inference endpoint being 60 seconds. It looks like exceeding that timeout threshold caused the request to be retried for some reason (docs).
Switching to an async inference endpoint solved it, since the request takes ~2 min.
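Assuming this refers to SageMaker-style real-time vs. asynchronous inference, here is a minimal sketch of how the two invocation shapes differ. All names, instance types, and S3 paths below are hypothetical placeholders, and the dicts stand in for the parameters you would pass to the corresponding API calls:

```python
# Real-time invocation is synchronous: the caller blocks on the response,
# so it is subject to the ~60 s request timeout described above.
realtime_params = {
    "EndpointName": "my-endpoint",        # hypothetical endpoint name
    "ContentType": "application/json",
    "Body": b'{"inputs": "..."}',
}

# Async inference decouples request and response: the payload is staged in
# S3 and the result is written back to S3, so a ~2 min model run is fine.
async_endpoint_config = {
    "EndpointConfigName": "my-endpoint-async",   # hypothetical name
    "ProductionVariants": [{
        "VariantName": "AllTraffic",
        "ModelName": "my-model",                 # hypothetical model
        "InstanceType": "ml.m5.xlarge",          # illustrative choice
        "InitialInstanceCount": 1,
    }],
    # This block is what makes the endpoint asynchronous.
    "AsyncInferenceConfig": {
        "OutputConfig": {"S3OutputPath": "s3://my-bucket/async-results/"},
    },
}

# The async invocation references an S3 object instead of an inline body
# and returns immediately with a pointer to where the output will land.
async_invoke_params = {
    "EndpointName": "my-endpoint-async",
    "InputLocation": "s3://my-bucket/async-inputs/payload.json",
}
```

With boto3, these would go to `sagemaker.create_endpoint_config(**async_endpoint_config)` and `sagemaker-runtime.invoke_endpoint_async(**async_invoke_params)`; the caller then polls or gets notified when the result object appears in `S3OutputPath`.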