I feel daft, but I don't see how the code helps you resume from where an error/exception is thrown. I have a similar issue with a script that takes days to execute, and thus can get stuck or "die" for various reasons. I need it to resume where it stopped without me intervening. My current go-to solution is saving pickle files and reopening them, but I'm not sure it is the most efficient one.
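
To make the pickle idea concrete, here is a minimal sketch of what that kind of checkpointing could look like, assuming the long job is a loop over a list of items. The names `checkpoint.pkl`, `process_item`, `load_state`, `save_state` and `run` are all hypothetical placeholders, not taken from the code discussed above.

```python
import os
import pickle
import tempfile

CHECKPOINT = "checkpoint.pkl"  # hypothetical checkpoint path


def load_state(path=CHECKPOINT):
    """Return the saved state, or a fresh state if no checkpoint exists yet."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {"next_index": 0, "results": []}


def save_state(state, path=CHECKPOINT):
    """Write to a temp file, then rename it over the checkpoint.

    os.replace is atomic, so a crash in the middle of writing
    leaves the previous checkpoint intact.
    """
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    with os.fdopen(fd, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp_path, path)


def process_item(item):
    """Placeholder for the real, slow unit of work (assumption)."""
    return item * item


def run(items, path=CHECKPOINT):
    """Process items in order, skipping whatever a previous run already finished."""
    state = load_state(path)
    for i in range(state["next_index"], len(items)):
        state["results"].append(process_item(items[i]))
        state["next_index"] = i + 1
        save_state(state, path)  # checkpoint after every completed item
    return state["results"]
```

If the script dies and is started again, `run` picks up at the first unfinished item. Something external (a shell loop, cron, a service manager) still has to restart the process, and saving after every item may be too frequent if each item is cheap, so the checkpoint interval is a judgment call.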