The issue was an incompatibility between my cluster's filesystem and the caching behavior. Using the --cache_dir flag to point at the worker node's tmp directory resolved it.
--cache_dir