I have run into this issue as well. Instead of using Spark to reduce the partitions so that the output is a single file, convert the Spark DataFrame to a pandas DataFrame and save that. It works, and in my experience it takes less time.
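A minimal sketch of that approach, assuming the whole dataset fits in the driver's memory and that you are writing to a local path. The helper name `save_as_single_csv` and the file name in the usage comment are hypothetical; `toPandas()` and `to_csv()` are the standard PySpark and pandas calls.

```python
import pandas as pd


def save_as_single_csv(spark_df, path):
    """Collect a Spark DataFrame to the driver and write it as one CSV file.

    Assumption: the data is small enough to fit in driver memory.
    """
    # toPandas() pulls every row onto the driver as a pandas DataFrame.
    pandas_df = spark_df.toPandas()
    # pandas writes a single file at `path`, not a directory of part files.
    pandas_df.to_csv(path, index=False)
    return pandas_df


# Hypothetical usage with an existing Spark DataFrame `df`:
# save_as_single_csv(df, "output.csv")
```

Because `toPandas()` collects everything onto the driver, this is only a good fit for results that are small relative to driver memory; for larger data, writing from Spark is still the safer route.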