Based on the code you shared and your response, here are the likely causes of the error you are getting:
- The /output/folder directory you are writing to inside the pod is causing the pod to run out of storage. Fixes include writing to cloud-based object storage instead, or to a directory backed by a mounted disk of appropriate size.
- Assuming the output is not too large for the specified location, the problem could be the persist step. There you can experiment with different storage levels based on the documentation here
- It might also be prudent to check the Spark UI to see whether the memory issue is isolated to a few executors, affects all of them, or occurs on the driver. If only a few executors are affected, it is likely due to data skew, and you can repartition the data to spread the load more evenly.