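This may be only one of the possible underlying causes, but I encountered a similar issue, and the Spark UI showed that it was caused by skewed data.
I resolved it with a broadcast join, which avoids shuffling the large DataFrame (this assumes df2 is small enough to fit in memory on each executor):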
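from pyspark.sql.functions import broadcast  # PySpark; in Scala: import org.apache.spark.sql.functions.broadcast
df1.join(broadcast(df2), "join_column")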
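
To illustrate why skew hurts a shuffle join and why broadcasting sidesteps it, here is a toy model in plain Python. It is only a sketch and not Spark's actual partitioner or join implementation: `partition_sizes`, `broadcast_join`, and the sample data are hypothetical names and values made up for this example, and routing rows by `hash(key) % num_partitions` is a simplifying assumption.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Count rows per partition when each row is routed by hash(key) % num_partitions."""
    counts = Counter(hash(k) % num_partitions for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def broadcast_join(large_rows, small_rows):
    """Join (key, value) rows without repartitioning large_rows.

    The small side is turned into a lookup table that every worker could hold
    in full, so rows of the large side stay where they are.
    Assumes keys on the small side are unique.
    """
    lookup = dict(small_rows)
    return [(k, v, lookup[k]) for k, v in large_rows if k in lookup]


# Hypothetical skewed data: key 0 accounts for 90 of 100 rows.
skewed_keys = [0] * 90 + list(range(1, 11))

# With a shuffle, every row for key 0 lands in the same partition,
# so one task ends up with almost all of the work.
sizes = partition_sizes(skewed_keys, 4)
print(sizes)

# With a broadcast-style join, the large side is never repartitioned by key.
joined = broadcast_join([(0, "a"), (1, "b"), (5, "c")], [(0, "x"), (1, "y")])
print(joined)
```

In the shuffle model one partition receives nearly every row, which matches the single long-running task you would typically see in the Spark UI for a skewed join. The broadcast version never groups the large side by key, so the hot key cannot pile up in one place.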