Although the plan shows A being hash-partitioned twice, once for each of the joined DataFrames AB and AC, that does not mean the tasks recompute the shuffle: under the hood they can reuse the already hashed partitions of A. Spark skips a stage when it finds that stage's output is already available, even if the stage appears in the plan. Can you check your DAG to see whether the stages are skipped, as shown below?