I'm also learning how Spark splits a job into stages, especially around .count().
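Unlike the accepted answer, I think the three stages correspond to the three operations: reading the data, .repartition(4), and .count(). A new stage starts at every shuffle. .repartition(4) triggers one shuffle. On a DataFrame, .count() triggers a second one: each partition computes a partial count, and those partial counts are shuffled into a single partition for the final sum. Two shuffles give three stages.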
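To check the reasoning without a cluster, here is a toy model of the rule "cut the lineage at every shuffle". It is a simplification and not Spark's actual DAGScheduler. The step lists are assumptions about the plans: the DataFrame version assumes count() is a global aggregation that exchanges partial counts into one partition, and the RDD version assumes RDD.count() sums per-partition counts on the driver without a shuffle. The function name split_into_stages is hypothetical.

```python
# Toy model only (hypothetical helper, NOT Spark's DAGScheduler): for a linear
# lineage, a new stage starts wherever a step must read shuffled data.

def split_into_stages(lineage):
    """Split (name, needs_shuffle) steps into stages.

    needs_shuffle=True means the step reads data redistributed by a shuffle
    (a wide dependency), so it cannot run in the same stage as the step before.
    """
    stages = []
    current = []
    for name, needs_shuffle in lineage:
        if needs_shuffle and current:
            stages.append(current)  # close the current stage at the shuffle boundary
            current = []
        current.append(name)
    if current:
        stages.append(current)
    return stages


# Assumed DataFrame plan for reading data, then .repartition(4).count():
# count() is a global aggregation, so partial counts are exchanged
# into a single partition before the final sum.
dataframe_lineage = [
    ("read data", False),
    ("repartition(4)", True),
    ("count: partial count per partition", False),
    ("count: sum partial counts in one partition", True),
]

# Assumed RDD equivalent: rdd.repartition(4).count() sums per-partition
# counts on the driver, so count() adds no shuffle of its own.
rdd_lineage = [
    ("read data", False),
    ("repartition(4)", True),
    ("count per partition", False),
]

for label, lineage in [("DataFrame", dataframe_lineage), ("RDD", rdd_lineage)]:
    stages = split_into_stages(lineage)
    print(f"{label}: {len(stages)} stages")
    for i, stage in enumerate(stages):
        print(f"  stage {i}: {stage}")
```

Under these assumptions the DataFrame lineage splits into three stages and the RDD lineage into two, which is why the shuffle inside .count() matters for the count. To see the real plan, calling .explain() on the DataFrame before .count() should show an Exchange for each shuffle.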