79577732

Date: 2025-04-16 17:02:04
Score: 0.5
Natty:
Report link

Using zipWithIndex for precise batching: attach a sequential index to every row, then group rows by index // batch_size so each batch has an exact, predictable size.

# Attach a sequential index to every row, then key each row by its batch number
rdd = df.rdd.zipWithIndex()
batched_rdds = rdd.map(lambda x: (x[1] // batch_size, x[0])).groupByKey().map(lambda x: list(x[1]))
# Collect the grouped rows and rebuild one DataFrame per batch with the original schema
batched_dfs = [spark.createDataFrame(batch, schema=df.schema) for batch in batched_rdds.collect()]
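A minimal end-to-end sketch of the same idea, assuming a local SparkSession; the example DataFrame, column names, and batch_size value are placeholders, not part of the original answer:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("zipWithIndex-batching").getOrCreate()
df = spark.createDataFrame([(i, f"row-{i}") for i in range(10)], ["id", "value"])
batch_size = 3  # hypothetical batch size

# Key each row by its batch number, group, and sort so batches come back in index order
indexed = df.rdd.zipWithIndex()
batches = (indexed.map(lambda x: (x[1] // batch_size, x[0]))
                  .groupByKey()
                  .mapValues(list)
                  .sortByKey()
                  .collect())

# Rebuild one DataFrame per batch, preserving the original schema
batched_dfs = [spark.createDataFrame(rows, schema=df.schema) for _, rows in batches]
for i, bdf in enumerate(batched_dfs):
    print(f"batch {i}: {bdf.count()} rows")

Note that groupByKey triggers a shuffle and collect pulls every batch onto the driver, so this pattern only suits data small enough to fit in driver memory.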
Reasons:
  • Low length (0.5):
  • Has code block (-0.5):
  • Low reputation (0.5):
Posted by: poonam poonia