Reports

79577732

Date: 2025-04-16 17:02:04

Score: 0.5

Natty:

Using `zipWithIndex` for precise batching

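```python
# df, spark and batch_size are assumed to exist already
rdd = df.rdd.zipWithIndex()  # (Row, index) pairs
# Key each Row by its batch number, group, and return batches in batch-number order
batched_rdds = rdd.map(lambda x: (x[1] // batch_size, x[0])).groupByKey().sortByKey().map(lambda x: list(x[1]))
# collect() pulls every row to the driver, so this only suits data that fits in driver memory
batched_dfs = [spark.createDataFrame(batch, schema=df.schema) for batch in batched_rdds.collect()]
```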
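The batching key is simply the row index integer-divided by `batch_size`, so indices 0 to `batch_size - 1` land in batch 0, the next `batch_size` indices in batch 1, and so on. The sketch below reproduces that keying on a local Python list without Spark, so the arithmetic can be checked in isolation. It is an illustration only: `batch_by_index` is a hypothetical helper, and `enumerate` stands in for `zipWithIndex` (note it yields `(index, row)` rather than Spark's `(row, index)`).

```python
from itertools import groupby


def batch_by_index(rows, batch_size):
    """Hypothetical helper: group rows by index // batch_size, like the Spark code."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batches = []
    # enumerate() plays the role of zipWithIndex(); groupby() works here because
    # index // batch_size is non-decreasing, so each batch's rows are contiguous.
    for _, group in groupby(enumerate(rows), key=lambda pair: pair[0] // batch_size):
        batches.append([row for _, row in group])
    return batches


print(batch_by_index(["a", "b", "c", "d", "e"], 2))
# [['a', 'b'], ['c', 'd'], ['e']]
```

Only the final batch can be shorter than `batch_size`, which is what makes the index-based approach "precise" compared with splitting by partition.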

Reasons:

Low length (0.5):
Has code block (-0.5):
Low reputation (0.5):

Posted by: poonam poonia
