79624042

Date: 2025-05-15 19:09:33
Score: 2

Thanks to @jqurious for getting me to an answer!

I was able to get the plugin running in parallel by forcing it through a LazyFrame and collecting with .collect(engine="streaming"). Instead of doing

df = df.with_columns(my_plugin(colname, arg))

I did

df = df.lazy().with_columns(my_plugin(colname, arg)).collect(engine="streaming")

and this worked as expected, giving me a ~30x speedup on a 32-core machine. I'm not sure whether this is how Polars intends plugins to be used, but it solved the problem.

Reasons:
  • Blacklisted phrase (0.5): Thanks
  • Has code block (-0.5):
  • User mentioned (1): @jqurious
  • Self-answer (0.5):
  • Low reputation (0.5):
Posted by: sclamons