Thanks to @jqurious for getting me to an answer!
I was able to get the plugin running in parallel by forcing it through a LazyFrame and collecting with .collect(engine="streaming").
Instead of doing
df = df.with_columns(my_plugin(colname, arg))
I did
df = df.lazy().with_columns(my_plugin(colname, arg)).collect(engine="streaming")
and this worked as expected, giving me a ~30x speedup on a 32-core machine. I'm not sure if this is the way Polars intends plugins to work, but it did work.