I think a combination of create_map
from pyspark.sql.functions
, chain
from itertools
and mapping might get you a more performant way.
This requires you are able to map your algorithm to a dict as per this SO answer: ttps://stackoverflow.com/a/42983199/2186184
Since you have multiple conditions you might need nested dicts, not sure if that is allowed in create_map
though.