How can I do a calculation that depends on previous (row's) calculation result that is in the same column?
The short answer is that you can't without falling back into Python. To do this, any library would need to essentially need to iterate over the rows, only calculating a single row at a time. This means any sort of vectorisation is not possible.
Polars offers map_elements
for this use-case, but it is discouraged. From the docs:
This method is much slower than the native expressions API. Only use it if you cannot implement your logic otherwise.
df = pl.DataFrame({'Index': [1, 0, 1, 1, 0]})
previous_result = "Result" # hardcode the header as the intial "previous result"
def f(index):
global previous_result
out = "A" if index == 0 else previous_result
previous_result = out
return out
print(df.with_columns(Result=pl.col("Index").map_elements(f, return_dtype=pl.String)))
# shape: (5, 2)
# ┌───────┬────────┐
# │ Index ┆ Result │
# │ --- ┆ --- │
# │ i64 ┆ str │
# ╞═══════╪════════╡
# │ 1 ┆ Result │
# │ 0 ┆ A │
# │ 1 ┆ A │
# │ 1 ┆ A │
# │ 0 ┆ A │
# └───────┴────────┘
The better solution is to attempt to recognise any pattern that allows the computation to be done in a vectorised way. In this (likely contrived) example, it is that once Index == 0
has been seen once, the result of the remainder of the column is "A"
df.with_columns(
# If the row number is >= to the first row an index of 0 was seen
pl.when(pl.int_range(pl.len()) >= pl.arg_where(pl.col("Index") == 0).min())
.then(pl.lit("A"))
.otherwise(pl.lit("Result"))
.alias("Result")
)
# Returns the same output as above