79301419

Date: 2024-12-22 17:46:01
Score: 0.5
Natty:
Report link

Maybe using a list and indices to access the data could be faster than the coalesce / when. I'll try tomorrow to see if there is any speed up for doing the below:

symbol_to_idx = dict(pl.DataFrame(list_symbols, schema=['symbol']).with_row_index().select('symbol', 'index').iter_rows())
(
    df.join(basket, on="symbol", how="left")
    # alternative #2 to the successive joins 
    .join(
        df_pivot.select(
            "date", 
            pl.concat_list(list_symbols).alias("price_to_drop")
        ),
        on="date",
        how="left",
    )
    .with_columns([col(f'symbol_{i}').replace(symbol_to_idx).alias(f'symbol_index_{i}_to_drop') for i in [1,2]])
    .with_columns(*[col('price_to_drop').list.get(col(f'symbol_index_{i}_to_drop')).alias(f'price_{i}') for i in [1,2]])
    .select(pl.exclude("^.*to_drop$"))
)
Reasons:
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Self-answer (0.5):
  • Low reputation (1):
Posted by: randomcoder877