79771121

Date: 2025-09-21 23:06:57
Score: 1
Natty:
Report link

In my experience with Polars, I can say that it will not load the entire Parquet file into memory if you only select a few columns, because it does column pruning (projection pushdown) under the hood.

However, if you call .collect() carelessly, Polars will try to materialize every row of those columns at once, and that can exhaust your RAM on huge data.

When you need to work on very large datasets, try the following:

·       Use scan_parquet (lazy mode) and apply your filters before calling .collect().

·       Use streaming=True as you did, but combine it with filters/aggregations so Polars does not need to hold everything in memory at once.

Reasons:
  • Long answer (-0.5):
  • No code block (0.5):
  • Low reputation (1):
Posted by: user30818063