Remove .with_infer_schema_length(None)
I opened an issue (https://github.com/pola-rs/polars/issues/21298) on the Polars GitHub repository and received the answer there, which I am sharing here.
The most relevant comment: https://github.com/pola-rs/polars/issues/21298#issuecomment-2717919596
Interpretation of functionality for date/time inference parsing:
"infer schema" tries up to 188 string patterns, each once or twice
it does so on every field in scope
it does so for every row in scope, up to infer_schema_length rows
if "infer_schema_length" is not set, it defaults to 100 rows
if set to None, it processes (every field in) every row
Note that the inference is not cheap, and can significantly impact the performance.
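To make the cost concrete, here is a rough back-of-the-envelope sketch of the interpretation above. It is a toy model, not Polars code: the function name max_pattern_attempts and its parameters are hypothetical, and it only computes an upper bound on pattern attempts (rows scanned x fields x patterns x passes), assuming every field is a date/time candidate.

```rust
/// Toy upper bound on date/time pattern attempts during schema inference.
/// Hypothetical helper for illustration only; this is not part of Polars.
///
/// * `total_rows`          - number of rows in the file
/// * `fields_per_row`      - number of fields considered per row
/// * `infer_schema_length` - Some(n) scans at most n rows, None scans all rows
/// * `patterns`            - patterns tried per field (up to 188 per the issue comment)
/// * `passes`              - how often each pattern is tried (1 or 2)
fn max_pattern_attempts(
    total_rows: u64,
    fields_per_row: u64,
    infer_schema_length: Option<u64>,
    patterns: u64,
    passes: u64,
) -> u64 {
    // With Some(n) the scan stops after n rows (or at end of file);
    // with None every row is scanned.
    let rows_scanned = match infer_schema_length {
        Some(n) => n.min(total_rows),
        None => total_rows,
    };
    rows_scanned * fields_per_row * patterns * passes
}
```

For a hypothetical file with 1,000,000 rows and 10 fields, this bound is 376,000 attempts with the default of 100 rows, and 3,760,000,000 attempts with None, a factor of 10,000. Real inference will usually do less work than this bound, but the scaling with the number of scanned rows is the point.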
After I removed the .with_infer_schema_length(None) calls
from my Rust code, performance improved significantly, even for smaller files.
For reference: small file, unchanged code, release build compiled with rustc 1.85.0 (4d91de4e4 2025-02-17)
================
CPU 132%
user 0.112
system 0.030
total 0.108
Changed, small file:
================
CPU 265%
user 0.036
system 0.038
total 0.028
Changed, big file:
================
CPU 497%
user 0.145
system 0.047
total 0.039