79606678

Date: 2025-05-05 09:44:04
Score: 1

I noticed that with pandas==2.2.1 (and possibly other versions), the bad-line error (pandas.errors.ParserError) is not raised when reading the file in chunks unless engine='python' is passed explicitly to read_csv.

Following the example provided by @sitting_duck, here's a minimal reproducible example:

import io
import pandas as pd

sim_csv = io.StringIO(
'''A,B,C
11,21,31
12,22,32
13,23,33,43  # Bad Line
14,24,34
15,25,35'''
)

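Without engine (so the default C engine is used) and with on_bad_lines='error', no error is raised and the extra field on the bad line is silently dropped: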

with pd.read_csv(sim_csv, chunksize=2, on_bad_lines='error') as reader:
    for chunk in reader:
        print(chunk)

    A   B   C
0  11  21  31
1  12  22  32
    A   B   C
2  13  23  33
3  14  24  34
    A   B   C
4  15  25  35

With engine='python' and on_bad_lines='error', the error is raised as soon as the bad line is reached:

sim_csv.seek(0)
with pd.read_csv(sim_csv, chunksize=2, engine='python', on_bad_lines='error') as reader:
    for chunk in reader:
        print(chunk)

    A   B   C
0  11  21  31
1  12  22  32
[...] pandas.errors.ParserError: Expected 3 fields in line 4, saw 4
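
If switching engines is not an option, one possible safeguard is to count fields with the standard library's csv module before handing the buffer to pandas. This is only a sketch: find_bad_lines is a hypothetical helper, not a pandas function, and it assumes a plain comma-separated buffer whose header row defines the expected width.

```python
import csv
import io


def find_bad_lines(buffer):
    """Return (line_number, field_count) for rows whose width differs from the header.

    Hypothetical helper, not part of pandas. Line numbers are 1-based and
    count the header as line 1, like the numbering in the ParserError message.
    """
    buffer.seek(0)
    reader = csv.reader(buffer)
    header = next(reader, None)
    if header is None:  # empty input: nothing to check
        return []
    width = len(header)
    bad = [
        (reader.line_num, len(row))
        for row in reader
        if row and len(row) != width  # skip blank lines
    ]
    buffer.seek(0)  # rewind so the buffer can still be passed to pd.read_csv
    return bad


sim_csv = io.StringIO(
'''A,B,C
11,21,31
12,22,32
13,23,33,43  # Bad Line
14,24,34
15,25,35'''
)

bad_lines = find_bad_lines(sim_csv)
print(bad_lines)  # [(4, 4)]
```

Note that this check is stricter than pandas: it also flags rows with too few fields, which read_csv would normally pad with NaN rather than reject.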
Reasons:
  • Long answer (-0.5):
  • Has code block (-0.5):
  • User mentioned (1): @sitting_duck
  • Low reputation (1):
Posted by: pommelador