Can you try explicitly setting an engine when reading the S3 path into a DataFrame?
The underlying engine could be the issue, though I'm not sure:
import os
import pandas as pd

# bucket_and_prefix is your existing "bucket/prefix" string
df = pd.read_parquet(
    f"s3a://{bucket_and_prefix}",
    engine="fastparquet",  # set the engine explicitly instead of relying on the default
    storage_options={
        "key": os.getenv("AWS_ACCESS_KEY_ID"),
        "secret": os.getenv("AWS_SECRET_ACCESS_KEY"),
        "client_kwargs": {
            "verify": os.getenv("AWS_CA_BUNDLE"),
            "endpoint_url": "https://prd-data.company.com/",
        },
    },
)
Alternatively, switching between fastparquet and pyarrow might help. Please let me know if you find a fix for this.
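
If it helps with narrowing this down, here is a rough sketch of trying each engine in turn and reporting which one worked. The helper name read_parquet_any_engine and its reader parameter are my own, made up for illustration; they are not part of pandas. By default it just calls pd.read_parquet with the same keyword arguments you pass in.

```python
def read_parquet_any_engine(path, engines=("pyarrow", "fastparquet"), reader=None, **kwargs):
    """Try each engine in order; return (result, engine_name) for the first that works.

    Hypothetical helper for debugging, not a pandas API.
    """
    if reader is None:
        # Imported lazily so the helper can also be exercised with a stub reader.
        import pandas as pd
        reader = pd.read_parquet

    errors = {}
    for engine in engines:
        try:
            return reader(path, engine=engine, **kwargs), engine
        except Exception as exc:  # broad on purpose: each engine raises different error types
            errors[engine] = exc

    # Every engine failed: surface all of the errors so they can be compared.
    details = "; ".join(f"{name}: {exc!r}" for name, exc in errors.items())
    raise RuntimeError(f"No engine could read {path}: {details}")
```

You would call it as df, engine = read_parquet_any_engine(f"s3a://{bucket_and_prefix}", storage_options=...) with the same storage_options as above. If both engines fail, the combined error message should at least show whether they fail in the same way, which would point at the connection settings rather than the engine.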