79601394

Date: 2025-05-01 04:59:26
Score: 1.5

The netCDF files contained several data arrays besides time. So I first read only the time variable, associated each time value with the name of the netCDF file it belongs to, and repartitioned the resulting DataFrame. I then added the other data arrays as columns using UDFs. With 200,000 frames distributed evenly, this approach performed almost identically whether they were spread across 4 files or across 4000 files.
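The answer does not name the framework. "Repartitioned the dataframe" and "UDFs" suggest Spark, but that is an assumption. Below is a framework-free sketch of the same two-step idea under that reading. It builds a lightweight (file, frame index) table first, spreads it evenly over partitions, and only then computes the heavy columns row by row. Plain Python lists stand in for the Spark DataFrame, and `read_frame` is a hypothetical placeholder for the real netCDF read, so the partitioning logic can be checked without Spark or netCDF installed. All function names here are illustrative, not from the original post.

```python
from typing import Callable, Dict, List, Tuple

# One row per time frame: (file name, index of the frame inside that file).
Row = Tuple[str, int]


def build_time_index(frames_per_file: Dict[str, int]) -> List[Row]:
    """Step 1: read only the cheap time axis and tag every frame with its file.
    `frames_per_file` stands in for the length of each file's time variable."""
    return [(name, i) for name, n in sorted(frames_per_file.items()) for i in range(n)]


def repartition(rows: List[Row], num_partitions: int) -> List[List[Row]]:
    """Step 2: spread rows round-robin over partitions, so partition sizes
    depend on the number of frames, not on how many files they came from."""
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for k, row in enumerate(rows):
        parts[k % num_partitions].append(row)
    return parts


def add_column(parts: List[List[Row]], udf: Callable[[str, int], object]) -> List[List[tuple]]:
    """Step 3: compute a heavy column per row, like a UDF that opens the file
    named in the row and reads only the one frame it needs."""
    return [[row + (udf(*row),) for row in part] for part in parts]


def read_frame(path: str, index: int) -> float:
    # Hypothetical stand-in: real code would open `path` (e.g. with netCDF4)
    # and slice one frame of a data variable at `index`.
    return float(index)


# 200,000 frames split evenly over 4 files or over 4000 files.
for n_files in (4, 4000):
    frames = {f"file_{j:04d}.nc": 200_000 // n_files for j in range(n_files)}
    parts = add_column(repartition(build_time_index(frames), 8), read_frame)
    sizes = [len(p) for p in parts]
    print(n_files, min(sizes), max(sizes))
# prints:
# 4 25000 25000
# 4000 25000 25000
```

Because the unit of work is a single frame rather than a whole file, each partition ends up with the same number of frames in both layouts, which is consistent with the near-identical timings reported above. In PySpark the analogous pieces would presumably be `DataFrame.repartition` and a reader wrapped with `pyspark.sql.functions.udf`, but the exact schema and reader are assumptions.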

Reasons:
  • No code block (0.5):
  • Self-answer (0.5):
  • Single line (0.5):
Posted by: Quiescent