The netCDF files contained several data arrays in addition to time. So I first read the time array, associated each time value with the name of the netCDF file it belongs to, and repartitioned the resulting dataframe. I then added the remaining columns using UDFs. With this approach, performance was almost identical whether the 200,000 frames were distributed evenly across 4 files or across 4000 files.
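A rough sketch of that workflow in PySpark, in case it helps. This is not my exact code: the helper names, the `frame` index column, the `"time"` variable name, and the `"temperature"` variable are all placeholders I made up for illustration, and it assumes the `netCDF4` package is installed on the driver and on every executor.

```python
from typing import Dict, List, Sequence, Tuple


def build_time_index(times_by_file: Dict[str, Sequence[float]]) -> List[Tuple[str, int, float]]:
    """Flatten {file: [t0, t1, ...]} into (file, frame_index, time) rows."""
    return [
        (path, i, float(t))
        for path, times in sorted(times_by_file.items())
        for i, t in enumerate(times)
    ]


def read_times(path: str, time_var: str = "time") -> List[float]:
    """Read only the time array of one netCDF file (variable name is an assumption)."""
    from netCDF4 import Dataset  # lazy import: only needed when files are actually read

    with Dataset(path) as ds:
        return [float(t) for t in ds.variables[time_var][:]]


def frames_dataframe(spark, paths, num_partitions, data_var="temperature"):
    """Build the (file, frame, time) DataFrame, repartition it, add a column via a UDF."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType

    # Step 1: read time only and tag every value with its source file.
    rows = build_time_index({p: read_times(p) for p in paths})

    # Step 2: repartition so the work is spread by frame, not by file.
    df = spark.createDataFrame(rows, ["file", "frame", "time"]).repartition(num_partitions)

    # Step 3: add further columns with UDFs that open the file on the executor.
    # Here the hypothetical column is the mean of `data_var` for one frame.
    @F.udf(returnType=DoubleType())
    def frame_mean(path, frame):
        from netCDF4 import Dataset

        with Dataset(path) as ds:
            return float(ds.variables[data_var][frame].mean())

    return df.withColumn(data_var + "_mean", frame_mean("file", "frame"))
```

Because the dataframe is repartitioned after the time index is built, each task handles a similar number of frames regardless of how many files they came from, which is presumably why the file count mattered so little.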