Usually, the data is read in batches to fine-tune the model.
For example, if there are 1M samples and each step uses a batch of 64 non-overlapping samples, then one full pass over the dataset (one epoch) takes 1,000,000 / 64 = 15,625 iterations.
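Here is a minimal PyTorch-style sketch of what that looks like (assuming a PyTorch setup; the random dataset and the fine-tuning step inside the loop are placeholders, not your actual model or data):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for the 1M samples (random features and binary labels)
N = 1_000_000
dataset = TensorDataset(torch.randn(N, 8), torch.randint(0, 2, (N,)))

# batch_size=64 with shuffle=True draws 64 non-overlapping samples per step;
# one pass over the loader covers every sample exactly once (one epoch)
loader = DataLoader(dataset, batch_size=64, shuffle=True)

print(len(loader))  # 1_000_000 / 64 = 15_625 iterations per epoch

for step, (inputs, labels) in enumerate(loader):
    # your fine-tuning step on this batch would go here, e.g.:
    # loss = model(inputs, labels); loss.backward(); optimizer.step()
    pass
```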
Hope this helps, thanks.