There isn't much you can do beyond upgrading to a larger GPU; that's ultimately the best solution. If upgrading isn't an option, you can try reducing the batch size, using gradient accumulation to keep the effective batch size the same, enabling AMP (automatic mixed precision) training, calling `torch.cuda.empty_cache()` to release cached blocks back to the driver (note that this won't free tensors that are still referenced, so it rarely fixes out-of-memory errors on its own), or simplifying the model to reduce its size.
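A minimal sketch of how gradient accumulation and AMP can be combined in a standard PyTorch training loop. The tiny model, batch shapes, and `accum_steps` value are hypothetical placeholders; AMP is enabled only when CUDA is available, and the loop falls back to plain FP32 on CPU:

```python
import torch
import torch.nn as nn

# Hypothetical tiny model and synthetic data, just for illustration.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(32, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

accum_steps = 4  # effective batch size = micro-batch size * accum_steps
use_amp = device == "cuda"
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)  # no-op when disabled

optimizer.zero_grad()
for step in range(8):
    # Small micro-batch: 8 samples instead of an OOM-prone 32.
    x = torch.randn(8, 32, device=device)
    y = torch.randint(0, 4, (8,), device=device)

    # Autocast runs eligible ops in half precision when enabled.
    with torch.autocast(device_type=device, enabled=use_amp):
        # Divide by accum_steps so accumulated gradients average correctly.
        loss = loss_fn(model(x), y) / accum_steps

    scaler.scale(loss).backward()  # gradients accumulate across micro-batches

    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)   # unscales gradients, then optimizer.step()
        scaler.update()
        optimizer.zero_grad()
```

Dividing the loss by `accum_steps` keeps the accumulated gradient equivalent to one large-batch step, so you trade memory for a few extra forward/backward passes without changing the optimization behavior.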