79438138

Date: 2025-02-14 01:37:24
Score: 1
Natty:

I'm experiencing the same problem.

I noticed that with the PyTorch backend, GPU memory usage is ~10x smaller, so I increased the batch size 16x, which made training ~16x faster and now comparable to the TensorFlow backend (however, GPU utilization is still low: ~3% vs. ~30% with TF).
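
For reference, a minimal sketch of the kind of change I mean (the model, data, and baseline batch size of 128 here are placeholder assumptions, not my actual setup; in Keras 3, KERAS_BACKEND must be set before keras is first imported):

    import os
    os.environ["KERAS_BACKEND"] = "torch"  # must be set before importing keras

    import numpy as np
    import keras

    # Placeholder data/model standing in for the real training setup.
    x = np.random.rand(4096, 32).astype("float32")
    y = np.random.rand(4096, 1).astype("float32")

    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

    # Hypothetical baseline batch size of 128, scaled 16x as described above.
    model.fit(x, y, batch_size=128 * 16, epochs=1)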

NOTE: increasing the batch size may affect training quality (e.g., convergence behavior); I have not compared that yet.

I suspect the batch size under the PyTorch backend has different performance semantics than the traditional Keras/TF behavior. See this discussion of a PyTorch LSTM running ~50x slower than Keras's TF CuDNNLSTM: https://discuss.pytorch.org/t/solved-pytorch-lstm-50x-slower-than-keras-tf-cudnnlstm/10043/8
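
One way to check this hypothesis is to time training at two batch sizes: if throughput scales almost linearly with batch size, the cost of a step is dominated by fixed per-step overhead (e.g., kernel launches) rather than by computation that grows with the batch. A rough sketch, reusing the placeholder model, data, and hypothetical 128 baseline from the snippet above:

    import time

    def samples_per_second(model, x, y, batch_size):
        # Warm-up epoch so one-time compilation/allocation doesn't skew timing.
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        start = time.perf_counter()
        model.fit(x, y, batch_size=batch_size, epochs=1, verbose=0)
        return len(x) / (time.perf_counter() - start)

    small = samples_per_second(model, x, y, batch_size=128)
    large = samples_per_second(model, x, y, batch_size=128 * 16)

    # A ratio near 16 would suggest per-step overhead dominates,
    # which would also explain the low GPU utilization.
    print(f"throughput ratio: {large / small:.1f}x")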

Reasons:
  • No code block (0.5)
  • Low reputation (0.5)
Posted by: user873275