Experiencing the same problem.
I noticed that with the PyTorch backend, GPU memory usage is ~10x lower, so I increased the batch size to 16x the original, and training became ~16x faster. It is now comparable to the TensorFlow backend (though GPU utilization is still low: ~3% vs. ~30% with TF).
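One possible explanation (just my guess, with made-up numbers): if each training step carries a roughly fixed per-batch overhead, small batches are dominated by that overhead, so a 16x larger batch can give close to 16x throughput. A toy back-of-the-envelope sketch, with illustrative constants rather than measured values:

```python
def epoch_time(n_samples, batch_size, overhead_s=0.015, per_sample_s=1e-6):
    """Estimated epoch time assuming a fixed per-batch overhead.

    overhead_s and per_sample_s are hypothetical numbers chosen only
    to illustrate overhead-dominated training, not real measurements.
    """
    n_batches = -(-n_samples // batch_size)  # ceil division
    return n_batches * (overhead_s + batch_size * per_sample_s)

small = epoch_time(60_000, batch_size=32)
large = epoch_time(60_000, batch_size=32 * 16)
print(f"batch 32:  {small:.2f}s/epoch (estimated)")
print(f"batch 512: {large:.2f}s/epoch (estimated)")
print(f"speedup:   {small / large:.1f}x")
```

With these assumed constants the larger batch comes out roughly an order of magnitude faster, which matches what I'm seeing, but the real per-batch overhead would need to be profiled to confirm this.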
NOTE: increasing the batch size may affect training quality; I haven't compared that yet.
I suspect batch size has different performance semantics with the PyTorch backend than the traditional Keras/TF ones. See here: https://discuss.pytorch.org/t/solved-pytorch-lstm-50x-slower-than-keras-tf-cudnnlstm/10043/8