You can try using a GRU instead of an LSTM. It is less complex than an LSTM and usually gives comparable accuracy. You can also try the selu activation together with kernel_initializer="lecun_normal", which gives you self-normalizing behaviour.
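A minimal sketch of that idea (the input shape, layer sizes, and 10-class output below are placeholder assumptions, not values from your model):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(100, 32)),            # (timesteps, features) -- placeholder values
    layers.GRU(64,
               activation="selu",
               kernel_initializer="lecun_normal"),
    layers.Dense(10, activation="softmax"),   # assumed 10-class output
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

Note that using an activation other than tanh means the GRU falls back from the fused cuDNN kernel, so training can be a bit slower on GPU.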
If you are using a pre-trained model, try setting its trainable attribute to True and reduce the learning rate to around 0.0001 or 0.00001, so that fine-tuning does not wreck the pre-trained weights.
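Something like this, assuming a Keras backbone (MobileNetV2 and the classification head here are just stand-ins for whatever pre-trained model you have):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

base_model = tf.keras.applications.MobileNetV2(include_top=False,
                                               input_shape=(224, 224, 3),
                                               pooling="avg")   # example backbone
base_model.trainable = True                  # unfreeze the pre-trained weights

model = models.Sequential([
    base_model,
    layers.Dense(10, activation="softmax"),  # assumed classification head
])

# Much smaller learning rate for fine-tuning the unfrozen weights.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```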
If these don't work, try manipulating the learning rate during training, e.g. increase the learning rate first, then slow it down, then increase it again; this is largely trial and error. You can also swap the activation function for elu, LeakyReLU, PReLU, etc. and try kernel_initializer="he_normal".
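A rough sketch of both ideas with Keras: a LearningRateScheduler callback that raises, lowers, then raises the rate again (the epoch boundaries and factors are arbitrary placeholders), plus a small block showing LeakyReLU with he_normal:

```python
import tensorflow as tf
from tensorflow.keras import layers

def schedule(epoch, lr):
    if epoch < 5:
        return lr * 1.2    # ramp the learning rate up first
    elif epoch < 15:
        return lr * 0.9    # then slow it down
    else:
        return lr * 1.1    # then increase it again

lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule, verbose=1)
# model.fit(x_train, y_train, epochs=30, callbacks=[lr_callback])

# Example of an alternative activation with a matching initializer:
alt_block = tf.keras.Sequential([
    layers.Dense(64, kernel_initializer="he_normal"),
    layers.LeakyReLU(0.1),
])
```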