Please have a look at this similar issue for your reference. Some differences have been observed in how the MultiHeadAttention layer behaves inside a custom layer between TensorFlow 2.14 and TensorFlow 2.16.
In the meantime, you can refer to the "Image captioning with visual attention" example notebook for a MultiHeadAttention layer implementation in a TransformerBlock.
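For orientation, here is a minimal sketch of a custom layer that wraps MultiHeadAttention. It is not copied from the notebook. The class name `TransformerBlock`, its constructor arguments (`embed_dim`, `num_heads`, `ff_dim`, `rate`) and the input shape are assumptions chosen for illustration, so adapt them to your own model.

```python
import tensorflow as tf


class TransformerBlock(tf.keras.layers.Layer):
    """Illustrative self-attention block (hypothetical names and sizes)."""

    def __init__(self, embed_dim, num_heads, ff_dim, rate=0.1, **kwargs):
        super().__init__(**kwargs)
        # Sublayers are created in __init__ so Keras can track their weights.
        self.att = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=embed_dim // num_heads
        )
        self.ffn = tf.keras.Sequential(
            [
                tf.keras.layers.Dense(ff_dim, activation="relu"),
                tf.keras.layers.Dense(embed_dim),
            ]
        )
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.drop1 = tf.keras.layers.Dropout(rate)
        self.drop2 = tf.keras.layers.Dropout(rate)

    def call(self, inputs, training=False):
        # Self-attention: query, value and key are all the same tensor.
        attn = self.att(query=inputs, value=inputs, key=inputs)
        attn = self.drop1(attn, training=training)
        x = self.norm1(inputs + attn)  # residual connection + layer norm
        ffn_out = self.ffn(x)
        ffn_out = self.drop2(ffn_out, training=training)
        return self.norm2(x + ffn_out)  # second residual + layer norm


# Example call on a dummy batch: (batch, sequence length, embedding size).
block = TransformerBlock(embed_dim=16, num_heads=2, ff_dim=32)
out = block(tf.random.normal((2, 5, 16)))
```

The block keeps the input shape, so the output here has shape `(2, 5, 16)`. If your custom layer behaves differently on 2.14 and 2.16, running a small standalone layer like this on both versions may help narrow down where the behavior diverges.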