The visual and the description of how the decoder blocks work in the blog post the OP referred to are misleading.
The last encoder block does not produce K and V matrices.
Instead, the cross-attention layer L in each decoder block takes the output Z of the last encoder block and transforms it into K and V matrices using the Wk and Wv projection matrices that L itself has learned. Only the Q matrix in L is derived from the output of the previous (masked self-attention) layer. As the original paper states in section 3.2.3:
In "encoder-decoder attention" layers, the queries come from the previous decoder layer, and the memory keys and values come from the output of the encoder.
Also refer to Fig. 1 in the original paper.
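For concreteness, here is a minimal single-head sketch of that cross-attention computation (the dimensions, random weights, and variable names are made up for illustration, not taken from the paper or the blog post). The point is that W_q, W_k, and W_v all belong to the cross-attention layer; only Q is computed from the decoder side, while K and V are computed from the encoder output Z:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

d_model, d_k = 8, 8                      # toy sizes for illustration
rng = np.random.default_rng(0)

Z = rng.normal(size=(5, d_model))        # output of the last encoder block (5 source tokens)
H = rng.normal(size=(3, d_model))        # output of the decoder's masked self-attention layer (3 target tokens)

# Projection matrices learned by the cross-attention layer itself
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q = H @ W_q                              # queries come from the previous decoder layer
K = Z @ W_k                              # keys come from the encoder output Z
V = Z @ W_v                              # values come from the encoder output Z

attn = softmax(Q @ K.T / np.sqrt(d_k))   # (3 target tokens) x (5 source tokens)
out = attn @ V                           # each target token is a weighted mix of source values
```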
Thus, each token fed to the decoder stack attends to all tokens fed to the encoder stack via the cross-attention layers, and to all previous tokens (including itself) fed to the decoder stack via the (masked) self-attention layers.
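And here is a self-contained sketch of how the masking in the decoder's self-attention enforces that "previous tokens only" behavior (again with made-up toy dimensions and weights): positions above the diagonal are set to -inf before the softmax, so each token gets zero weight on future tokens.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

d_model, d_k = 8, 8                      # toy sizes for illustration
rng = np.random.default_rng(0)

H = rng.normal(size=(4, d_model))        # 4 target tokens fed to the decoder so far
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

scores = (H @ W_q) @ (H @ W_k).T / np.sqrt(d_k)       # (4, 4) attention scores
mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)  # True above the diagonal (future positions)
scores[mask] = -np.inf                                # block attention to future tokens
weights = softmax(scores)                             # row i is nonzero only for columns <= i
out = weights @ (H @ W_v)
```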