I got the answer here https://discuss.pytorch.org/t/multi-head-self-attention-in-transformer-is-permutation-invariant-or-equivariant-how-to-see-it-in-practice/221249/2
The correct check is

torch.allclose(y0[1], y1[0], atol=1e-6)

which evaluates to True: self-attention is permutation *equivariant*, so swapping two input tokens swaps the corresponding output tokens, and output 1 of the original sequence matches output 0 of the permuted one.
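For reference, here is a minimal self-contained sketch of the experiment (the layer sizes and variable names are my own assumptions; the original thread's setup may differ). It feeds an unbatched sequence through `nn.MultiheadAttention`, swaps tokens 0 and 1, and checks that the outputs swap the same way:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical dimensions, chosen only for illustration.
embed_dim, num_heads, seq_len = 16, 4, 5
mha = nn.MultiheadAttention(embed_dim, num_heads)

# Unbatched input: (seq_len, embed_dim).
x0 = torch.randn(seq_len, embed_dim)

# x1 is x0 with tokens 0 and 1 swapped.
x1 = x0.clone()
x1[[0, 1]] = x0[[1, 0]]

y0, _ = mha(x0, x0, x0)
y1, _ = mha(x1, x1, x1)

# Equivariance: the output permutes along with the input, so token 1 of
# the original output matches token 0 of the permuted output (up to
# floating-point noise, hence the atol).
print(torch.allclose(y0[1], y1[0], atol=1e-6))
```

Note that `torch.equal` would typically fail here even though the math is exact: permuting the tokens reorders the floating-point reductions inside the attention matmuls and softmax, so a small `atol` is needed.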