This is a very interesting topic. Since mean pooling increases the lossy nature of inversion, I think a better approach would be to use a multi-agent model. The structure could look like this:
Agent 1: Take individual tokens and convert them into letters.
Agent 2: Compare those tokens with the mean-pooled vectors and rearrange them.
Agent 3: Assemble the rearranged letters into properly formatted data.
Agent 4: Paraphrase the reconstructed information to improve readability and semantic fidelity.
Agent 5: Provide the final output.