Have you considered looking at the x32 ABI? It addresses exactly this problem: code runs in 64-bit mode on x86-64 and can use the full 64-bit instruction set and registers, but pointers are only 32 bits wide, which avoids the memory overhead of 64-bit pointers.
https://en.wikipedia.org/wiki/X32_ABI