Well, I will answer my own question.
There is not one.
I looked through the repo. Yes, they use popc (CUDA) and popcount (OpenCL) for nearest_neighbour (yes, that's how it is spelled), but it is not used anywhere else. So it is not implemented as a function of its own.
Now I have a few choices: use a custom kernel, fork their code and make my own, or abandon this folly and move on.
I will probably try the custom kernel. If that fails, I will switch back to OpenCL and CUDA.
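
For anyone wondering what the custom kernel route might look like, here is a minimal standalone CUDA sketch, not code from the repo. The kernel name `popcount_kernel` and the sample values are made up for illustration. `__popc` is the CUDA intrinsic that counts the set bits in a 32-bit word, and OpenCL C has the equivalent built-in `popcount`. How such a kernel would be wired into the library's own arrays is not shown, and error checking is left out to keep it short.

```cuda
#include <cstdio>
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical kernel: writes the number of set bits of in[i] to out[i].
__global__ void popcount_kernel(const uint32_t* in, int* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = __popc(in[i]);  // CUDA intrinsic; OpenCL would use popcount()
    }
}

int main() {
    const int n = 4;
    const uint32_t h_in[n] = {0u, 1u, 0xFFu, 0xFFFFFFFFu};  // sample inputs
    int h_out[n] = {0};

    // Allocate device buffers and copy the input over.
    uint32_t* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(uint32_t));
    cudaMalloc(&d_out, n * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(uint32_t), cudaMemcpyHostToDevice);

    // One block of 32 threads is plenty for 4 elements.
    popcount_kernel<<<1, 32>>>(d_in, d_out, n);

    // This copy waits for the kernel to finish before reading the results.
    cudaMemcpy(h_out, d_out, n * sizeof(int), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) {
        printf("%08x -> %d\n", h_in[i], h_out[i]);
    }

    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```

Built with `nvcc`, this should report bit counts of 0, 1, 8 and 32 for the four sample words. For 64-bit words, CUDA also provides `__popcll`.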