Without an easy-to-run MRE I can't confirm this, but just from reading the code:
The z_t object you are using in z_cu_from_z already has host memory allocated at z_t.bits (from the init call on your first line). You are then trying to allocate device memory through that same pointer, which already points at a host allocation.
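
One way to avoid the clash, sketched here under assumptions since I can't see your full code, is to keep the device pointer in a separate z_t rather than reusing the one that init already filled in. The struct layout below is a guess: only the `bits` field and the name z_cu_from_z come from your question, while the `size` field and the helpers `z_cu_from_z_sketch` and `roundtrip_ok` are hypothetical names for illustration.

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real z_t; only `bits` is taken from the question.
struct z_t {
    unsigned int *bits;  // host pointer in a host z_t, device pointer in a device z_t
    size_t        size;  // number of words (assumed field)
};

// Fill `dev` with a device copy of `host`. `dev` must be a different object
// from `host`, so the host allocation in host->bits is never overwritten.
cudaError_t z_cu_from_z_sketch(z_t *dev, const z_t *host) {
    dev->size = host->size;
    dev->bits = nullptr;
    size_t bytes = host->size * sizeof(unsigned int);

    cudaError_t err = cudaMalloc((void **)&dev->bits, bytes);
    if (err != cudaSuccess) return err;

    err = cudaMemcpy(dev->bits, host->bits, bytes, cudaMemcpyHostToDevice);
    if (err != cudaSuccess) {
        cudaFree(dev->bits);
        dev->bits = nullptr;
    }
    return err;
}

// Usage check: copy a small buffer to the device and back, then compare.
bool roundtrip_ok() {
    z_t h;
    h.size = 4;
    h.bits = (unsigned int *)malloc(h.size * sizeof(unsigned int));
    if (h.bits == nullptr) return false;
    for (size_t i = 0; i < h.size; ++i) h.bits[i] = (unsigned int)(i + 1);

    z_t d;  // separate struct, so h.bits keeps pointing at host memory
    bool ok = (z_cu_from_z_sketch(&d, &h) == cudaSuccess);
    if (ok) {
        unsigned int back[4] = {0, 0, 0, 0};
        ok = (cudaMemcpy(back, d.bits, sizeof(back), cudaMemcpyDeviceToHost) == cudaSuccess);
        for (size_t i = 0; ok && i < h.size; ++i) ok = (back[i] == h.bits[i]);
        cudaFree(d.bits);
    }
    free(h.bits);
    return ok;
}
```

Each buffer then has exactly one owner: the host z_t is released with free and the device z_t with cudaFree. If your real code needs a single struct, free the host buffer (or save its pointer elsewhere) before handing `bits` to cudaMalloc.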