Late answer, but just in case. Your code is actually correct, but you are using the view camera matrix (the GL representation, which is what rendering requires) instead of the actual world_to_camera matrix (the OpenCV representation). So change this:
world_to_camera = np.linalg.inv(cam_pose.transformation_matrix).astype('float32')
to: world_to_camera = (cam_pose.transformation_matrix).astype('float32')
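
If you are not sure which of the two matrices is the real world_to_camera, a quick sanity check is to project a point you know is in front of the camera and look at the sign of its depth. Below is a minimal sketch of that check, not the original code: `T` is a hypothetical stand-in for cam_pose.transformation_matrix (assumed here, following the fix above, to already map world to camera), and `K`, `project_point` and the test point are made up for illustration.

```python
import numpy as np

def project_point(world_to_camera, K, point_world):
    """Project a 3D world point with an OpenCV-style pinhole model (+Z forward)."""
    # Move the point into the camera frame using homogeneous coordinates.
    p_cam = world_to_camera @ np.append(point_world, 1.0)
    depth = p_cam[2]
    # Apply the intrinsics and divide by depth to get pixel coordinates.
    uv = (K @ p_cam[:3])[:2] / depth
    return uv, depth

# Hypothetical stand-in for cam_pose.transformation_matrix (assumption):
# a camera 5 units behind the world origin, looking along +Z.
T = np.eye(4, dtype='float32')
T[2, 3] = 5.0

# Made-up intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]], dtype='float32')

# A world point that sits in front of the camera in this setup.
point = np.array([1.0, 0.0, 0.0])

uv_direct, depth_direct = project_point(T, K, point)
uv_inverted, depth_inverted = project_point(np.linalg.inv(T), K, point)

print("direct:  ", uv_direct, depth_direct)      # positive depth expected
print("inverted:", uv_inverted, depth_inverted)  # negative depth expected
```

With an OpenCV-style projection, a visible point must have positive depth, so the matrix that gives a negative depth for a point you can see in the image is the wrong one to pass as world_to_camera.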