Although this issue is quite old, I still have some experience to share. Even with the latest arm-none-eabi-gcc toolchain, fpv5-d16 and fpv5-sp-d16 can produce different assembly code. With fpv5-d16, some unusually written functions may be compiled to double-precision instructions such as vcvt.f64.s32. The MCU I chose only has a single-precision FPU, so executing this instruction sends it into HardFault. According to ARM's documentation, the Cortex-M7 FPU is optional and comes in either a single-precision-only or a single- and double-precision configuration, so you need to confirm which FPU your part actually has in the reference manual of the selected CPU.
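To illustrate the kind of code that can trigger this, here is a minimal sketch. It is not taken from my project; the function names `scale_promoted` and `scale_single` are made up for the example. In the first function the literal `0.5` is a double, so the integer is converted to double before the multiplication, and with `-mfpu=fpv5-d16` GCC may emit instructions like vcvt.f64.s32 for it. The second function keeps everything in single precision.

```c
#include <stdint.h>

/* Hypothetical example: 0.5 is a double literal, so raw is converted to
 * double and the multiplication is done in double precision. With
 * -mfpu=fpv5-d16 the compiler may use double-precision FPU instructions
 * (for example vcvt.f64.s32) for this expression. */
float scale_promoted(int32_t raw)
{
    return (float)(raw * 0.5);
}

/* Same calculation kept in single precision: the cast and the f suffix
 * keep both operands as float, so no double-precision arithmetic is
 * needed. */
float scale_single(int32_t raw)
{
    return (float)raw * 0.5f;
}
```

Building with something like `-mcpu=cortex-m7 -mfpu=fpv5-sp-d16 -mfloat-abi=hard` tells GCC that only single-precision hardware is available, so any remaining double arithmetic should be done through software library calls instead of FPU instructions. GCC's `-Wdouble-promotion` warning can also help to find implicit float-to-double promotions.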
Of course, I am also curious about how this option behaves with clang. As far as I can tell, clang only offers fpv5-d16. I haven't completed a build with clang yet, but I suspect that clang handles FPU code generation differently.