It may be due to several factors, such as kernel launch overhead, memory transfer bottlenecks, underutilisation of GPU cores, and thread divergence. Launching a kernel (a function that runs on the GPU) carries a fixed overhead, which matters most for small workloads: if the computational task isn't large enough, the launch overhead outweighs the speed benefit. Also, moving data between RAM and VRAM over the PCIe bus is slow compared with the GPU reading its own VRAM. GPUs really shine when they can work entirely in VRAM without moving data back and forth 👍
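
To make the trade-off concrete, here is a rough back-of-envelope model in Python. It is only a sketch: every constant (launch overhead, PCIe bandwidth, CPU and GPU throughput) is an assumed, illustrative figure rather than a measurement, and the function names are made up for this example. Real numbers depend heavily on your hardware and workload.

```python
# Back-of-envelope cost model. All constants below are illustrative
# assumptions, not measurements from any particular machine.
LAUNCH_OVERHEAD_S = 10e-6    # assumed fixed cost of one kernel launch (seconds)
PCIE_BANDWIDTH_BPS = 16e9    # assumed effective RAM <-> VRAM bandwidth (bytes/second)
CPU_THROUGHPUT = 1e9         # assumed elements processed per second on the CPU
GPU_THROUGHPUT = 100e9       # assumed elements processed per second on the GPU


def cpu_time(n):
    """Estimated seconds to process n elements on the CPU."""
    return n / CPU_THROUGHPUT


def gpu_time(n, bytes_per_element=4, transfer=True):
    """Estimated seconds to process n elements on the GPU.

    With transfer=True the input is copied to VRAM and the result is
    copied back to RAM; with transfer=False the data already lives in VRAM.
    """
    compute = n / GPU_THROUGHPUT
    copy = 2 * n * bytes_per_element / PCIE_BANDWIDTH_BPS if transfer else 0.0
    return LAUNCH_OVERHEAD_S + copy + compute


if __name__ == "__main__":
    for n in (1_000, 1_000_000, 100_000_000):
        print(
            f"n={n:>11,}  cpu={cpu_time(n):.6f}s  "
            f"gpu(with copies)={gpu_time(n):.6f}s  "
            f"gpu(data in VRAM)={gpu_time(n, transfer=False):.6f}s"
        )
```

With these assumed figures, the CPU wins at 1,000 elements because the fixed launch cost dominates, while at 100 million elements the GPU wins, and by a much larger margin when the data stays in VRAM and the two PCIe copies are skipped.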