79628810

Date: 2025-05-19 13:10:40
Score: 2.5

I hope this answer can help you.

  1. Verify GPUDirect RDMA Support:

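    • Check whether the nvidia-peermem kernel module is loaded (it appears as nvidia_peermem in lsmod output).

    • If it’s missing, note that recent NVIDIA GPU drivers ship nvidia-peermem, so loading it with modprobe nvidia-peermem may be enough. It also needs an RDMA stack to be installed, either NVIDIA’s MOFED (MLNX_OFED) or the distribution’s inbox drivers. Older setups used the separate nv_peer_mem module instead.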

  2. Test with Host (CPU) Memory First:

    • Before using GPU memory, test RDMA transfers using regular host memory.

    • This helps confirm that your RDMA setup and code are working correctly.

  3. Hardware Limitation:

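    • Your system shows a "NODE" connection, which means traffic between the GPU and the NIC crosses the interconnect between PCIe host bridges. GPUDirect RDMA works best when both devices sit under the same PCIe root complex; across host bridges it may perform poorly or, on some platforms, not work at all.

    • Unless you can physically move the GPU or NIC to a PCIe slot under the same root complex, you are unlikely to get reliable direct NIC-to-GPU transfers.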

  4. Current Behavior:

    • Your code likely performs an RDMA write, but the GPU memory on the receiver side isn’t updated because GPUDirect is not functional.

    • That’s why the receiver’s GPU buffer shows no change.
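
If it helps, the checks from steps 1 and 3 can be scripted. This is only a rough sketch under some assumptions: it expects the text of /proc/modules and the plain-text output of nvidia-smi topo -m (no ANSI escape codes in the header), and the function names and the device names GPU0 / NIC0 are made up for illustration, so adjust them to your system.

```python
from pathlib import Path

# Module names as they appear in /proc/modules (dashes become underscores).
# nv_peer_mem is the older, separately packaged module.
PEERMEM_MODULES = {"nvidia_peermem", "nv_peer_mem"}

# Labels from the `nvidia-smi topo -m` legend whose path crosses the
# interconnect between PCIe host bridges (NODE) or between NUMA nodes (SYS).
CROSS_BRIDGE_LINKS = {"NODE", "SYS"}


def peermem_loaded(modules_text: str) -> bool:
    """Return True if a GPUDirect RDMA peer-memory module is listed."""
    loaded = {ln.split()[0] for ln in modules_text.splitlines() if ln.strip()}
    return bool(loaded & PEERMEM_MODULES)


def link_between(topo_text: str, row_dev: str, col_dev: str) -> str:
    """Look up the link label between two devices in the topology matrix.

    Assumes the device columns come first in the header, before the
    multi-word "CPU Affinity" / "NUMA Affinity" columns.
    """
    rows = [ln.split() for ln in topo_text.splitlines() if ln.strip()]
    col = rows[0].index(col_dev)
    for tokens in rows[1:]:
        if tokens[0] == row_dev:
            return tokens[col + 1]  # +1 skips the row label
    raise KeyError(row_dev)


def crosses_host_bridges(link: str) -> bool:
    """Return True if the link label means the path leaves one host bridge."""
    return link in CROSS_BRIDGE_LINKS


if __name__ == "__main__":
    proc_modules = Path("/proc/modules")
    if proc_modules.exists():
        print("peer-memory module loaded:", peermem_loaded(proc_modules.read_text()))
```

To check the topology, save the output of nvidia-smi topo -m to a file, pass its contents to link_between(text, "GPU0", "NIC0"), and feed the result to crosses_host_bridges. If that returns True, you are in the situation described in step 3.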

If you have any further questions, please let me know.

BR,

Dolle

Reasons:
  • RegEx Blacklisted phrase (2.5): please let me know
  • Long answer (-0.5):
  • Has code block (-0.5):
  • Low reputation (1):
Posted by: ferferfer