79512624

Date: 2025-03-16 13:44:25
Score: 1
Natty:
Report link

A comment: I spent a long time on this with a different code; see https://community.intel.com/t5/Intel-MPI-Library/Crash-using-impi/m-p/1457035/highlight/true. I was (and still am) using many more MPI processes per node, e.g. 64-98. Intel was less than helpful: they denied that it could occur and refused to provide information on the Jenkins code.

My conclusion is that it is (similar to what you indicate) a reproducible, intermittent bug in Intel MPI (impi). Sometimes I could make it work by changing which cluster I used or by changing the MPI layout; in some cases I had 100% failure on a given cluster. I have not tried the MPI FABRICS approach, which sounds interesting.

Reasons:
  • Long answer (-0.5):
  • No code block (0.5):
  • Low reputation (1):
Posted by: Laurence D Marks