79552611

Date: 2025-04-03 10:13:30
Score: 2

Sorry for posting the solution so late, and thanks to @j23 and @Gilles Gouaillardet for helping me with the answer.

I found the Open MPI documentation, which suggests running ompi_info --param btl tcp to list the parameters of the TCP BTL component.

$ ompi_info --param btl tcp
         MCA btl: tcp (MCA v2.1.0, API v3.3.0, Component v5.0.7)
     MCA btl tcp: ---------------------------------------------------
     MCA btl tcp: parameter "btl_tcp_if_include" (current value: "",
                  data source: default, level: 1 user/basic, type:
                  string)
                  Comma-delimited list of devices and/or CIDR
                  notation of networks to use for MPI communication
                  (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
                  with btl_tcp_if_exclude.
     MCA btl tcp: parameter "btl_tcp_if_exclude" (current value:
                  "127.0.0.1/8,sppp", data source: default, level: 1
                  user/basic, type: string)
                  Comma-delimited list of devices and/or CIDR
                  notation of networks to NOT use for MPI
                  communication -- all devices not matching these
                  specifications will be used (e.g.,
                  "eth0,192.168.0.0/16").  If set to a non-default
                  value, it is mutually exclusive with
                  btl_tcp_if_include.
     MCA btl tcp: parameter "btl_tcp_progress_thread" (current value:
                  "0", data source: default, level: 1 user/basic,
                  type: int)

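With the defaults shown above (btl_tcp_if_include is empty and btl_tcp_if_exclude lists only loopback and sppp), the TCP BTL uses every other interface it finds. In my case, that meant my processes tried to communicate with each other over any available network, including docker0, which is not an appropriate network for MPI traffic.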

Adding either --mca btl_tcp_if_include <proper network> or --mca btl_tcp_if_exclude docker0 to the launch command solved the problem. The two parameters are mutually exclusive, so use only one of them.
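
For reference, here is a minimal dry-run sketch of the two alternatives. It only assembles and prints the commands, so it runs without an MPI installation. NETWORK, NP and APP are hypothetical placeholders, not values from my setup. As far as I understand, setting btl_tcp_if_exclude replaces its default value rather than adding to it, so the sketch lists the default entries again next to docker0.

```shell
#!/bin/sh
# Dry run: build the two alternative mpirun invocations and print them.
# All three values below are assumptions for illustration only.
NETWORK="eth0"       # hypothetical: interface name or CIDR (e.g. 192.168.0.0/16) shared by the nodes
NP=4                 # hypothetical: number of MPI processes
APP="./my_mpi_app"   # hypothetical: path to the MPI executable

# Option 1: restrict the TCP BTL to the proper network only.
INCLUDE_CMD="mpirun --mca btl_tcp_if_include $NETWORK -np $NP $APP"

# Option 2: keep the default exclusions (127.0.0.1/8,sppp) and add docker0.
EXCLUDE_CMD="mpirun --mca btl_tcp_if_exclude 127.0.0.1/8,sppp,docker0 -np $NP $APP"

# Print both; run exactly one of them, because the parameters are mutually exclusive.
echo "$INCLUDE_CMD"
echo "$EXCLUDE_CMD"
```

The same setting can also be supplied through the environment, for example export OMPI_MCA_btl_tcp_if_include=eth0 before calling mpirun, which avoids repeating the option on every launch.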

Reasons:
  • Blacklisted phrase (0.5): thanks
  • RegEx Blacklisted phrase (0.5): Sorry for posting
  • Long answer (-1):
  • Has code block (-0.5):
  • User mentioned (1): @j23
  • User mentioned (0): @Gilles
  • Self-answer (0.5):
  • Low reputation (1):
Posted by: Leonel Chen