**************************************************
OpenFOAM 8 Communication Deadlock Technical Report
**************************************************

Technical Contact
Bradley Morgan
Office of Information Technology
Auburn University
bradley@auburn.edu

Research SME Contact
David Young
Chemical Engineering
College of Engineering
Auburn University
djy0006@auburn.edu

Summary
-------

OpenFOAM version 8 experiences what appears to be a communication deadlock in a
scheduled send/receive operation. The case in question attempts to solve a toy
CFD problem evaluating airflow within a rectangular prism, using parallel
instances of MPPICFoam on decomposed input.

Multiple combinations of OpenMPI release (e.g. 4.0.3, 2.1.6), compiler
(e.g. Intel, gcc), and underlying transport (e.g. ucx, openib) have been
attempted, in conjunction with multiple builds of OpenFOAM 8. Blocking and
non-blocking communication modes and a number of mpirun command-line tuning
parameters (including varied world sizes) have also been attempted, with no
resolution.

To determine whether the file system was a factor, the case was run on both
local and parallel (GPFS) storage; no difference in runtime behavior was
observed. Additionally, a number of case configuration values (e.g. mesh
sizing, simulation times, etc.) were varied without any effect. For debugging
purposes the simulation deltaT was adjusted from 1e-3 to 1.0, which greatly
reduces the time to failure.

HW Environment
--------------

Architecture: x86_64
Processor:    2x Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
Cores / Node: 48

Software Environment
--------------------

OS:          CentOS Linux release 7.9.2009
Application: OpenFOAM 8
MPI:         OpenMPI 4.0.3
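
For context, the hazard being probed by these communication-mode experiments is
worth making concrete. The sketch below is illustrative only (it is not taken
from this report or from OpenFOAM): when two ranks both send before they
receive, a standard-mode MPI_Send succeeds only while the payload fits the
transport's eager threshold; beyond it each send blocks in rendezvous waiting
for the matching receive, and the pair deadlocks. A scheduled exchange avoids
this by ordering one side to receive first.

.. code-block:: cpp

    // Illustrative sketch, not from the report: the classic head-to-head
    // send deadlock. Works for small (eager) messages, hangs once the
    // payload exceeds the transport's eager threshold.
    #include <mpi.h>
    #include <vector>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        const int peer = rank ^ 1;  // pair ranks (0,1), (2,3), ...
        std::vector<char> sendBuf(1 << 20), recvBuf(1 << 20);

        if (peer < size)
        {
            // Unsafe ordering: both sides send first.
            MPI_Send(sendBuf.data(), static_cast<int>(sendBuf.size()),
                     MPI_BYTE, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(recvBuf.data(), static_cast<int>(recvBuf.size()),
                     MPI_BYTE, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }

Notably, the failure reported here has the opposite shape: the hang is observed
around one-byte (eager-sized) messages, which is part of what makes it unusual.
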
MPI Build
---------

.. code-block:: bash

    $ ompi_info
    Package: Open MPI ... Distribution
    Open MPI: 4.0.3
    Open MPI repo revision: v4.0.3
    Open MPI release date: Mar 03, 2020
    Open RTE: 4.0.3
    Open RTE repo revision: v4.0.3
    Open RTE release date: Mar 03, 2020
    OPAL: 4.0.3
    OPAL repo revision: v4.0.3
    OPAL release date: Mar 03, 2020
    MPI API: 3.1.0
    Ident string: 4.0.3
    Prefix: /tools/openmpi-4.0.3/gcc/4.8.5/ucx
    Configured architecture: x86_64-unknown-linux-gnu
    Configure host: c20-login01
    Configured by: hpcuser
    Configured on: Wed Apr 14 10:40:09 CDT 2021
    Configure host: c20-login01
    Configure command line: '--prefix=/tools/openmpi-4.0.3/gcc/4.8.5/ucx' '--with-slurm'
    Built by: hpcuser
    Built on: Wed Apr 14 10:50:39 CDT 2021
    Built host: c20-login01
    C bindings: yes
    C++ bindings: no
    Fort mpif.h: yes (all)
    Fort use mpi: yes (limited: overloading)
    Fort use mpi size: deprecated-ompi-info-value
    Fort use mpi_f08: no
    Fort mpi_f08 compliance: The mpi_f08 module was not built
    Fort mpi_f08 subarrays: no
    Java bindings: no
    Wrapper compiler rpath: runpath
    C compiler: /usr/bin/gcc
    C compiler absolute:
    C compiler family name: GNU
    C compiler version: 4.8.5
    C++ compiler: /usr/bin/g++
    C++ compiler absolute: none
    Fort compiler: /usr/bin/gfortran
    Fort compiler abs:
    Fort ignore TKR: no
    Fort 08 assumed shape: no
    Fort optional args: no
    Fort INTERFACE: yes
    Fort ISO_FORTRAN_ENV: yes
    Fort STORAGE_SIZE: no
    Fort BIND(C) (all): no
    Fort ISO_C_BINDING: yes
    Fort SUBROUTINE BIND(C): no
    Fort TYPE,BIND(C): no
    Fort T,BIND(C,name="a"): no
    Fort PRIVATE: no
    Fort PROTECTED: no
    Fort ABSTRACT: no
    Fort ASYNCHRONOUS: no
    Fort PROCEDURE: no
    Fort USE...ONLY: no
    Fort C_FUNLOC: no
    Fort f08 using wrappers: no
    Fort MPI_SIZEOF: no
    C profiling: yes
    C++ profiling: no
    Fort mpif.h profiling: yes
    Fort use mpi profiling: yes
    Fort use mpi_f08 prof: no
    C++ exceptions: no
    Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
    Sparse Groups: no
    Internal debug support: no
    MPI interface warnings: yes
    MPI parameter check: runtime
    Memory profiling support: no
    Memory debugging support: no
    dl support: yes
    Heterogeneous support: no
    mpirun default --prefix: no
    MPI_WTIME support: native
    Symbol vis. support: yes
    Host topology support: yes
    IPv6 support: no
    MPI1 compatibility: no
    MPI extensions: affinity, cuda, pcollreq
    FT Checkpoint support: no (checkpoint thread: no)
    C/R Enabled Debugging: no
    MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
    MPI_MAX_OBJECT_NAME: 64
    MPI_MAX_INFO_KEY: 36
    MPI_MAX_INFO_VAL: 256
    MPI_MAX_PORT_NAME: 1024

OpenFOAM Case
-------------

/scratch/hpcadmn/djy0006/case5/B-3-22471/system/decomposeParDict
================================================================

.. code-block:: cpp

    /*--------------------------------*- C++ -*----------------------------------*\
      =========                 |
      \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
       \\    /   O peration     | Website:  https://openfoam.org
        \\  /    A nd           | Version:  8
         \\/     M anipulation  |
    \*---------------------------------------------------------------------------*/
    FoamFile
    {
        version     2.0;
        format      ascii;
        class       dictionary;
        location    "system";
        object      decomposeParDict;
    }
    // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

    numberOfSubdomains 3;

    method          simple;

    simpleCoeffs
    {
        n           (3 1 1);
        delta       0.001;
    }

    // ************************************************************************* //

/scratch/hpcadmn/djy0006/case5/B-3-22471/system/controlDict
===========================================================

.. code-block:: cpp

    /*--------------------------------*- C++ -*----------------------------------*\
      =========                 |
      \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
       \\    /   O peration     | Website:  https://openfoam.org
        \\  /    A nd           | Version:  8
         \\/     M anipulation  |
    \*---------------------------------------------------------------------------*/
    FoamFile
    {
        version     2.0;
        format      ascii;
        class       dictionary;
        location    "system";
        object      controlDict;
    }
    // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

    application     MPPICFoam;

    startFrom       startTime;
    startTime       0.0;

    stopAt          endTime;
    endTime         6.5;

    deltaT          1.0;

    writeControl    timeStep;
    writeInterval   1;

    purgeWrite      0;

    writeFormat     ascii;
    writePrecision  6;
    writeCompression off;

    timeFormat      general;
    timePrecision   6;

    runTimeModifiable no;

    // ************************************************************************* //

    OptimisationSwitches
    {
        fileModificationSkew 60;
        fileModificationChecking timeStampMaster;

        fileHandler     uncollated;
        maxThreadFileBufferSize 2e9;
        maxMasterFileBufferSize 2e9;

        commsType       blocking; // nonBlocking; // scheduled; // blocking;
        floatTransfer   0;
        nProcsSimpleSum 0;

        // Force dumping (at next timestep) upon signal (-1 to disable)
        writeNowSignal  -1; // 10;
        stopAtWriteNowSignal -1;

        inputSyntax     dot;

        mpiBufferSize   200000000;
        maxCommsSize    0;

        trapFpe         1;
        setNaN          0;
    }

    DebugSwitches
    {
        UPstream    1;
        Pstream     1;
        processor   1;
        IFstream    1;
        OFstream    1;
    }
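
For reference, the commsType entry toggled in OptimisationSwitches above
selects how OpenFOAM's Pstream layer issues point-to-point transfers. The
sketch below paraphrases, from a reading of the OpenFOAM-8 Pstream/mpi sources
(verify against src/Pstream/mpi/UOPwrite.C), how the three settings map onto
MPI send modes; the sendAs() helper is hypothetical, not an OpenFOAM function.

.. code-block:: cpp

    #include <mpi.h>

    // Hypothetical helper mirroring (approximately) OpenFOAM-8's
    // UOPstream::write dispatch on commsType.
    enum class CommsType { blocking, scheduled, nonBlocking };

    int sendAs(CommsType type, const char* buf, int count, int toRank,
               int tag, MPI_Comm comm, MPI_Request* req)
    {
        switch (type)
        {
            case CommsType::blocking:
                // Buffered send: copies into the pool attached with
                // MPI_Buffer_attach (sized by mpiBufferSize above).
                return MPI_Bsend(buf, count, MPI_BYTE, toRank, tag, comm);

            case CommsType::scheduled:
                // Standard send: may complete eagerly or block in
                // rendezvous until the matching receive is posted.
                return MPI_Send(buf, count, MPI_BYTE, toRank, tag, comm);

            case CommsType::nonBlocking:
            default:
                // Immediate send: completed later via UPstream::waitRequests
                // (visible as "waitRequests" in the debug log below).
                return MPI_Isend(buf, count, MPI_BYTE, toRank, tag, comm, req);
        }
    }

This mapping is why both the commsType setting and mpiBufferSize were varied
during testing.
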
Summary of Debug Output
-----------------------

The following debug output was generated using the above case configuration
with an MPI world size of 3:

.. code-block:: bash

    $ srun -N1 -n3 --pty /bin/bash
    ...
    $ module load openfoam/8-ompi2
    $ source /tools/openfoam-8/mpich/OpenFOAM-8/etc/bashrc
    $ decomposePar -force
    $ mpirun -np $SLURM_NTASKS MPPICFoam -parallel

The process tree of the running job looks like:

.. code-block:: none

    $ pstree -ac --show-parents -p -l 54148
    systemd,1
      └─slurmstepd,262636
          └─bash,262643
              └─mpirun,54148 -np 3 MPPICFoam -parallel
                  ├─MPPICFoam,54152 -parallel
                  │   ├─{MPPICFoam},
                  │   ├─{MPPICFoam},
                  │   └─{MPPICFoam},
                  ├─MPPICFoam,54153 -parallel
                  │   ├─{MPPICFoam},
                  │   ├─{MPPICFoam},
                  │   └─{MPPICFoam},
                  ├─MPPICFoam,54154 -parallel
                  │   ├─{MPPICFoam},
                  │   ├─{MPPICFoam},
                  │   └─{MPPICFoam},
                  ├─{mpirun},
                  ├─{mpirun},
                  └─{mpirun},

The case output at the time of failure looks like:

.. code-block:: none

    [0] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
    [0] UPstream::waitRequests : finished wait.
    [0] UIPstream::read : starting read from:1 tag:1 comm:0 wanted size:1 commsType:scheduled
    [0] UIPstream::read : finished read from:1 tag:1 read size:1 commsType:scheduled
    [0] UIPstream::read : starting read from:2 tag:1 comm:0 wanted size:1 commsType:scheduled
    [0] UIPstream::read : finished read from:2 tag:1 read size:1 commsType:scheduled
    [0] UOPstream::write : starting write to:2 tag:1 comm:0 size:1 commsType:scheduled
    [0] UOPstream::write : finished write to:2 tag:1 size:1 commsType:scheduled
    [0] UOPstream::write : starting write to:1 tag:1 comm:0 size:1 commsType:scheduled
    [0] UOPstream::write : finished write to:1 tag:1 size:1 commsType:scheduled
    [2] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
    [2] UPstream::waitRequests : finished wait.
    [2] UOPstream::write : starting write to:0 tag:1 comm:0 size:1 commsType:scheduled
    [2] UOPstream::write : finished write to:0 tag:1 size:1 commsType:scheduled
    [2] UIPstream::read : starting read from:0 tag:1 comm:0 wanted size:1 commsType:scheduled
    [2] UIPstream::read : finished read from:0 tag:1 read size:1 commsType:scheduled
    [1] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
    [1] UPstream::waitRequests : finished wait.
    [1] UOPstream::write : starting write to:0 tag:1 comm:0 size:1 commsType:scheduled
    [1] UOPstream::write : finished write to:0 tag:1 size:1 commsType:scheduled
    [1] UIPstream::read : starting read from:0 tag:1 comm:0 wanted size:1 commsType:scheduled
    [1] UIPstream::read : finished read from:0 tag:1 read size:1 commsType:scheduled
    <... freeze ...>

Here, the communication schedule appears balanced, with matching sends and
receives (based on size and tag). However, the behavior indicates a blocking
send or receive call. The deadlock always seems to occur around size=1
send/recv operations.

The remaining content consists of strace/gdb output from the MPI ranks. The
root mpirun process (54148) looks like it is stuck in a poll loop, which is
expected for a launcher that is simply waiting on its children. Rank 0 appears
to be executing ordinary particle-tracking code (returning from
Foam::BarycentricTensor::BarycentricTensor()) and is not in an MPI call at all.
Ranks 1 and 2 appear to be waiting in a PMPI_Alltoall collective.
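
For what it is worth, the balanced schedule noted above can be replayed in
isolation to confirm it is not itself the deadlock. The following stand-alone
replay (our sketch, not part of the report) issues the same one-byte, tag-1
pattern the debug log shows: rank 0 receives from 1 and 2, then sends to 2 and
1; ranks 1 and 2 send to 0, then receive from 0. Under MPI point-to-point
semantics this sequence is deadlock-free, and indeed the log shows every read
and write finishing, which suggests the freeze happens after the scheduled
exchange completes rather than inside it.

.. code-block:: cpp

    // Stand-alone replay of the schedule in the debug log (3 ranks).
    // Sketch for illustration; not part of the report.
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char byte = 0;
        const int tag = 1;

        if (rank == 0)
        {
            MPI_Recv(&byte, 1, MPI_BYTE, 1, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Recv(&byte, 1, MPI_BYTE, 2, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_BYTE, 2, tag, MPI_COMM_WORLD);
            MPI_Send(&byte, 1, MPI_BYTE, 1, tag, MPI_COMM_WORLD);
        }
        else  // ranks 1 and 2
        {
            MPI_Send(&byte, 1, MPI_BYTE, 0, tag, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_BYTE, 0, tag, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }

        printf("rank %d completed the schedule\n", rank);
        MPI_Finalize();
        return 0;
    }
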
GDB
---

Root (MPI) Process
==================

.. code-block:: none

    [node040 B-3-22471]$ mpirun -np $SLURM_NTASKS MPPICFoam -parallel > /dev/null 2>&1 &
    [1] 54148
    [node040 B-3-22471]$ ps -ef | grep hpcuser
    hpcuser   47219  47212  0 09:08 pts/0    00:00:00 /bin/bash
    hpcuser   54148  47219  1 09:54 pts/0    00:00:00 mpirun -np 3 MPPICFoam -parallel
    hpcuser   54152  54148 72 09:54 pts/0    00:00:02 MPPICFoam -parallel
    hpcuser   54153  54148 81 09:54 pts/0    00:00:02 MPPICFoam -parallel
    hpcuser   54154  54148 81 09:54 pts/0    00:00:02 MPPICFoam -parallel
    hpcuser   54166  47219  0 09:54 pts/0    00:00:00 ps -ef
    hpcuser   54167  47219  0 09:54 pts/0    00:00:00 grep --color=auto hpcuser
    [node040 B-3-22471]$ gdb -p 54148
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
    (gdb) frame
    #0  0x00002aaaac17fccd in poll () from /usr/lib64/libc.so.6
    (gdb) where
    #0  0x00002aaaac17fccd in poll () from /usr/lib64/libc.so.6
    #1  0x00002aaaab096fc6 in poll_dispatch (base=0x659370, tv=0x0)
        at ../../../../../../../openmpi-4.0.3/opal/mca/event/libevent2022/libevent/poll.c:165
    #2  0x00002aaaab08ec80 in opal_libevent2022_event_base_loop (base=0x659370, flags=1)
        at ../../../../../../../openmpi-4.0.3/opal/mca/event/libevent2022/libevent/event.c:1630
    #3  0x0000000000401438 in orterun (argc=5, argv=0x7fffffffaae8)
        at ../../../../../openmpi-4.0.3/orte/tools/orterun/orterun.c:178
    #4  0x0000000000400f6d in main (argc=5, argv=0x7fffffffaae8)
        at ../../../../../openmpi-4.0.3/orte/tools/orterun/main.c:13
    (gdb) n
    Single stepping until exit from function poll, which has no line number information.
    < ... freeze ... >
    (gdb) disassemble
    Dump of assembler code for function poll:
       0x00002aaaac0ceca0 <+0>:   cmpl   $0x0,0x2d930d(%rip)  # 0x2aaaac3a7fb4 <__libc_multiple_threads>
       0x00002aaaac0ceca7 <+7>:   jne    0x2aaaac0cecb9
       0x00002aaaac0ceca9 <+9>:   mov    $0x7,%eax
       0x00002aaaac0cecae <+14>:  syscall
       0x00002aaaac0cecb0 <+16>:  cmp    $0xfffffffffffff001,%rax
       0x00002aaaac0cecb6 <+22>:  jae    0x2aaaac0cece9
       0x00002aaaac0cecb8 <+24>:  retq
       0x00002aaaac0cecb9 <+25>:  sub    $0x8,%rsp
       0x00002aaaac0cecbd <+29>:  callq  0x2aaaac0e7720 <__libc_enable_asynccancel>
       0x00002aaaac0cecc2 <+34>:  mov    %rax,(%rsp)
       0x00002aaaac0cecc6 <+38>:  mov    $0x7,%eax
       0x00002aaaac0ceccb <+43>:  syscall
    => 0x00002aaaac0ceccd <+45>:  mov    (%rsp),%rdi
       0x00002aaaac0cecd1 <+49>:  mov    %rax,%rdx
       0x00002aaaac0cecd4 <+52>:  callq  0x2aaaac0e7780 <__libc_disable_asynccancel>
       0x00002aaaac0cecd9 <+57>:  mov    %rdx,%rax
       0x00002aaaac0cecdc <+60>:  add    $0x8,%rsp
       0x00002aaaac0cece0 <+64>:  cmp    $0xfffffffffffff001,%rax
       0x00002aaaac0cece6 <+70>:  jae    0x2aaaac0cece9
       0x00002aaaac0cece8 <+72>:  retq
       0x00002aaaac0cece9 <+73>:  mov    0x2d3160(%rip),%rcx  # 0x2aaaac3a1e50
       0x00002aaaac0cecf0 <+80>:  neg    %eax
       0x00002aaaac0cecf2 <+82>:  mov    %eax,%fs:(%rcx)
       0x00002aaaac0cecf5 <+85>:  or     $0xffffffffffffffff,%rax
       0x00002aaaac0cecf9 <+89>:  retq
    End of assembler dump.

Rank 0 Process
==============

.. code-block:: none

    [node040 B-3-22471]$ gdb -p 54152
    GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
    Attaching to process 54152
    Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/bin/MPPICFoam...done.
    Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangian.so...done.
    Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangian.so
    Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianIntermediate.so...done.
    Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianIntermediate.so
    Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianTurbulence.so...done.
    Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianTurbulence.so
    Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/libincompressibleTransportModels.so...done.
    Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/libincompressibleTransportModels.so
    ...
    (gdb) disassemble
    Dump of assembler code for function Foam::BarycentricTensor<double>::BarycentricTensor(Foam::Vector<double> const&, Foam::Vector<double> const&, Foam::Vector<double> const&, Foam::Vector<double> const&):
       0x00000000004aa9fc <+0>:   push   %rbp
       0x00000000004aa9fd <+1>:   mov    %rsp,%rbp
       0x00000000004aaa00 <+4>:   sub    $0x30,%rsp
       0x00000000004aaa04 <+8>:   mov    %rdi,-0x8(%rbp)
    => 0x00000000004aaa08 <+12>:  mov    %rsi,-0x10(%rbp)
       0x00000000004aaa0c <+16>:  mov    %rdx,-0x18(%rbp)
       0x00000000004aaa10 <+20>:  mov    %rcx,-0x20(%rbp)
       0x00000000004aaa14 <+24>:  mov    %r8,-0x28(%rbp)
       0x00000000004aaa18 <+28>:  mov    -0x8(%rbp),%rax
       0x00000000004aaa1c <+32>:  mov    %rax,%rdi
       0x00000000004aaa1f <+35>:  callq  0x4b9302 <Foam::MatrixSpace<Foam::BarycentricTensor<double>, double, (unsigned char)4, (unsigned char)3>::MatrixSpace()>
       0x00000000004aaa24 <+40>:  mov    -0x10(%rbp),%rax
       0x00000000004aaa28 <+44>:  mov    %rax,%rdi
       0x00000000004aaa2b <+47>:  callq  0x4a7af8 <Foam::Vector<double>::x() const>
       0x00000000004aaa30 <+52>:  mov    (%rax),%rax
       0x00000000004aaa33 <+55>:  mov    -0x8(%rbp),%rdx
       0x00000000004aaa37 <+59>:  mov    %rax,(%rdx)
       0x00000000004aaa3a <+62>:  mov    -0x18(%rbp),%rax
       0x00000000004aaa3e <+66>:  mov    %rax,%rdi
       0x00000000004aaa41 <+69>:  callq  0x4a7af8 <Foam::Vector<double>::x() const>
       0x00000000004aaa46 <+74>:  mov    (%rax),%rax
       0x00000000004aaa49 <+77>:  mov    -0x8(%rbp),%rdx
       0x00000000004aaa4d <+81>:  mov    %rax,0x8(%rdx)
       0x00000000004aaa51 <+85>:  mov    -0x20(%rbp),%rax
       0x00000000004aaa55 <+89>:  mov    %rax,%rdi
       0x00000000004aaa58 <+92>:  callq  0x4a7af8 <Foam::Vector<double>::x() const>
       0x00000000004aaa5d <+97>:  mov    (%rax),%rax
       0x00000000004aaa60 <+100>: mov    -0x8(%rbp),%rdx
       0x00000000004aaa64 <+104>: mov    %rax,0x10(%rdx)
       0x00000000004aaa68 <+108>: mov    -0x28(%rbp),%rax
       0x00000000004aaa6c <+112>: mov    %rax,%rdi
       0x00000000004aaa6f <+115>: callq  0x4a7af8 <Foam::Vector<double>::x() const>
       0x00000000004aaa74 <+120>: mov    (%rax),%rax
       0x00000000004aaa77 <+123>: mov    -0x8(%rbp),%rdx
       0x00000000004aaa7b <+127>: mov    %rax,0x18(%rdx)
       0x00000000004aaa7f <+131>: mov    -0x10(%rbp),%rax
       0x00000000004aaa83 <+135>: mov    %rax,%rdi
       0x00000000004aaa86 <+138>: callq  0x4a7b06 <Foam::Vector<double>::y() const>
       0x00000000004aaa8b <+143>: mov    (%rax),%rax
       0x00000000004aaa8e <+146>: mov    -0x8(%rbp),%rdx
    (gdb) frame
    #0  0x00000000004a7c6d in Foam::operator^ (v1=..., v2=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/VectorI.H:159
    (gdb) frame
    #0  0x00000000004a55a2 in Foam::tetIndices::faceTriIs (this=0x7fffffff4b10, mesh=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/tetIndicesI.H:86
    86          label facePtI = (tetPt() + faceBasePtI) % f.size();
    #0  0x00002aaaaacfd654 in Foam::BarycentricTensor<double>::d (this=0x7fffffff4620)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:159
    159         return Vector<Cmpt>(this->v_[XD], this->v_[YD], this->v_[ZD]);
    (gdb) where
    #0  0x00000000004ceebd in Foam::Barycentric<double>::Barycentric (this=0x7fffffff4be0, va=@0x7fffffff4cc0: -0.13335,
        vb=@0x7fffffff4cc8: -0.13716, vc=@0x7fffffff4cd0: -0.13716, vd=@0x7fffffff4cd8: -0.12953999999999999)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricI.H:50
    #1  0x00000000004b97f5 in Foam::BarycentricTensor<double>::z (this=0x7fffffff4c80)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:131
    #2  0x00000000004aae13 in Foam::operator& (T=..., b=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:177
    #3  0x00000000004a6a1e in Foam::particle::position (this=0x2240a40)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/particleI.H:280
    #4  0x00002aaaaacf837d in Foam::particle::deviationFromMeshCentre (this=0x2240a40) at particle/particle.C:1036
    #5  0x000000000051ac3a in Foam::KinematicParcel::move > > > > > (this=0x2240a40, cloud=..., td=..., trackTime=1)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicParcel.C:309
    #6  0x000000000050acc2 in Foam::MPPICParcel >::move > > > > > (this=0x2240a40, cloud=..., td=..., trackTime=1)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICParcel.C:102
    #7  0x00000000004f22f3 in Foam::Cloud > >::move > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:205
    #8  0x00000000004f1e18 in Foam::MPPICCloud > > > >::motion > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
    #9  0x00000000004da066 in Foam::KinematicCloud > > >::evolveCloud > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
    #10 0x00000000004c3497 in Foam::KinematicCloud > > >::solve > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
    #11 0x00000000004afc73 in Foam::MPPICCloud > > > >::evolve (this=0x7fffffff7220)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
    #12 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109

Rank 1 Process
==============

.. code-block:: none

    [node040 B-3-22471]$ gdb -p 54153
    (gdb) disassemble
    Dump of assembler code for function uct_rc_mlx5_iface_progress_cyclic:
       0x00002aaaca088020 <+0>:   push   %r15
       0x00002aaaca088022 <+2>:   push   %r14
       0x00002aaaca088024 <+4>:   push   %r13
       0x00002aaaca088026 <+6>:   push   %r12
       0x00002aaaca088028 <+8>:   push   %rbp
       0x00002aaaca088029 <+9>:   push   %rbx
       0x00002aaaca08802a <+10>:  mov    %rdi,%rbx
       0x00002aaaca08802d <+13>:  sub    $0x38,%rsp
       0x00002aaaca088031 <+17>:  movzwl 0x86c8(%rdi),%edx
       0x00002aaaca088038 <+24>:  movzwl 0x86c6(%rdi),%eax
       0x00002aaaca08803f <+31>:  imul   %edx,%eax
       0x00002aaaca088042 <+34>:  mov    0x86b0(%rdi),%rdx
       0x00002aaaca088049 <+41>:  cltq
       0x00002aaaca08804b <+43>:  cmpw   $0x0,0x2(%rdx,%rax,1)
    => 0x00002aaaca088051 <+49>:  jne    0x2aaaca0887cd
       0x00002aaaca088057 <+55>:  mov    0x86f0(%rdi),%rax
       0x00002aaaca08805e <+62>:  mov    0x8708(%rdi),%edx
       0x00002aaaca088064 <+68>:  mov    0x870c(%rdi),%ecx
       0x00002aaaca08806a <+74>:  prefetcht0 (%rax)
       0x00002aaaca08806d <+77>:  mov    0x8700(%rdi),%eax
       0x00002aaaca088073 <+83>:  lea    -0x1(%rdx),%ebp
       0x00002aaaca088076 <+86>:  and    %eax,%ebp
       0x00002aaaca088078 <+88>:  shl    %cl,%ebp
       0x00002aaaca08807a <+90>:  add    0x86f8(%rdi),%rbp
       0x00002aaaca088081 <+97>:  movzbl 0x3f(%rbp),%ecx
       0x00002aaaca088085 <+101>: mov    %ecx,%esi
       0x00002aaaca088087 <+103>: and    $0x1,%esi
       0x00002aaaca08808a <+106>: test   %eax,%edx
       0x00002aaaca08808c <+108>: setne  %dl
       0x00002aaaca08808f <+111>: cmp    %sil,%dl
       0x00002aaaca088092 <+114>: jne    0x2aaaca088686
       0x00002aaaca088098 <+120>: test   %cl,%cl
       0x00002aaaca08809a <+122>: js     0x2aaaca088679
       0x00002aaaca0880a0 <+128>: add    $0x1,%eax
       0x00002aaaca0880a3 <+131>: mov    %eax,0x8700(%rdi)
       0x00002aaaca0880a9 <+137>: movzwl 0x3c(%rbp),%r13d
       0x00002aaaca0880ae <+142>: movzwl 0x86c6(%rdi),%edi
       0x00002aaaca0880b5 <+149>: movzwl 0x86c8(%rbx),%ecx
       0x00002aaaca0880bc <+156>: mov    0x86b0(%rbx),%rdx
       0x00002aaaca0880c3 <+163>: mov    0x2c(%rbp),%r12d
       0x00002aaaca0880c7 <+167>: mov    %r13d,%eax
       0x00002aaaca0880ca <+170>: ror    $0x8,%ax
    (gdb) frame
    #0  0x00002aaac936812d in uct_mm_iface_progress (tl_iface=<optimized out>)
        at ../../../src/uct/sm/mm/base/mm_iface.c:365
    (gdb) frame
    #0  0x00002aaaca08848a in uct_rc_mlx5_iface_progress_cyclic (arg=<optimized out>)
        at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:183
    183     }
    (gdb) where
    #0  0x00002aaaca088484 in uct_rc_mlx5_iface_progress_cyclic (arg=<optimized out>)
        at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:183
    #1  0x00002aaac90b608a in ucs_callbackq_dispatch (cbq=<optimized out>)
        at /home/hpcuser/build/ucx/build/../src/ucs/datastruct/callbackq.h:211
    #2  uct_worker_progress (worker=<optimized out>) at /home/hpcuser/build/ucx/build/../src/uct/api/uct.h:2592
    #3  ucp_worker_progress (worker=0xb9f390) at ../../../src/ucp/core/ucp_worker.c:2530
    #4  0x00002aaac8c6c6d7 in mca_pml_ucx_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/openmpi/mca_pml_ucx.so
    #5  0x00002aaab91c780c in opal_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libopen-pal.so.40
    #6  0x00002aaab85111bd in ompi_request_default_wait_all () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
    #7  0x00002aaab8565398 in ompi_coll_base_alltoall_intra_basic_linear () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
    #8  0x00002aaab85240d7 in PMPI_Alltoall () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
    #9  0x00002aaab2569953 in Foam::UPstream::allToAll (sendData=..., recvData=..., communicator=0) at UPstream.C:367
    #10 0x00002aaab0a162c1 in Foam::Pstream::exchangeSizes > > (sendBufs=..., recvSizes=..., comm=0)
        at db/IOstreams/Pstreams/exchange.C:158
    #11 0x00002aaab0a15d0d in Foam::PstreamBuffers::finishedSends (this=0x7fffffff4fe0, recvSizes=..., block=true)
        at db/IOstreams/Pstreams/PstreamBuffers.C:106
    #12 0x00000000004f2670 in Foam::Cloud > >::move > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:283
    #13 0x00000000004f1e18 in Foam::MPPICCloud > > > >::motion > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
    #14 0x00000000004da066 in Foam::KinematicCloud > > >::evolveCloud > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
    #15 0x00000000004c3497 in Foam::KinematicCloud > > >::solve > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
    #16 0x00000000004afc73 in Foam::MPPICCloud > > > >::evolve (this=0x7fffffff7220)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
    #17 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109

Rank 2 Process
==============

.. code-block:: none

    [node040 B-3-22471]$ gdb -p 54154
    (gdb) disassemble
    Dump of assembler code for function uct_mm_iface_progress:
       0x00002aaac9367f80 <+0>:   push   %r15
       0x00002aaac9367f82 <+2>:   push   %r14
       0x00002aaac9367f84 <+4>:   push   %r13
       0x00002aaac9367f86 <+6>:   push   %r12
       0x00002aaac9367f88 <+8>:   push   %rbp
       0x00002aaac9367f89 <+9>:   push   %rbx
       0x00002aaac9367f8a <+10>:  mov    %rdi,%rbx
       0x00002aaac9367f8d <+13>:  sub    $0x168,%rsp
       0x00002aaac9367f94 <+20>:  mov    0x558(%rdi),%esi
       0x00002aaac9367f9a <+26>:  movl   $0x0,0x50(%rsp)
       0x00002aaac9367fa2 <+34>:  test   %esi,%esi
       0x00002aaac9367fa4 <+36>:  je     0x2aaac9368527
       0x00002aaac9367faa <+42>:  lea    0x598(%rbx),%r14
       0x00002aaac9367fb1 <+49>:  lea    0x560(%rbx),%r13
       0x00002aaac9367fb8 <+56>:  lea    0x60(%rsp),%r12
       0x00002aaac9367fbd <+61>:  mov    0x538(%rdi),%rdx
       0x00002aaac9367fc4 <+68>:  movabs $0x7fffffffffffffff,%rbp
       0x00002aaac9367fce <+78>:  xor    %edi,%edi
       0x00002aaac9367fd0 <+80>:  movzbl 0x548(%rbx),%ecx
       0x00002aaac9367fd7 <+87>:  mov    0x540(%rbx),%rax
       0x00002aaac9367fde <+94>:  movzbl (%rdx),%edx
       0x00002aaac9367fe1 <+97>:  shr    %cl,%rax
    => 0x00002aaac9367fe4 <+100>: xor    %rdx,%rax
       0x00002aaac9367fe7 <+103>: and    $0x1,%eax
       0x00002aaac9367fea <+106>: jne    0x2aaac9368500
       0x00002aaac9367ff0 <+112>: mov    0x528(%rbx),%rdx
       0x00002aaac9367ff7 <+119>: mov    (%rdx),%rdx
       0x00002aaac9367ffa <+122>: and    %rbp,%rdx
       0x00002aaac9367ffd <+125>: cmp    %rdx,0x540(%rbx)
       0x00002aaac9368004 <+132>: ja     0x2aaac93682c8
       0x00002aaac936800a <+138>: mov    0x538(%rbx),%r10
       0x00002aaac9368011 <+145>: testb  $0x2,(%r10)
       0x00002aaac9368015 <+149>: je     0x2aaac93681ef
       0x00002aaac936801b <+155>: mov    0x224f36(%rip),%r15  # 0x2aaac958cf58
       0x00002aaac9368022 <+162>: cmpl   $0x7,(%r15)
       0x00002aaac9368026 <+166>: ja     0x2aaac9368198
       0x00002aaac936802c <+172>: lea    0x1c(%r10),%rsi
       0x00002aaac9368030 <+176>: movzbl 0x1(%r10),%r9d
       0x00002aaac9368035 <+181>: movzwl 0x2(%r10),%edx
       0x00002aaac936803a <+186>: cmp    $0x1f,%r9b
       0x00002aaac936803e <+190>: ja     0x2aaac9368164
       0x00002aaac9368044 <+196>: movzbl %r9b,%r9d
    (gdb) frame
    #0  0x00002aaaca08823b in uct_ib_mlx5_get_cqe (cqe_index=51, cq=0xbf0778)
        at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:13
    13      return UCS_PTR_BYTE_OFFSET(cq->cq_buf, ((cqe_index & (cq->cq_length - 1)) <<
    (gdb) where
    #0  0x00002aaaca08823b in uct_ib_mlx5_get_cqe (cqe_index=51, cq=0xbf0778)
        at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:13
    #1  uct_ib_mlx5_poll_cq (cq=0xbf0778, iface=0xbe8050) at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:73
    #2  uct_rc_mlx5_iface_poll_tx (iface=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:140
    #3  uct_rc_mlx5_iface_progress (flags=2, arg=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:177
    #4  uct_rc_mlx5_iface_progress_cyclic (arg=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:182
    #5  0x00002aaac90b608a in ucs_callbackq_dispatch (cbq=<optimized out>)
        at /home/hpcuser/build/ucx/build/../src/ucs/datastruct/callbackq.h:211
    #6  uct_worker_progress (worker=<optimized out>) at /home/hpcuser/build/ucx/build/../src/uct/api/uct.h:2592
    #7  ucp_worker_progress (worker=0xb9f3d0) at ../../../src/ucp/core/ucp_worker.c:2530
    #8  0x00002aaac8c6c6d7 in mca_pml_ucx_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/openmpi/mca_pml_ucx.so
    #9  0x00002aaab91c780c in opal_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libopen-pal.so.40
    #10 0x00002aaab85111bd in ompi_request_default_wait_all () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
    #11 0x00002aaab8565398 in ompi_coll_base_alltoall_intra_basic_linear () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
    #12 0x00002aaab85240d7 in PMPI_Alltoall () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
    #13 0x00002aaab2569953 in Foam::UPstream::allToAll (sendData=..., recvData=..., communicator=0) at UPstream.C:367
    #14 0x00002aaab0a162c1 in Foam::Pstream::exchangeSizes > > (sendBufs=..., recvSizes=..., comm=0)
        at db/IOstreams/Pstreams/exchange.C:158
    #15 0x00002aaab0a15d0d in Foam::PstreamBuffers::finishedSends (this=0x7fffffff4fe0, recvSizes=..., block=true)
        at db/IOstreams/Pstreams/PstreamBuffers.C:106
    #16 0x00000000004f2670 in Foam::Cloud > >::move > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:283
    #17 0x00000000004f1e18 in Foam::MPPICCloud > > > >::motion > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
    #18 0x00000000004da066 in Foam::KinematicCloud > > >::evolveCloud > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
    #19 0x00000000004c3497 in Foam::KinematicCloud > > >::solve > > > > > (this=0x7fffffff7220, cloud=..., td=...)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
    #20 0x00000000004afc73 in Foam::MPPICCloud > > > >::evolve (this=0x7fffffff7220)
        at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
    #21 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109
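
Both waiting ranks sit in PMPI_Alltoall, reached from
Foam::PstreamBuffers::finishedSends via Foam::Pstream::exchangeSizes, which
trades one buffer size (a label/int) per rank. A blocking collective cannot
return until every rank of the communicator has entered it, so the traces
above are equally consistent with a transport-level hang and with rank 0
simply never reaching the call. The minimal reproducer below (our sketch, not
part of the report) loops just this exchange pattern; running it under the
same OpenMPI/UCX build helps separate the two hypotheses.

.. code-block:: cpp

    // Minimal stand-in for the Pstream::exchangeSizes pattern both waiting
    // ranks are blocked in: an MPI_Alltoall of one int per rank. Sketch for
    // isolation testing; not part of the report.
    #include <mpi.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int rank, nProcs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nProcs);

        // One send size per destination rank, one receive size per source.
        std::vector<int> sendSizes(nProcs, 1), recvSizes(nProcs, 0);

        for (int iter = 0; iter < 100000; ++iter)
        {
            MPI_Alltoall(sendSizes.data(), 1, MPI_INT,
                         recvSizes.data(), 1, MPI_INT, MPI_COMM_WORLD);
        }

        if (rank == 0)
        {
            printf("alltoall loop completed without hanging\n");
        }

        MPI_Finalize();
        return 0;
    }

If this loop survives under the same module environment, suspicion shifts away
from the MPI/UCX stack and toward why rank 0 never reaches the collective (it
is busy in particle-tracking code at the time of the hang).
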
strace
------

Root MPI
========

.. code-block:: none

    [node040 B-3-22471]$ strace -ff -p 54148
    strace: Process 54148 attached with 4 threads
    [pid 54151] select(19, [17 18], NULL, NULL, {tv_sec=3516, tv_usec=700411}
    [pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=0, tv_usec=963883}
    [pid 54149] epoll_wait(10,
    [pid 54148] restart_syscall(<... resuming interrupted restart_syscall ...>
    [pid 54150] <... select resumed>) = 0 (Timeout)
    [pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
    [pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
    [pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
    [pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
    [pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)

Rank 0
======

.. code-block:: none

    [node040 B-3-22471]$ strace -ff -p 54152
    strace: Process 54152 attached with 4 threads
    [pid 54163] epoll_wait(27,
    [pid 54159] epoll_wait(11,
    [pid 54156] restart_syscall(<... resuming interrupted poll ...>

Rank 1
======

.. code-block:: none

    [node040 B-3-22471]$ strace -ff -p 54153
    strace: Process 54153 attached with 4 threads
    [pid 54161] epoll_wait(27,
    [pid 54160] epoll_wait(11,
    [pid 54157] restart_syscall(<... resuming interrupted poll ...>
    [pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    [pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    [pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    [pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    [pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    ....

Rank 2
======

.. code-block:: none

    [node040 B-3-22471]$ strace -ff -p 54154
    strace: Process 54154 attached with 4 threads
    [pid 54158] epoll_wait(11,
    [pid 54155] restart_syscall(<... resuming interrupted poll ...>
    [pid 54162] epoll_wait(27,
    [pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    [pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    [pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    [pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
    ...

Note that the main thread of rank 0 (pid 54152) issues no system calls at all,
consistent with the gdb observation that it is busy in user-space
particle-tracking code, while the main threads of ranks 1 and 2 busy-poll
inside the MPI progress engine.

UCX Debug Log
-------------

The UCX log below was apparently captured from a separate run of the same case
(note the differing PIDs):

.. code-block:: none

    [node040 B-3-22471]$ tail -f /tmp/ucx.log
    [1621958056.506833] [node040:61982:0] mm_iface.c:250 UCX DATA RX [10315] oi am_id 2 len 12 EGR_O tag fffff30000000003
    [1621958056.506835] [node040:61982:0] tag_match.inl:119 UCX DATA checking req 0xcf70c0 tag fffff30000000003/ffffffffffffffff with tag fffff30000000003
    [1621958056.506838] [node040:61982:0] tag_match.inl:121 UCX REQ matched received tag fffff30000000003 to req 0xcf70c0
    [1621958056.506840] [node040:61982:0] eager_rcv.c:25 UCX REQ found req 0xcf70c0
    [1621958056.506842] [node040:61982:0] ucp_request.inl:603 UCX REQ req 0xcf70c0: unpack recv_data req_len 4 data_len 4 offset 0 last: yes
    [1621958056.506845] [node040:61982:0] ucp_request.inl:205 UCX REQ completing receive request 0xcf70c0 (0xcf71d0) --e-cr- stag 0xfffff30000000003 len 4, Success
    [1621958056.506847] [node040:61982:0] ucp_request.c:80 UCX REQ free request 0xcf70c0 (0xcf71d0) d-e-cr-
    [1621958056.506849] [node040:61982:0] ucp_request.inl:181 UCX REQ put request 0xcf70c0
    [1621958056.506867] [node040:61982:0] tag_send.c:246 UCX REQ send_nbx buffer 0x7fffffff50af count 1 tag 10000100003 to
    [1621958056.506870] [node040:61982:0] mm_ep.c:289 UCX DATA TX [201] -i am_id 2 len
    < ... freeze ... >