View Issue Details

ID:              0003683
Project:         OpenFOAM
Category:        Bug
View Status:     public
Last Update:     2022-01-04 12:23
Reporter:        auhpc
Assigned To:     henry
Priority:        normal
Severity:        block
Reproducibility: always
Status:          closed
Resolution:      no change required
Platform:        Linux
OS:              CentOS
OS Version:      7.9.2009
Summary:         0003683: Communication Deadlock in MPPICFoam Parallel Solver
Description
OpenFOAM version 8 experiences what appears to be a communication deadlock in a scheduled send/receive operation.

The case in question attempts to solve a toy CFD problem evaluating airflow within a rectangular prism using parallel instances of MPPICFoam on decomposed input.

Multiple OpenMPI releases (e.g. 4.0.3, 2.1.6), compilers (e.g. Intel, gcc), and network transport layers (e.g. ucx, openib), in conjunction with multiple builds of OpenFOAM 8, have been attempted. Blocking vs. non-blocking communication and a number of mpirun command-line tuning parameters (including varied world sizes) have also been tried, with no resolution.

To determine if the file-system was a factor, the case was run on both local and parallel (GPFS) storage. No difference in runtime behavior was observed when running on local vs. parallel storage.

Additionally, a number of case configuration values (e.g. mesh sizing, simulation times) were varied without any effect.

For debugging purposes, the simulation deltaT was adjusted from 1e-3 to 1.0, which greatly reduces the time to failure.
Steps To Reproduce
1. Extract the attached tarball, e.g. tar -xvzf of_auburn_deadlock_case.tar.gz
2. cd B-3-25667
3. decomposePar
4. mpirun -np 3 MPPICFoam -parallel
5. The deadlock should occur within a minute or so (an optional timeout wrapper is sketched below).
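
Optionally, step 4 can be wrapped in GNU coreutils timeout so the hang surfaces as a non-zero exit status instead of an indefinite stall (a convenience sketch; the 120 s budget is an arbitrary choice):

timeout 120 mpirun -np 3 MPPICFoam -parallel
echo "exit status: $?"   # 124 means the run was killed, i.e. a presumed hang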

Additional Information
Please see the attached files for a detailed report, including gdb, strace, and other logs.
Tags
No tags attached.

Activities

auhpc

2021-06-07 16:32

reporter  

auburn_openfoam_deadlock_report.txt (39,745 bytes)   
*********************************
OpenFOAM 8 Communication Deadlock 
Technical Report
*********************************

Technical Contact
Bradley Morgan
Office of Information Technology
Auburn University
bradley@auburn.edu

Research SME Contact
David Young
Chemical Engineering
College of Engineering
Auburn University
djy0006@auburn.edu

Summary
-------

OpenFOAM version 8 experiences what appears to be a communication deadlock in a scheduled send/receive operation.

The case in question attempts to solve a toy CFD problem evaluating airflow within a rectangular prism using parallel instances of MPPICFoam on decomposed input.

Multiple OpenMPI releases (e.g. 4.0.3, 2.1.6), compilers (e.g. Intel, gcc), and network transport layers (e.g. ucx, openib), in conjunction with multiple builds of OpenFOAM 8, have been attempted. Blocking vs. non-blocking communication and a number of mpirun command-line tuning parameters (including varied world sizes) have also been tried, with no resolution.

To determine if the file-system was a factor, the case was run on both local and parallel (GPFS) storage. No difference in runtime behavior was observed when running on local vs. parallel storage.

Additionally, a number of case configuration values (e.g. mesh sizing, simulation times) were varied without any effect.

For debugging purposes, the simulation deltaT was adjusted from 1e-3 to 1.0, which greatly reduces the time to failure.



HW Environment
--------------

Architecture: x86_64
Processor: 2x Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz
Cores / Node: 48


Software Environment
--------------------

OS: CentOS Linux release 7.9.2009
Application: OpenFOAM 8
MPI: OpenMPI 4.0.3


MPI Build
---------

.. code-block:: bash

	$ ompi_info 

	                 Package: Open MPI ... Distribution
	                Open MPI: 4.0.3
	  Open MPI repo revision: v4.0.3
	   Open MPI release date: Mar 03, 2020
	                Open RTE: 4.0.3
	  Open RTE repo revision: v4.0.3
	   Open RTE release date: Mar 03, 2020
	                    OPAL: 4.0.3
	      OPAL repo revision: v4.0.3
	       OPAL release date: Mar 03, 2020
	                 MPI API: 3.1.0
	            Ident string: 4.0.3
	                  Prefix: /tools/openmpi-4.0.3/gcc/4.8.5/ucx
	 Configured architecture: x86_64-unknown-linux-gnu
	          Configure host: c20-login01
	           Configured by: hpcuser
	           Configured on: Wed Apr 14 10:40:09 CDT 2021
	          Configure host: c20-login01
	  Configure command line: '--prefix=/tools/openmpi-4.0.3/gcc/4.8.5/ucx'
	                          '--with-slurm'
	                Built by: hpcuser
	                Built on: Wed Apr 14 10:50:39 CDT 2021
	              Built host: c20-login01
	              C bindings: yes
	            C++ bindings: no
	             Fort mpif.h: yes (all)
	            Fort use mpi: yes (limited: overloading)
	       Fort use mpi size: deprecated-ompi-info-value
	        Fort use mpi_f08: no
	 Fort mpi_f08 compliance: The mpi_f08 module was not built
	  Fort mpi_f08 subarrays: no
	           Java bindings: no
	  Wrapper compiler rpath: runpath
	              C compiler: /usr/bin/gcc
	     C compiler absolute: 
	  C compiler family name: GNU
	      C compiler version: 4.8.5
	            C++ compiler: /usr/bin/g++
	   C++ compiler absolute: none
	           Fort compiler: /usr/bin/gfortran
	       Fort compiler abs: 
	         Fort ignore TKR: no
	   Fort 08 assumed shape: no
	      Fort optional args: no
	          Fort INTERFACE: yes
	    Fort ISO_FORTRAN_ENV: yes
	       Fort STORAGE_SIZE: no
	      Fort BIND(C) (all): no
	      Fort ISO_C_BINDING: yes
	 Fort SUBROUTINE BIND(C): no
	       Fort TYPE,BIND(C): no
	 Fort T,BIND(C,name="a"): no
	            Fort PRIVATE: no
	          Fort PROTECTED: no
	           Fort ABSTRACT: no
	       Fort ASYNCHRONOUS: no
	          Fort PROCEDURE: no
	         Fort USE...ONLY: no
	           Fort C_FUNLOC: no
	 Fort f08 using wrappers: no
	         Fort MPI_SIZEOF: no
	             C profiling: yes
	           C++ profiling: no
	   Fort mpif.h profiling: yes
	  Fort use mpi profiling: yes
	   Fort use mpi_f08 prof: no
	          C++ exceptions: no
	          Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
	                          OMPI progress: no, ORTE progress: yes, Event lib:
	                          yes)
	           Sparse Groups: no
	  Internal debug support: no
	  MPI interface warnings: yes
	     MPI parameter check: runtime
	Memory profiling support: no
	Memory debugging support: no
	              dl support: yes
	   Heterogeneous support: no
	 mpirun default --prefix: no
	       MPI_WTIME support: native
	     Symbol vis. support: yes
	   Host topology support: yes
	            IPv6 support: no
	      MPI1 compatibility: no
	          MPI extensions: affinity, cuda, pcollreq
	   FT Checkpoint support: no (checkpoint thread: no)
	   C/R Enabled Debugging: no
	  MPI_MAX_PROCESSOR_NAME: 256
	    MPI_MAX_ERROR_STRING: 256
	     MPI_MAX_OBJECT_NAME: 64
	        MPI_MAX_INFO_KEY: 36
	        MPI_MAX_INFO_VAL: 256
	       MPI_MAX_PORT_NAME: 1024


OpenFOAM Case
-------------

/scratch/hpcadmn/djy0006/case5/B-3-22471/system/decomposeParDict
================================================================

/*--------------------------------*- C++ -*----------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  8
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      decomposeParDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

numberOfSubdomains 3;

method          simple;

simpleCoeffs
{
    n               (3 1 1);
    delta           0.001;
}

// ************************************************************************* //


/scratch/hpcadmn/djy0006/case5/B-3-22471/system/controlDict
============================================================

/*--------------------------------*- C++ -*----------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  8
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    location    "system";
    object      controlDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

application     MPPICFoam;
startFrom       startTime;
startTime       0.0;
stopAt          endTime;
endTime         6.5;
deltaT          1.0;
writeControl    timeStep;
writeInterval   1;
purgeWrite      0;
writeFormat     ascii;
writePrecision  6;
writeCompression off;
timeFormat      general;
timePrecision   6;
runTimeModifiable no;

// ************************************************************************* //

OptimisationSwitches {

    fileModificationSkew 60;
    fileModificationChecking timeStampMaster;
    fileHandler uncollated;
    maxThreadFileBufferSize 2e9;
    maxMasterFileBufferSize 2e9;
    commsType       blocking; // nonBlocking; // scheduled; // blocking;
    floatTransfer   0;
    nProcsSimpleSum 0;
    // Force dumping (at next timestep) upon signal (-1 to disable)
    writeNowSignal             -1; // 10;
    stopAtWriteNowSignal       -1;
    inputSyntax dot;
    mpiBufferSize   200000000;
    maxCommsSize    0;
    trapFpe         1;
    setNaN          0;
}

DebugSwitches {
   UPstream            1;
   Pstream             1;
   processor           1;
   IFstream            1;
   OFstream            1;
}


Summary of Debug Output
-----------------------

The following debug output was generated using the above case configuration with an MPI world size of 3...

$ srun -N1 -n3 --pty /bin/bash
...
$ module load openfoam/8-ompi2
$ source /tools/openfoam-8/mpich/OpenFOAM-8/etc/bashrc
$ decomposePar -force
$ mpirun -np $SLURM_NTASKS MPPICFoam -parallel

Following the process tree of the mpirun launcher (PID 54148) ...

$ pstree -ac --show-parents -p -l 54148
systemd,1
  └─slurmstepd,262636
      └─bash,262643
          └─mpirun,54148 -np 3 MPPICFoam -parallel
              ├─MPPICFoam,54152 -parallel
              │   ├─{MPPICFoam},<tid>
              │   ├─{MPPICFoam},<tid>
              │   └─{MPPICFoam},<tid>
              ├─MPPICFoam,54153 -parallel
              │   ├─{MPPICFoam},<tid>
              │   ├─{MPPICFoam},<tid>
              │   └─{MPPICFoam},<tid>
              ├─MPPICFoam,54154 -parallel
              │   ├─{MPPICFoam},<tid>
              │   ├─{MPPICFoam},<tid>
              │   └─{MPPICFoam},<tid>
              ├─{mpirun},<tid>
              ├─{mpirun},<tid>
              └─{mpirun},<tid>


The case output at the time of failure looks like ...

[0] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
[0] UPstream::waitRequests : finished wait.
[0] UIPstream::read : starting read from:1 tag:1 comm:0 wanted size:1 commsType:scheduled
[0] UIPstream::read : finished read from:1 tag:1 read size:1 commsType:scheduled
[0] UIPstream::read : starting read from:2 tag:1 comm:0 wanted size:1 commsType:scheduled
[0] UIPstream::read : finished read from:2 tag:1 read size:1 commsType:scheduled
[0] UOPstream::write : starting write to:2 tag:1 comm:0 size:1 commsType:scheduled
[0] UOPstream::write : finished write to:2 tag:1 size:1 commsType:scheduled
[0] UOPstream::write : starting write to:1 tag:1 comm:0 size:1 commsType:scheduled
[0] UOPstream::write : finished write to:1 tag:1 size:1 commsType:scheduled
[2] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
[2] UPstream::waitRequests : finished wait.
[2] UOPstream::write : starting write to:0 tag:1 comm:0 size:1 commsType:scheduled
[2] UOPstream::write : finished write to:0 tag:1 size:1 commsType:scheduled
[2] UIPstream::read : starting read from:0 tag:1 comm:0 wanted size:1 commsType:scheduled
[2] UIPstream::read : finished read from:0 tag:1 read size:1 commsType:scheduled
[1] UPstream::waitRequests : starting wait for 0 outstanding requests starting at 0
[1] UPstream::waitRequests : finished wait.
[1] UOPstream::write : starting write to:0 tag:1 comm:0 size:1 commsType:scheduled
[1] UOPstream::write : finished write to:0 tag:1 size:1 commsType:scheduled
[1] UIPstream::read : starting read from:0 tag:1 comm:0 wanted size:1 commsType:scheduled
[1] UIPstream::read : finished read from:0 tag:1 read size:1 commsType:scheduled

<... freeze ...>

Here, the communication schedule appears balanced, with all sends and receives matched (based on size and tag). However, the behavior indicates a blocking send or receive call that never completes.

The deadlock always seems to occur for size=1 send/recv operations.


*********** The remaining content consists of strace/gdb output from the MPI ranks.  ***************

The root mpirun process (54148) looks like it is stuck in a poll loop.

Rank 0 appears to be executing particle-tracking code, returning from Foam::BarycentricTensor<double>::BarycentricTensor().

Ranks 1 and 2 appear to be waiting on PMPI_Alltoall communication.

****************************************************************************************************
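
To illustrate the failure signature (plain MPI, not OpenFOAM code): a collective such as MPI_Alltoall only completes once every rank in the communicator has entered it, so a single rank stuck in an unrelated compute loop is, from the other ranks' point of view, indistinguishable from a communication deadlock. A minimal, hypothetical sketch (hang_sketch.cpp; build and run with, e.g., mpicxx hang_sketch.cpp -o hang_sketch && mpirun -np 3 ./hang_sketch):

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
    {
        // Stand-in for a particle-tracking loop that never terminates:
        // rank 0 never posts its part of the collective below.
        volatile bool spinning = true;
        while (spinning) {}
    }

    // The remaining ranks reach the collective and spin in the MPI
    // progress engine forever -- the same PMPI_Alltoall frames seen in
    // the gdb backtraces below.
    std::vector<int> sendBuf(size, rank), recvBuf(size, 0);
    MPI_Alltoall(sendBuf.data(), 1, MPI_INT,
                 recvBuf.data(), 1, MPI_INT, MPI_COMM_WORLD);

    std::printf("rank %d finished\n", rank);  // never reached when np > 1
    MPI_Finalize();
    return 0;
}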


GDB
---

Root (MPI) Process
==================

[node040 B-3-22471]$ mpirun -np $SLURM_NTASKS MPPICFoam -parallel > /dev/null 2>&1 &
[1] 54148

[node040 B-3-22471]$ ps -ef | grep hpcuser
hpcuser   47219  47212  0 09:08 pts/0    00:00:00 /bin/bash
hpcuser   54148  47219  1 09:54 pts/0    00:00:00 mpirun -np 3 MPPICFoam -parallel
hpcuser   54152  54148 72 09:54 pts/0    00:00:02 MPPICFoam -parallel
hpcuser   54153  54148 81 09:54 pts/0    00:00:02 MPPICFoam -parallel
hpcuser   54154  54148 81 09:54 pts/0    00:00:02 MPPICFoam -parallel
hpcuser   54166  47219  0 09:54 pts/0    00:00:00 ps -ef
hpcuser   54167  47219  0 09:54 pts/0    00:00:00 grep --color=auto hpcuser

[node040 B-3-22471]$ gdb -p 54148
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7


(gdb) frame
#0  0x00002aaaac17fccd in poll () from /usr/lib64/libc.so.6
(gdb) where
#0  0x00002aaaac17fccd in poll () from /usr/lib64/libc.so.6
#1  0x00002aaaab096fc6 in poll_dispatch (base=0x659370, tv=0x0) at ../../../../../../../openmpi-4.0.3/opal/mca/event/libevent2022/libevent/poll.c:165
#2  0x00002aaaab08ec80 in opal_libevent2022_event_base_loop (base=0x659370, flags=1) at ../../../../../../../openmpi-4.0.3/opal/mca/event/libevent2022/libevent/event.c:1630
#3  0x0000000000401438 in orterun (argc=5, argv=0x7fffffffaae8) at ../../../../../openmpi-4.0.3/orte/tools/orterun/orterun.c:178
#4  0x0000000000400f6d in main (argc=5, argv=0x7fffffffaae8) at ../../../../../openmpi-4.0.3/orte/tools/orterun/main.c:13
(gdb) n
Single stepping until exit from function poll,
which has no line number information.
< ... freeze ... >


(gdb) disassemble 
Dump of assembler code for function poll:
   0x00002aaaac0ceca0 <+0>:	cmpl   $0x0,0x2d930d(%rip)        # 0x2aaaac3a7fb4 <__libc_multiple_threads>
   0x00002aaaac0ceca7 <+7>:	jne    0x2aaaac0cecb9 <poll+25>
   0x00002aaaac0ceca9 <+9>:	mov    $0x7,%eax
   0x00002aaaac0cecae <+14>:	syscall 
   0x00002aaaac0cecb0 <+16>:	cmp    $0xfffffffffffff001,%rax
   0x00002aaaac0cecb6 <+22>:	jae    0x2aaaac0cece9 <poll+73>
   0x00002aaaac0cecb8 <+24>:	retq   
   0x00002aaaac0cecb9 <+25>:	sub    $0x8,%rsp
   0x00002aaaac0cecbd <+29>:	callq  0x2aaaac0e7720 <__libc_enable_asynccancel>
   0x00002aaaac0cecc2 <+34>:	mov    %rax,(%rsp)
   0x00002aaaac0cecc6 <+38>:	mov    $0x7,%eax
   0x00002aaaac0ceccb <+43>:	syscall 
=> 0x00002aaaac0ceccd <+45>:	mov    (%rsp),%rdi
   0x00002aaaac0cecd1 <+49>:	mov    %rax,%rdx
   0x00002aaaac0cecd4 <+52>:	callq  0x2aaaac0e7780 <__libc_disable_asynccancel>
   0x00002aaaac0cecd9 <+57>:	mov    %rdx,%rax
   0x00002aaaac0cecdc <+60>:	add    $0x8,%rsp
   0x00002aaaac0cece0 <+64>:	cmp    $0xfffffffffffff001,%rax
   0x00002aaaac0cece6 <+70>:	jae    0x2aaaac0cece9 <poll+73>
   0x00002aaaac0cece8 <+72>:	retq   
   0x00002aaaac0cece9 <+73>:	mov    0x2d3160(%rip),%rcx        # 0x2aaaac3a1e50
   0x00002aaaac0cecf0 <+80>:	neg    %eax
   0x00002aaaac0cecf2 <+82>:	mov    %eax,%fs:(%rcx)
   0x00002aaaac0cecf5 <+85>:	or     $0xffffffffffffffff,%rax
   0x00002aaaac0cecf9 <+89>:	retq   
End of assembler dump.



Rank 0 Process
==============

[node040 B-3-22471]$ gdb -p 54152

GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-120.el7
Attaching to process 54152
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/bin/MPPICFoam...done.
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangian.so...done.
Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangian.so
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianIntermediate.so...done.
Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianIntermediate.so
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianTurbulence.so...done.
Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/liblagrangianTurbulence.so
Reading symbols from /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/libincompressibleTransportModels.so...done.
Loaded symbols for /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/platforms/linux64GccDPInt32Debug/lib/libincompressibleTransportModels.so
...
(gdb) disassemble
Dump of assembler code for function Foam::BarycentricTensor<double>::BarycentricTensor(Foam::Vector<double> const&, Foam::Vector<double> const&, Foam::Vector<double> const&, Foam::Vector<double> const&):
   0x00000000004aa9fc <+0>:	push   %rbp
   0x00000000004aa9fd <+1>:	mov    %rsp,%rbp
   0x00000000004aaa00 <+4>:	sub    $0x30,%rsp
   0x00000000004aaa04 <+8>:	mov    %rdi,-0x8(%rbp)
=> 0x00000000004aaa08 <+12>:	mov    %rsi,-0x10(%rbp)
   0x00000000004aaa0c <+16>:	mov    %rdx,-0x18(%rbp)
   0x00000000004aaa10 <+20>:	mov    %rcx,-0x20(%rbp)
   0x00000000004aaa14 <+24>:	mov    %r8,-0x28(%rbp)
   0x00000000004aaa18 <+28>:	mov    -0x8(%rbp),%rax
   0x00000000004aaa1c <+32>:	mov    %rax,%rdi
   0x00000000004aaa1f <+35>:	callq  0x4b9302 <Foam::MatrixSpace<Foam::BarycentricTensor<double>, double, (unsigned char)4, (unsigned char)3>::MatrixSpace()>
   0x00000000004aaa24 <+40>:	mov    -0x10(%rbp),%rax
   0x00000000004aaa28 <+44>:	mov    %rax,%rdi
   0x00000000004aaa2b <+47>:	callq  0x4a7af8 <Foam::Vector<double>::x() const>
   0x00000000004aaa30 <+52>:	mov    (%rax),%rax
   0x00000000004aaa33 <+55>:	mov    -0x8(%rbp),%rdx
   0x00000000004aaa37 <+59>:	mov    %rax,(%rdx)
   0x00000000004aaa3a <+62>:	mov    -0x18(%rbp),%rax
   0x00000000004aaa3e <+66>:	mov    %rax,%rdi
   0x00000000004aaa41 <+69>:	callq  0x4a7af8 <Foam::Vector<double>::x() const>
   0x00000000004aaa46 <+74>:	mov    (%rax),%rax
   0x00000000004aaa49 <+77>:	mov    -0x8(%rbp),%rdx
   0x00000000004aaa4d <+81>:	mov    %rax,0x8(%rdx)
   0x00000000004aaa51 <+85>:	mov    -0x20(%rbp),%rax
   0x00000000004aaa55 <+89>:	mov    %rax,%rdi
   0x00000000004aaa58 <+92>:	callq  0x4a7af8 <Foam::Vector<double>::x() const>
   0x00000000004aaa5d <+97>:	mov    (%rax),%rax
   0x00000000004aaa60 <+100>:	mov    -0x8(%rbp),%rdx
   0x00000000004aaa64 <+104>:	mov    %rax,0x10(%rdx)
   0x00000000004aaa68 <+108>:	mov    -0x28(%rbp),%rax
   0x00000000004aaa6c <+112>:	mov    %rax,%rdi
   0x00000000004aaa6f <+115>:	callq  0x4a7af8 <Foam::Vector<double>::x() const>
   0x00000000004aaa74 <+120>:	mov    (%rax),%rax
   0x00000000004aaa77 <+123>:	mov    -0x8(%rbp),%rdx
   0x00000000004aaa7b <+127>:	mov    %rax,0x18(%rdx)
   0x00000000004aaa7f <+131>:	mov    -0x10(%rbp),%rax
   0x00000000004aaa83 <+135>:	mov    %rax,%rdi
   0x00000000004aaa86 <+138>:	callq  0x4a7b06 <Foam::Vector<double>::y() const>
   0x00000000004aaa8b <+143>:	mov    (%rax),%rax
   0x00000000004aaa8e <+146>:	mov    -0x8(%rbp),%rdx

(gdb) frame
#0  0x00000000004a7c6d in Foam::operator^<double> (v1=..., v2=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/VectorI.H:159

(gdb) frame
#0  0x00000000004a55a2 in Foam::tetIndices::faceTriIs (this=0x7fffffff4b10, mesh=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/tetIndicesI.H:86
86	    label facePtI = (tetPt() + faceBasePtI) % f.size();

#0  0x00002aaaaacfd654 in Foam::BarycentricTensor<double>::d (this=0x7fffffff4620) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:159
159	    return Vector<Cmpt>(this->v_[XD], this->v_[YD], this->v_[ZD]);

(gdb) where
#0  0x00000000004ceebd in Foam::Barycentric<double>::Barycentric (this=0x7fffffff4be0, va=@0x7fffffff4cc0: -0.13335, vb=@0x7fffffff4cc8: -0.13716, vc=@0x7fffffff4cd0: -0.13716, 
    vd=@0x7fffffff4cd8: -0.12953999999999999) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricI.H:50
#1  0x00000000004b97f5 in Foam::BarycentricTensor<double>::z (this=0x7fffffff4c80) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:131
#2  0x00000000004aae13 in Foam::operator&<double> (T=..., b=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/OpenFOAM/lnInclude/BarycentricTensorI.H:177
#3  0x00000000004a6a1e in Foam::particle::position (this=0x2240a40) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/particleI.H:280
#4  0x00002aaaaacf837d in Foam::particle::deviationFromMeshCentre (this=0x2240a40) at particle/particle.C:1036
#5  0x000000000051ac3a in Foam::KinematicParcel<Foam::particle>::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (
    this=0x2240a40, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicParcel.C:309
#6  0x000000000050acc2 in Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> >::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x2240a40, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICParcel.C:102
#7  0x00000000004f22f3 in Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > >::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:205
#8  0x00000000004f1e18 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > >::motion<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
#9  0x00000000004da066 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > >::evolveCloud<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
#10 0x00000000004c3497 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > >::solve<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
#11 0x00000000004afc73 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > >::evolve (this=0x7fffffff7220)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
#12 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109


Rank 1 Process
==============

[node040 B-3-22471]$ gdb -p 54153

(gdb) disassemble 
Dump of assembler code for function uct_rc_mlx5_iface_progress_cyclic:
   0x00002aaaca088020 <+0>:	push   %r15
   0x00002aaaca088022 <+2>:	push   %r14
   0x00002aaaca088024 <+4>:	push   %r13
   0x00002aaaca088026 <+6>:	push   %r12
   0x00002aaaca088028 <+8>:	push   %rbp
   0x00002aaaca088029 <+9>:	push   %rbx
   0x00002aaaca08802a <+10>:	mov    %rdi,%rbx
   0x00002aaaca08802d <+13>:	sub    $0x38,%rsp
   0x00002aaaca088031 <+17>:	movzwl 0x86c8(%rdi),%edx
   0x00002aaaca088038 <+24>:	movzwl 0x86c6(%rdi),%eax
   0x00002aaaca08803f <+31>:	imul   %edx,%eax
   0x00002aaaca088042 <+34>:	mov    0x86b0(%rdi),%rdx
   0x00002aaaca088049 <+41>:	cltq   
   0x00002aaaca08804b <+43>:	cmpw   $0x0,0x2(%rdx,%rax,1)
=> 0x00002aaaca088051 <+49>:	jne    0x2aaaca0887cd <uct_rc_mlx5_iface_progress_cyclic+1965>
   0x00002aaaca088057 <+55>:	mov    0x86f0(%rdi),%rax
   0x00002aaaca08805e <+62>:	mov    0x8708(%rdi),%edx
   0x00002aaaca088064 <+68>:	mov    0x870c(%rdi),%ecx
   0x00002aaaca08806a <+74>:	prefetcht0 (%rax)
   0x00002aaaca08806d <+77>:	mov    0x8700(%rdi),%eax
   0x00002aaaca088073 <+83>:	lea    -0x1(%rdx),%ebp
   0x00002aaaca088076 <+86>:	and    %eax,%ebp
   0x00002aaaca088078 <+88>:	shl    %cl,%ebp
   0x00002aaaca08807a <+90>:	add    0x86f8(%rdi),%rbp
   0x00002aaaca088081 <+97>:	movzbl 0x3f(%rbp),%ecx
   0x00002aaaca088085 <+101>:	mov    %ecx,%esi
   0x00002aaaca088087 <+103>:	and    $0x1,%esi
   0x00002aaaca08808a <+106>:	test   %eax,%edx
   0x00002aaaca08808c <+108>:	setne  %dl
   0x00002aaaca08808f <+111>:	cmp    %sil,%dl
   0x00002aaaca088092 <+114>:	jne    0x2aaaca088686 <uct_rc_mlx5_iface_progress_cyclic+1638>
   0x00002aaaca088098 <+120>:	test   %cl,%cl
   0x00002aaaca08809a <+122>:	js     0x2aaaca088679 <uct_rc_mlx5_iface_progress_cyclic+1625>
   0x00002aaaca0880a0 <+128>:	add    $0x1,%eax
   0x00002aaaca0880a3 <+131>:	mov    %eax,0x8700(%rdi)
   0x00002aaaca0880a9 <+137>:	movzwl 0x3c(%rbp),%r13d
   0x00002aaaca0880ae <+142>:	movzwl 0x86c6(%rdi),%edi
   0x00002aaaca0880b5 <+149>:	movzwl 0x86c8(%rbx),%ecx
   0x00002aaaca0880bc <+156>:	mov    0x86b0(%rbx),%rdx
   0x00002aaaca0880c3 <+163>:	mov    0x2c(%rbp),%r12d
   0x00002aaaca0880c7 <+167>:	mov    %r13d,%eax
   0x00002aaaca0880ca <+170>:	ror    $0x8,%ax

(gdb) frame
#0  0x00002aaac936812d in uct_mm_iface_progress (tl_iface=<optimized out>) at ../../../src/uct/sm/mm/base/mm_iface.c:365

(gdb) frame
#0  0x00002aaaca08848a in uct_rc_mlx5_iface_progress_cyclic (arg=<optimized out>) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:183
183	}

(gdb) where
#0  0x00002aaaca088484 in uct_rc_mlx5_iface_progress_cyclic (arg=<optimized out>) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:183
#1  0x00002aaac90b608a in ucs_callbackq_dispatch (cbq=<optimized out>) at /home/hpcuser/build/ucx/build/../src/ucs/datastruct/callbackq.h:211
#2  uct_worker_progress (worker=<optimized out>) at /home/hpcuser/build/ucx/build/../src/uct/api/uct.h:2592
#3  ucp_worker_progress (worker=0xb9f390) at ../../../src/ucp/core/ucp_worker.c:2530
#4  0x00002aaac8c6c6d7 in mca_pml_ucx_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/openmpi/mca_pml_ucx.so
#5  0x00002aaab91c780c in opal_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libopen-pal.so.40
#6  0x00002aaab85111bd in ompi_request_default_wait_all () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#7  0x00002aaab8565398 in ompi_coll_base_alltoall_intra_basic_linear () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#8  0x00002aaab85240d7 in PMPI_Alltoall () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#9  0x00002aaab2569953 in Foam::UPstream::allToAll (sendData=..., recvData=..., communicator=0) at UPstream.C:367
#10 0x00002aaab0a162c1 in Foam::Pstream::exchangeSizes<Foam::List<Foam::DynamicList<char, 0u, 2u, 1u> > > (sendBufs=..., recvSizes=..., comm=0) at db/IOstreams/Pstreams/exchange.C:158
#11 0x00002aaab0a15d0d in Foam::PstreamBuffers::finishedSends (this=0x7fffffff4fe0, recvSizes=..., block=true) at db/IOstreams/Pstreams/PstreamBuffers.C:106
#12 0x00000000004f2670 in Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > >::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:283
#13 0x00000000004f1e18 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > >::motion<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
#14 0x00000000004da066 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > >::evolveCloud<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
#15 0x00000000004c3497 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > >::solve<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
#16 0x00000000004afc73 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > >::evolve (this=0x7fffffff7220)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
#17 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109


Rank 2 Process
==============

[node040 B-3-22471]$ gdb -p 54154

(gdb) disassemble 
Dump of assembler code for function uct_mm_iface_progress:
   0x00002aaac9367f80 <+0>:	push   %r15
   0x00002aaac9367f82 <+2>:	push   %r14
   0x00002aaac9367f84 <+4>:	push   %r13
   0x00002aaac9367f86 <+6>:	push   %r12
   0x00002aaac9367f88 <+8>:	push   %rbp
   0x00002aaac9367f89 <+9>:	push   %rbx
   0x00002aaac9367f8a <+10>:	mov    %rdi,%rbx
   0x00002aaac9367f8d <+13>:	sub    $0x168,%rsp
   0x00002aaac9367f94 <+20>:	mov    0x558(%rdi),%esi
   0x00002aaac9367f9a <+26>:	movl   $0x0,0x50(%rsp)
   0x00002aaac9367fa2 <+34>:	test   %esi,%esi
   0x00002aaac9367fa4 <+36>:	je     0x2aaac9368527 <uct_mm_iface_progress+1447>
   0x00002aaac9367faa <+42>:	lea    0x598(%rbx),%r14
   0x00002aaac9367fb1 <+49>:	lea    0x560(%rbx),%r13
   0x00002aaac9367fb8 <+56>:	lea    0x60(%rsp),%r12
   0x00002aaac9367fbd <+61>:	mov    0x538(%rdi),%rdx
   0x00002aaac9367fc4 <+68>:	movabs $0x7fffffffffffffff,%rbp
   0x00002aaac9367fce <+78>:	xor    %edi,%edi
   0x00002aaac9367fd0 <+80>:	movzbl 0x548(%rbx),%ecx
   0x00002aaac9367fd7 <+87>:	mov    0x540(%rbx),%rax
   0x00002aaac9367fde <+94>:	movzbl (%rdx),%edx
   0x00002aaac9367fe1 <+97>:	shr    %cl,%rax
=> 0x00002aaac9367fe4 <+100>:	xor    %rdx,%rax
   0x00002aaac9367fe7 <+103>:	and    $0x1,%eax
   0x00002aaac9367fea <+106>:	jne    0x2aaac9368500 <uct_mm_iface_progress+1408>
   0x00002aaac9367ff0 <+112>:	mov    0x528(%rbx),%rdx
   0x00002aaac9367ff7 <+119>:	mov    (%rdx),%rdx
   0x00002aaac9367ffa <+122>:	and    %rbp,%rdx
   0x00002aaac9367ffd <+125>:	cmp    %rdx,0x540(%rbx)
   0x00002aaac9368004 <+132>:	ja     0x2aaac93682c8 <uct_mm_iface_progress+840>
   0x00002aaac936800a <+138>:	mov    0x538(%rbx),%r10
   0x00002aaac9368011 <+145>:	testb  $0x2,(%r10)
   0x00002aaac9368015 <+149>:	je     0x2aaac93681ef <uct_mm_iface_progress+623>
   0x00002aaac936801b <+155>:	mov    0x224f36(%rip),%r15        # 0x2aaac958cf58
   0x00002aaac9368022 <+162>:	cmpl   $0x7,(%r15)
   0x00002aaac9368026 <+166>:	ja     0x2aaac9368198 <uct_mm_iface_progress+536>
   0x00002aaac936802c <+172>:	lea    0x1c(%r10),%rsi
   0x00002aaac9368030 <+176>:	movzbl 0x1(%r10),%r9d
   0x00002aaac9368035 <+181>:	movzwl 0x2(%r10),%edx
   0x00002aaac936803a <+186>:	cmp    $0x1f,%r9b
   0x00002aaac936803e <+190>:	ja     0x2aaac9368164 <uct_mm_iface_progress+484>
   0x00002aaac9368044 <+196>:	movzbl %r9b,%r9d

(gdb) frame
#0  0x00002aaaca08823b in uct_ib_mlx5_get_cqe (cqe_index=51, cq=0xbf0778) at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:13
13	    return UCS_PTR_BYTE_OFFSET(cq->cq_buf, ((cqe_index & (cq->cq_length - 1)) <<

(gdb) where
#0  0x00002aaaca08823b in uct_ib_mlx5_get_cqe (cqe_index=51, cq=0xbf0778) at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:13
#1  uct_ib_mlx5_poll_cq (cq=0xbf0778, iface=0xbe8050) at /home/hpcuser/build/ucx/build/../src/uct/ib/mlx5/ib_mlx5.inl:73
#2  uct_rc_mlx5_iface_poll_tx (iface=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:140
#3  uct_rc_mlx5_iface_progress (flags=2, arg=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:177
#4  uct_rc_mlx5_iface_progress_cyclic (arg=0xbe8050) at ../../../../src/uct/ib/rc/accel/rc_mlx5_iface.c:182
#5  0x00002aaac90b608a in ucs_callbackq_dispatch (cbq=<optimized out>) at /home/hpcuser/build/ucx/build/../src/ucs/datastruct/callbackq.h:211
#6  uct_worker_progress (worker=<optimized out>) at /home/hpcuser/build/ucx/build/../src/uct/api/uct.h:2592
#7  ucp_worker_progress (worker=0xb9f3d0) at ../../../src/ucp/core/ucp_worker.c:2530
#8  0x00002aaac8c6c6d7 in mca_pml_ucx_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/openmpi/mca_pml_ucx.so
#9  0x00002aaab91c780c in opal_progress () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libopen-pal.so.40
#10 0x00002aaab85111bd in ompi_request_default_wait_all () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#11 0x00002aaab8565398 in ompi_coll_base_alltoall_intra_basic_linear () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#12 0x00002aaab85240d7 in PMPI_Alltoall () from /tools/openmpi-4.0.3/gcc/4.8.5/ucx/lib/libmpi.so.40
#13 0x00002aaab2569953 in Foam::UPstream::allToAll (sendData=..., recvData=..., communicator=0) at UPstream.C:367
#14 0x00002aaab0a162c1 in Foam::Pstream::exchangeSizes<Foam::List<Foam::DynamicList<char, 0u, 2u, 1u> > > (sendBufs=..., recvSizes=..., comm=0) at db/IOstreams/Pstreams/exchange.C:158
#15 0x00002aaab0a15d0d in Foam::PstreamBuffers::finishedSends (this=0x7fffffff4fe0, recvSizes=..., block=true) at db/IOstreams/Pstreams/PstreamBuffers.C:106
#16 0x00000000004f2670 in Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > >::move<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=..., trackTime=1) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/basic/lnInclude/Cloud.C:283
#17 0x00000000004f1e18 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > >::motion<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:247
#18 0x00000000004da066 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > >::evolveCloud<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:210
#19 0x00000000004c3497 in Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > >::solve<Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > > > (this=0x7fffffff7220, cloud=..., td=...) at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/KinematicCloud.C:114
#20 0x00000000004afc73 in Foam::MPPICCloud<Foam::KinematicCloud<Foam::Cloud<Foam::MPPICParcel<Foam::KinematicParcel<Foam::particle> > > > >::evolve (this=0x7fffffff7220)
    at /mmfs1/tools/openfoam-8/debug/OpenFOAM-8/src/lagrangian/intermediate/lnInclude/MPPICCloud.C:169
#21 0x000000000049e61e in main (argc=2, argv=0x7fffffffa258) at ../DPMFoam.C:109

strace
------

Root MPI
========

[node040 B-3-22471]$ strace -ff -p 54148
strace: Process 54148 attached with 4 threads
[pid 54151] select(19, [17 18], NULL, NULL, {tv_sec=3516, tv_usec=700411} <unfinished ...>
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=0, tv_usec=963883} <unfinished ...>
[pid 54149] epoll_wait(10,  <unfinished ...>
[pid 54148] restart_syscall(<... resuming interrupted restart_syscall ...> <unfinished ...>
[pid 54150] <... select resumed>)       = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)
[pid 54150] select(16, [14 15], NULL, NULL, {tv_sec=2, tv_usec=0}) = 0 (Timeout)

Rank 0
======

[node040 B-3-22471]$ strace -ff -p 54152
strace: Process 54152 attached with 4 threads
[pid 54163] epoll_wait(27,  <unfinished ...>
[pid 54159] epoll_wait(11,  <unfinished ...>
[pid 54156] restart_syscall(<... resuming interrupted poll ...>

Rank 1
======

[node040 B-3-22471]$ strace -ff -p 54153
strace: Process 54153 attached with 4 threads
[pid 54161] epoll_wait(27,  <unfinished ...>
[pid 54160] epoll_wait(11,  <unfinished ...>
[pid 54157] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54153] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
....

Rank 2
======

[node040 B-3-22471]$ strace -ff -p 54154
strace: Process 54154 attached with 4 threads
[pid 54158] epoll_wait(11,  <unfinished ...>
[pid 54155] restart_syscall(<... resuming interrupted poll ...> <unfinished ...>
[pid 54162] epoll_wait(27,  <unfinished ...>
[pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
[pid 54154] poll([{fd=6, events=POLLIN}, {fd=16, events=POLLIN}], 2, 0) = 0 (Timeout)
...

UCX Debug Log
-------------
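
For reference, this log was presumably produced with UCX's built-in logging, along the lines of the settings below (an assumption; the exact environment was not captured):

export UCX_LOG_LEVEL=data
export UCX_LOG_FILE=/tmp/ucx.log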

[node040 B-3-22471]$ tail -f /tmp/ucx.log 
[1621958056.506833] [node040:61982:0]        mm_iface.c:250  UCX  DATA  RX [10315] oi am_id 2 len 12 EGR_O tag fffff30000000003
[1621958056.506835] [node040:61982:0]     tag_match.inl:119  UCX  DATA  checking req 0xcf70c0 tag fffff30000000003/ffffffffffffffff with tag fffff30000000003
[1621958056.506838] [node040:61982:0]     tag_match.inl:121  UCX  REQ   matched received tag fffff30000000003 to req 0xcf70c0
[1621958056.506840] [node040:61982:0]       eager_rcv.c:25   UCX  REQ   found req 0xcf70c0
[1621958056.506842] [node040:61982:0]   ucp_request.inl:603  UCX  REQ   req 0xcf70c0: unpack recv_data req_len 4 data_len 4 offset 0 last: yes
[1621958056.506845] [node040:61982:0]   ucp_request.inl:205  UCX  REQ   completing receive request 0xcf70c0 (0xcf71d0) --e-cr- stag 0xfffff30000000003 len 4, Success
[1621958056.506847] [node040:61982:0]     ucp_request.c:80   UCX  REQ   free request 0xcf70c0 (0xcf71d0) d-e-cr-
[1621958056.506849] [node040:61982:0]   ucp_request.inl:181  UCX  REQ   put request 0xcf70c0
[1621958056.506867] [node040:61982:0]        tag_send.c:246  UCX  REQ   send_nbx buffer 0x7fffffff50af count 1 tag 10000100003 to <no debug data>
[1621958056.506870] [node040:61982:0]           mm_ep.c:289  UCX  DATA  TX [201] -i am_id 2 len 

< ... freeze ... >






henry

2021-06-07 17:25

manager   ~0012045

Try running the case in denseParticleFoam in the latest OpenFOAM-dev

tniemi

2021-06-08 12:01

reporter   ~0012046

I tested this a little and for me the deadlock happens at the very first particle update, even in serial. I tried varying the number of injected parcels, and with fewer parcels the case can run for a few iterations. Interestingly, the magic number seemed to be 4800, which is 3x1600, where 1600 is the number of inlet faces. With 4799 or fewer the solver can manage the first iteration; with 4800 or more it hangs immediately.

Then I noticed that the mesh is 2D, but with more than one cell in the thickness direction. If I change the empty patches to walls, the case seems to run and finish regardless of the number of parcels.
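
For reference, a valid 2D setup keeps exactly one cell across the empty direction; a minimal, hypothetical blockMeshDict fragment (vertex numbering assumed):

blocks
(
    // exactly one cell (nz = 1) across the empty z-direction
    hex (0 1 2 3 4 5 6 7) (100 100 1) simpleGrading (1 1 1)
);

boundary
(
    frontAndBack
    {
        type    empty;  // only valid for a one-cell-thick block;
                        // otherwise make these walls and run fully 3D
        faces
        (
            (0 3 2 1)
            (4 5 6 7)
        );
    }
);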

tniemi

2021-06-08 12:08

reporter   ~0012047

Edit: forget about the 4800; it can also hang with other counts, like 4700. But changing the case to true 3D seems to help.

henry

2021-06-08 12:09

manager   ~0012048

Last edited: 2021-06-08 12:10

I ran the case without any problem in OpenFOAM-dev; however, the case should NOT have empty patches if it is not 2D.

tniemi

2021-06-08 12:55

reporter   ~0012049

For me it does seem to hang even with dev. I don't have a debug build at hand, but using gdb to get a rough location, it seems the code is stuck in tracking functions like trackToStationaryTri, stationaryTetReverseTransform...

But as said, the mesh is neither truly 2D nor 3D, and maybe that is confusing the tracking. Also, almost all cells have a "small determinant (< 0.001)" warning.
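
The determinant warning comes from the standard mesh-quality checks and can be reproduced in the case directory with, e.g.:

checkMesh -allGeometry

which reports cells with a small determinant among the other quality metrics.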

henry

2021-06-08 13:29

manager   ~0012050

Agreed, the case is incorrect.

auhpc

2021-06-08 18:12

reporter   ~0012052

Thank you all for your efforts and responses. I was able to get a build of the dev repo, and we're currently running tests there, but the insight on the 2D vs. 3D case configuration is extremely helpful. The PI has indicated this is likely the issue. We'll report back to confirm.

auhpc

2021-06-10 21:17

reporter   ~0012060

Thanks again for reviewing this issue and providing valuable feedback. The PI has confirmed that changing the case configuration as indicated solves the problem.

henry

2021-06-10 22:06

manager   ~0012061

User error.

Issue History

Date Modified Username Field Change
2021-06-07 16:32 auhpc New Issue
2021-06-07 16:32 auhpc File Added: auburn_openfoam_deadlock_report.txt
2021-06-07 16:32 auhpc File Added: of_auburn_report_p1.tar.gz
2021-06-07 16:32 auhpc File Added: of_auburn_deadlock_case.tar.gz
2021-06-07 17:25 henry Note Added: 0012045
2021-06-08 12:01 tniemi Note Added: 0012046
2021-06-08 12:08 tniemi Note Added: 0012047
2021-06-08 12:09 henry Note Added: 0012048
2021-06-08 12:10 henry Note Edited: 0012048
2021-06-08 12:55 tniemi Note Added: 0012049
2021-06-08 13:29 henry Note Added: 0012050
2021-06-08 18:12 auhpc Note Added: 0012052
2021-06-10 21:17 auhpc Note Added: 0012060
2021-06-10 22:06 henry Assigned To => henry
2021-06-10 22:06 henry Status new => closed
2021-06-10 22:06 henry Resolution open => no change required
2021-06-10 22:06 henry Note Added: 0012061