View Issue Details

ID: 0003778
Project: OpenFOAM
Category: Feature
View Status: public
Last Update: 2022-01-11 17:14
Reporter: tungli
Assigned To: henry
Priority: none
Severity: minor
Reproducibility: always
Status: closed
Resolution: no change required
Platform: Unix
OS: Manjaro Linux x86_64
OS Version: 5.4.164-1
Product Version: 8
Fixed in Version:
Summary: 0003778: `polyMesh::findCell` blocks execution when not called on every parallel process
Description: The `polyMesh` member function `findCell` may hang (block execution of) a solver. This seems to happen when I do not call `findCell` on every process. If I understand this correctly, `findCell` performs some synchronization internally, and the MPI calls involved must be made on every process; otherwise the "master" process waits until killed.

Interestingly, if `findCell` is first called on every process, the issue disappears (see attached "solver" code).

The backtrace (obtained by attaching with gdb to the processes) for processes that do call `findCell` is:
```
#0 0x00007f9f00901323 in mca_btl_smcuda_component_progress () from /usr/lib/openmpi/openmpi/mca_btl_smcuda.so
#1 0x00007f9f02523f5c in opal_progress () from /usr/lib/openmpi/libopen-pal.so.40
#2 0x00007f9f026ae916 in ompi_request_default_wait_all () from /usr/lib/openmpi/libmpi.so.40
#3 0x00007f9f02711ff8 in ompi_coll_base_alltoall_intra_linear_sync () from /usr/lib/openmpi/libmpi.so.40
#4 0x00007f9f002f2432 in ompi_coll_tuned_alltoall_intra_dec_fixed () from /usr/lib/openmpi/openmpi/mca_coll_tuned.so
#5 0x00007f9f026c031d in PMPI_Alltoall () from /usr/lib/openmpi/libmpi.so.40
#6 0x00007f9f0296dfe5 in Foam::UPstream::allToAll(Foam::UList<int> const&, Foam::UList<int>&, int) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/openmpi-system/libPstream.so
#7 0x00007f9f033d8d52 in void Foam::Pstream::exchangeSizes<Foam::UList<Foam::DynamicList<char, 0u, 2u, 1u> > >(Foam::UList<Foam::DynamicList<char, 0u, 2u, 1u> > const&, Foam::List<int>&, int) ()
   from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#8 0x00007f9f033d8946 in Foam::PstreamBuffers::finishedSends(bool) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#9 0x00007f9f035e37e3 in void Foam::syncTools::syncBoundaryFaceList<Foam::Vector<double>, Foam::eqOp<Foam::Vector<double> >, Foam::mapDistribute::transformPosition>(Foam::polyMesh const&, Foam::UList<Foam::Vector<double> >&, Foam::eqOp<Foam::Vector<double> > const&, Foam::mapDistribute::transformPosition const&, bool) [clone .isra.0] ()
   from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#10 0x00007f9f035e4e51 in Foam::polyMeshTetDecomposition::findFaceBasePts(Foam::polyMesh const&, double, bool) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#11 0x00007f9f035fdcfd in Foam::polyMesh::tetBasePtIs() const () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#12 0x00007f9f035ffe6d in Foam::polyMesh::findCell(Foam::Vector<double> const&, Foam::polyMesh::cellDecomposition) const ()
   from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#13 0x000055cccc799689 in main ()
```

For the processes that do not call `findCell`, the backtrace is:
```
#0 0x00007f8cc64bca95 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
#1 0x00007f8cc64c1c77 in nanosleep () from /usr/lib/libc.so.6
#2 0x00007f8cc64eca99 in usleep () from /usr/lib/libc.so.6
#3 0x00007f8cc6090721 in ompi_mpi_finalize () from /usr/lib/openmpi/libmpi.so.40
#4 0x00007f8cc634e756 in Foam::UPstream::exit(int) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/openmpi-system/libPstream.so
#5 0x00007f8cc6c97c82 in Foam::ParRunControl::~ParRunControl() () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#6 0x00005639e4d6c62d in main ()
```

Steps To Reproduce: Download the attached zipped tar.
$ tar -xvzf findCellBugReport.tar.gz
$ cd findCellTest
$ wmake

I guess any case would work to reproduce this issue. I tried this on `$FOAM_TUTORIALS/incompressible/icoFoam/cavity/cavity`. The decomposition dictionary is attached.

$ mpirun -np 3 findCellTest -parallel
Tags: No tags attached.

Activities

tungli

2022-01-11 11:39

reporter  

blockMeshDict (1,307 bytes)
/*--------------------------------*- C++ -*----------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  8
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      blockMeshDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

convertToMeters 0.1;

vertices
(
    (0 0 0)
    (1 0 0)
    (1 1 0)
    (0 1 0)
    (0 0 0.1)
    (1 0 0.1)
    (1 1 0.1)
    (0 1 0.1)
);

blocks
(
    hex (0 1 2 3 4 5 6 7) (20 20 1) simpleGrading (1 1 1)
);

edges
(
);

boundary
(
    movingWall
    {
        type wall;
        faces
        (
            (3 7 6 2)
        );
    }
    fixedWalls
    {
        type wall;
        faces
        (
            (0 4 7 3)
            (2 6 5 1)
            (1 5 4 0)
        );
    }
    frontAndBack
    {
        type empty;
        faces
        (
            (0 3 2 1)
            (4 5 6 7)
        );
    }
);

mergePatchPairs
(
);

// ************************************************************************* //
findCellBugReport.tar.gz (1,134 bytes)

henry

2022-01-11 12:13

manager   ~0012366

You need to call findCell on every process, as it is a parallel operation and not a processor-local one.

tungli

2022-01-11 14:16

reporter   ~0012367

@henry Thanks for your answer.
Why is it parallel? I did not expect it to be parallel until I saw the backtrace. How can one tell from the documentation whether a function call is parallel? Also, the mesh is decomposed: are the cell and face indices always global? Does the label returned by `findCell` refer to the decomposed mesh or to the full / original (reconstructed) one?

henry

2022-01-11 15:02

manager   ~0012368

//- Find cell enclosing this location and return index
//  If not found -1 is returned
label findCell
(
    const point& p,
    const cellDecomposition = CELL_TETS
) const;


Foam::label Foam::polyMesh::findCell
(
    const point& p,
    const cellDecomposition decompMode
) const
{
    if
    (
        Pstream::parRun()
     && (decompMode == FACE_DIAG_TRIS || decompMode == CELL_TETS)
    )
    {
        // Force construction of face-diagonal decomposition before testing
        // for zero cells.
        //
        // If parallel running a local domain might have zero cells so never
        // construct the face-diagonal decomposition which uses parallel
        // transfers.
        (void)tetBasePtIs();
    }
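
The guard quoted above can be restated as a small standalone predicate. This is a sketch only, not OpenFOAM code: the enum is a local stand-in for `polyMesh::cellDecomposition` (the `FACE_PLANES` and `FACE_CENTRE_TRIS` entries are assumed from the OpenFOAM headers and do not appear in the excerpt), `parRun` stands in for `Pstream::parRun()`, and `forcesParallelSync` is a hypothetical name.

```cpp
// Local stand-in for polyMesh::cellDecomposition (enumerators assumed).
enum class CellDecomposition
{
    FACE_PLANES,
    FACE_CENTRE_TRIS,
    FACE_DIAG_TRIS,
    CELL_TETS
};

// Hypothetical helper: true when the quoted guard would call tetBasePtIs(),
// i.e. when findCell may take part in parallel transfers.
bool forcesParallelSync(bool parRun, CellDecomposition mode)
{
    return parRun
        && (mode == CellDecomposition::FACE_DIAG_TRIS
         || mode == CellDecomposition::CELL_TETS);
}
```

With the default `CELL_TETS` mode in a parallel run the predicate is true, so every process has to reach the call. Whether the remaining modes avoid parallel communication elsewhere in `findCell` is not established by the excerpt.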

tungli

2022-01-11 16:33

reporter   ~0012369

Looking at the source code at https://cpp.openfoam.org/v8/polyMesh_8C_source.html#l00848: so the only reason this happens is that we cannot be sure whether `tetBasePtIsPtr_` in polyMesh is valid?
Once the `tetBasePtIs` member function of polyMesh has been called (on every process), further calls cannot block execution.

Would it not be more reasonable to move the synchronization calls out of `findCell`, which is otherwise a processor-local function? The user expects it to be local: it searches for a local index on a local mesh. I find the current behaviour very counter-intuitive and hard to debug for an average OF user.
If this is inconvenient, then a documentation-level comment would be nice: "The user has to make sure to call this function (or the `tetBasePtIs` function) at least once on each process for each instance of polyMesh, otherwise this blocks execution."
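
The lazy-construction behaviour described in this note can be illustrated with a toy model in plain C++. It uses neither MPI nor OpenFOAM: `ToyComm`, `ToyMesh` and both scenario functions are hypothetical names, and the "collective" is just a record of which ranks entered it. The sketch assumes, as discussed above, that the first `tetBasePtIs` call on a rank takes part in a collective and that later calls on that rank are local.

```cpp
#include <cstddef>
#include <vector>

// Toy stand-in for a communicator: a collective returns only once every
// rank has entered it. Not MPI or OpenFOAM code.
class ToyComm
{
public:
    explicit ToyComm(std::size_t nRanks) : entered_(nRanks, false) {}

    void enter(std::size_t rank) { entered_.at(rank) = true; }

    bool anyEntered() const
    {
        for (bool e : entered_) { if (e) return true; }
        return false;
    }

    // False models the hang: some rank never reached the collective.
    bool completes() const
    {
        for (bool e : entered_) { if (!e) return false; }
        return true;
    }

private:
    std::vector<bool> entered_;
};

// Toy per-rank mesh with a lazily built cache, mirroring the role that
// tetBasePtIs() plays inside findCell().
class ToyMesh
{
public:
    explicit ToyMesh(std::size_t rank) : rank_(rank) {}

    // The first call on this rank enters the collective; later calls do not.
    void tetBasePtIs(ToyComm& comm)
    {
        if (!built_) { comm.enter(rank_); built_ = true; }
    }

    // Returns a dummy local cell index after ensuring the cache exists.
    int findCell(ToyComm& comm) { tetBasePtIs(comm); return 0; }

private:
    std::size_t rank_;
    bool built_ = false;
};

std::vector<ToyMesh> makeMeshes(std::size_t nRanks)
{
    std::vector<ToyMesh> meshes;
    for (std::size_t r = 0; r < nRanks; ++r) { meshes.emplace_back(r); }
    return meshes;
}

// Scenario from the report: only rank 0 calls findCell (needs nRanks >= 1).
bool subsetCallCompletes(std::size_t nRanks)
{
    ToyComm comm(nRanks);
    std::vector<ToyMesh> meshes = makeMeshes(nRanks);
    meshes.at(0).findCell(comm);
    return comm.completes();
}

// Proposed workaround: every rank builds the cache once, after which a
// call on rank 0 alone enters no further collective.
bool warmUpThenSubsetCall(std::size_t nRanks)
{
    std::vector<ToyMesh> meshes = makeMeshes(nRanks);
    ToyComm warmUp(nRanks);
    for (ToyMesh& m : meshes) { m.tetBasePtIs(warmUp); }
    ToyComm later(nRanks);
    meshes.at(0).findCell(later);
    return warmUp.completes() && !later.anyEntered();
}
```

In this model `subsetCallCompletes(3)` is false, which corresponds to the reported hang, while `warmUpThenSubsetCall(3)` is true, which corresponds to the observation that calling the function once on every process removes the problem.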

tungli

2022-01-11 16:55

reporter   ~0012370

btw, how is this not a bug from your point of view? The documentation of this function is:
   //- Find cell enclosing this location and return index
   // If not found -1 is returned

Nothing here suggests that calling this would hang.

henry

2022-01-11 17:14

manager   ~0012371

The current findCell runs in parallel when FACE_DIAG_TRIS or CELL_TETS is used because the tetBasePtIs() has to be synchronised. If this mode of operation does not suit your purpose, you could write an alternative implementation. So far the findCell in OpenFOAM has not caused any problems.

Issue History

Date Modified Username Field Change
2022-01-11 11:39 tungli New Issue
2022-01-11 11:39 tungli File Added: blockMeshDict
2022-01-11 11:39 tungli File Added: findCellBugReport.tar.gz
2022-01-11 12:13 henry Note Added: 0012366
2022-01-11 12:13 henry Priority normal => none
2022-01-11 12:13 henry Category Bug => Feature
2022-01-11 14:16 tungli Note Added: 0012367
2022-01-11 15:02 henry Note Added: 0012368
2022-01-11 16:33 tungli Note Added: 0012369
2022-01-11 16:55 tungli Note Added: 0012370
2022-01-11 17:14 henry Note Added: 0012371
2022-01-11 17:14 henry Assigned To => henry
2022-01-11 17:14 henry Status new => closed
2022-01-11 17:14 henry Resolution open => no change required