View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003778 | OpenFOAM | Feature | public | 2022-01-11 11:39 | 2022-01-11 17:14 |

Field | Value |
---|---|
Reporter | tungli |
Assigned To | henry |
Priority | none |
Severity | minor |
Reproducibility | always |
Status | closed |
Resolution | no change required |
Platform | Unix |
OS | Manjaro Linux x86_64 |
OS Version | 5.4.164-1 |

**Summary:** 0003778: `polyMesh::findCell` blocks execution when not called on every parallel process
**Description:**

The `polyMesh` member function `findCell` may hang (block execution of) a solver. This seems to happen when `findCell` is not called on every process. If I understand this correctly, the function performs some internal synchronization, which runs into the usual MPI requirement that collective calls be made on every process; otherwise the "master" process waits until it is killed. Interestingly, if `findCell` has been called on every process beforehand, the issue disappears (see the attached "solver" code).

The backtrace (obtained by attaching gdb to the processes) for processes that do call `findCell` is:

```
#0  0x00007f9f00901323 in mca_btl_smcuda_component_progress () from /usr/lib/openmpi/openmpi/mca_btl_smcuda.so
#1  0x00007f9f02523f5c in opal_progress () from /usr/lib/openmpi/libopen-pal.so.40
#2  0x00007f9f026ae916 in ompi_request_default_wait_all () from /usr/lib/openmpi/libmpi.so.40
#3  0x00007f9f02711ff8 in ompi_coll_base_alltoall_intra_linear_sync () from /usr/lib/openmpi/libmpi.so.40
#4  0x00007f9f002f2432 in ompi_coll_tuned_alltoall_intra_dec_fixed () from /usr/lib/openmpi/openmpi/mca_coll_tuned.so
#5  0x00007f9f026c031d in PMPI_Alltoall () from /usr/lib/openmpi/libmpi.so.40
#6  0x00007f9f0296dfe5 in Foam::UPstream::allToAll(Foam::UList<int> const&, Foam::UList<int>&, int) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/openmpi-system/libPstream.so
#7  0x00007f9f033d8d52 in void Foam::Pstream::exchangeSizes<Foam::UList<Foam::DynamicList<char, 0u, 2u, 1u> > >(Foam::UList<Foam::DynamicList<char, 0u, 2u, 1u> > const&, Foam::List<int>&, int) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#8  0x00007f9f033d8946 in Foam::PstreamBuffers::finishedSends(bool) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#9  0x00007f9f035e37e3 in void Foam::syncTools::syncBoundaryFaceList<Foam::Vector<double>, Foam::eqOp<Foam::Vector<double> >, Foam::mapDistribute::transformPosition>(Foam::polyMesh const&, Foam::UList<Foam::Vector<double> >&, Foam::eqOp<Foam::Vector<double> > const&, Foam::mapDistribute::transformPosition const&, bool) [clone .isra.0] () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#10 0x00007f9f035e4e51 in Foam::polyMeshTetDecomposition::findFaceBasePts(Foam::polyMesh const&, double, bool) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#11 0x00007f9f035fdcfd in Foam::polyMesh::tetBasePtIs() const () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#12 0x00007f9f035ffe6d in Foam::polyMesh::findCell(Foam::Vector<double> const&, Foam::polyMesh::cellDecomposition) const () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#13 0x000055cccc799689 in main ()
```

For the processes that do not call `findCell`, the backtrace is:

```
#0  0x00007f8cc64bca95 in clock_nanosleep@GLIBC_2.2.5 () from /usr/lib/libc.so.6
#1  0x00007f8cc64c1c77 in nanosleep () from /usr/lib/libc.so.6
#2  0x00007f8cc64eca99 in usleep () from /usr/lib/libc.so.6
#3  0x00007f8cc6090721 in ompi_mpi_finalize () from /usr/lib/openmpi/libmpi.so.40
#4  0x00007f8cc634e756 in Foam::UPstream::exit(int) () from <OF>/platforms/linux64GccDPInt32Optv8/lib/openmpi-system/libPstream.so
#5  0x00007f8cc6c97c82 in Foam::ParRunControl::~ParRunControl() () from <OF>/platforms/linux64GccDPInt32Optv8/lib/libOpenFOAM.so
#6  0x00005639e4d6c62d in main ()
```
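The attached solver code is not reproduced in this report. A minimal sketch of the failure pattern described above might look like the following (hypothetical code, not the actual `findCellTest` attachment; it assumes a standard OpenFOAM application skeleton, and the query point is chosen to lie inside the cavity mesh attached below):

```
// Hypothetical minimal reproducer - a sketch of the pattern described
// above, not the attached findCellTest source.
#include "fvCFD.H"

int main(int argc, char *argv[])
{
    #include "setRootCase.H"
    #include "createTime.H"
    #include "createMesh.H"

    // A point inside the cavity mesh (domain 0.1 x 0.1 x 0.01 m)
    const point p(0.05, 0.05, 0.005);

    // Calling findCell on one rank only: with the default CELL_TETS mode
    // it reaches tetBasePtIs(), which performs collective MPI
    // communication, so this rank blocks while the other ranks run on to
    // MPI_Finalize - matching the two backtraces above.
    if (Pstream::master())
    {
        Info<< "Cell containing point: " << mesh.findCell(p) << endl;
    }

    Info<< "End" << endl;
    return 0;
}
```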
**Steps To Reproduce:**

Download the attached zipped tar, then:

```
$ tar -xvzf findCellBugReport.tar.gz
$ cd findCellTest
$ wmake
```

I guess any case would work to reproduce this issue. I tried it on `$FOAM_TUTORIALS/incompressible/icoFoam/cavity/cavity`. The decomposition dictionary is attached.

```
$ mpirun -np 3 findCellTest -parallel
```
Tags: No tags attached.

---

**blockMeshDict** (1,307 bytes)
```
/*--------------------------------*- C++ -*----------------------------------*\
  =========                 |
  \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox
   \\    /   O peration     | Website:  https://openfoam.org
    \\  /    A nd           | Version:  8
     \\/     M anipulation  |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      blockMeshDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

convertToMeters 0.1;

vertices
(
    (0 0 0)
    (1 0 0)
    (1 1 0)
    (0 1 0)
    (0 0 0.1)
    (1 0 0.1)
    (1 1 0.1)
    (0 1 0.1)
);

blocks
(
    hex (0 1 2 3 4 5 6 7) (20 20 1) simpleGrading (1 1 1)
);

edges
(
);

boundary
(
    movingWall
    {
        type wall;
        faces
        (
            (3 7 6 2)
        );
    }
    fixedWalls
    {
        type wall;
        faces
        (
            (0 4 7 3)
            (2 6 5 1)
            (1 5 4 0)
        );
    }
    frontAndBack
    {
        type empty;
        faces
        (
            (0 3 2 1)
            (4 5 6 7)
        );
    }
);

mergePatchPairs
(
);

// ************************************************************************* //
```
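The decomposition dictionary referred to under Steps To Reproduce is attached but not shown inline. A minimal `decomposeParDict` consistent with the three-rank run would look something like this (hypothetical; the actual attachment may differ, in particular in the decomposition method):

```
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

// Matches "mpirun -np 3" in the reproduction steps
numberOfSubdomains 3;

// Method chosen for illustration only
method          scotch;
```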
---

**henry** (2022-01-11 12:13, note 0012366)

You need to call `findCell` on every process, as it is a parallel operation and not processor-local.
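In code, the usage this note describes would look something like the sketch below (using `mesh` and `p` from the hypothetical reproducer above; the comment on index locality reflects that `findCell` searches only the rank-local mesh):

```
// Every rank takes part in the call (it is collective in parallel runs
// with the default decomposition mode) ...
const label celli = mesh.findCell(p);

if (celli != -1)
{
    // ... and each rank then inspects its own result: the returned label
    // indexes this rank's piece of the decomposed mesh
    Pout<< "Point " << p << " found in local cell " << celli << endl;
}
else
{
    Pout<< "Point " << p << " is not on this rank" << endl;
}
```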
---

**tungli** (2022-01-11 14:16, note 0012367)

@henry Thanks for your answer. Why is it parallel? I did not expect it to be parallel until I saw the backtrace. How can you tell from the documentation whether the function call is parallel or not? Also, the mesh is decomposed - are the cell and face indices always global? The label returned by `findCell` refers to which mesh - the decomposed one or the full/original (reconstructed) one?
---

**henry** (2022-01-11 15:02, note 0012368)

```
//- Find cell enclosing this location and return index
//  If not found -1 is returned
label findCell
(
    const point& p,
    const cellDecomposition = CELL_TETS
) const;

Foam::label Foam::polyMesh::findCell
(
    const point& p,
    const cellDecomposition decompMode
) const
{
    if
    (
        Pstream::parRun()
     && (decompMode == FACE_DIAG_TRIS || decompMode == CELL_TETS)
    )
    {
        // Force construction of face-diagonal decomposition before testing
        // for zero cells.
        //
        // If parallel running a local domain might have zero cells so never
        // construct the face-diagonal decomposition which uses parallel
        // transfers.
        (void)tetBasePtIs();
    }
    // ...
```
---

**tungli** (2022-01-11 16:33, note 0012369)

Looking at the source code at https://cpp.openfoam.org/v8/polyMesh_8C_source.html#l00848 -- so the only reason this happens is that we are not sure whether the `tetBasePtIsPtr_` in `polyMesh` is valid? Once the `tetBasePtIs` member function of `polyMesh` has been called (on every process), it cannot block execution on further calls. Would it not be more reasonable to move the synchronization calls out of `findCell`, which is otherwise a processor-local function (and which the user expects to be local - it searches for a local index on a local mesh)? I find it very counter-intuitive and hard to debug for an average OF user. If this is inconvenient, then a documentation-level comment would be nice: "The user has to make sure to call this function (or the `tetBasePtIs` function) at least once on each process for each instance of polyMesh, otherwise this blocks execution."
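As a sketch, the workaround this note implies (hypothetical usage, with `mesh` and `p` as in the earlier snippets):

```
// Force the synchronised tet-base-point computation once, on every rank.
// Per the observation above, once tetBasePtIs() has been called
// collectively, later findCell calls on any subset of ranks no longer
// block on this communication.
(void)mesh.tetBasePtIs();

if (Pstream::master())
{
    // Now safe even though only one rank makes the call
    Info<< "Cell containing point: " << mesh.findCell(p) << endl;
}
```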
---

**tungli** (2022-01-11 16:55, note 0012370)

By the way, how is this not a bug from your point of view? The documentation of this function is:

```
//- Find cell enclosing this location and return index
//  If not found -1 is returned
```

Nothing here suggests that calling it could hang.
---

**henry** (2022-01-11 17:14, note 0012371)

The current `findCell` runs in parallel when FACE_DIAG_TRIS or CELL_TETS is used because `tetBasePtIs()` has to be synchronised. If this mode of operation does not suit your purpose, you could write an alternative implementation. So far the `findCell` in OpenFOAM has not caused any problems.
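For what it is worth, the source quoted earlier suggests one such alternative inside the existing API: the decomposition modes that never reach `tetBasePtIs()` stay processor-local (a sketch, assuming the `cellDecomposition` enumerators of the v8 `polyMesh` header; these modes use cheaper and less robust inside/outside tests than CELL_TETS):

```
// FACE_PLANES and FACE_CENTRE_TRIS bypass the parallel tetBasePtIs()
// branch shown in the source excerpt above, so this call involves no
// MPI communication even when made on a single rank.
const label celli = mesh.findCell(p, polyMesh::FACE_CENTRE_TRIS);
```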
---

Issue History

Date Modified | Username | Field | Change |
---|---|---|---|
2022-01-11 11:39 | tungli | New Issue | |
2022-01-11 11:39 | tungli | File Added: blockMeshDict | |
2022-01-11 11:39 | tungli | File Added: findCellBugReport.tar.gz | |
2022-01-11 12:13 | henry | Note Added: 0012366 | |
2022-01-11 12:13 | henry | Priority | normal => none |
2022-01-11 12:13 | henry | Category | Bug => Feature |
2022-01-11 14:16 | tungli | Note Added: 0012367 | |
2022-01-11 15:02 | henry | Note Added: 0012368 | |
2022-01-11 16:33 | tungli | Note Added: 0012369 | |
2022-01-11 16:55 | tungli | Note Added: 0012370 | |
2022-01-11 17:14 | henry | Note Added: 0012371 | |
2022-01-11 17:14 | henry | Assigned To | => henry |
2022-01-11 17:14 | henry | Status | new => closed |
2022-01-11 17:14 | henry | Resolution | open => no change required |