View Issue Details

ID:              0000865
Project:         OpenFOAM
Category:        Bug
View Status:     public
Last Update:     2013-05-23 09:43
Reporter:        user591
Assigned To:     user21
Priority:        normal
Severity:        major
Reproducibility: always
Status:          resolved
Resolution:      fixed
Platform:        amd64
OS:              Ubuntu
OS Version:      12.10
Summary: 0000865: Agglomeration fails in parallel
Description: faceAgglomeration often fails in parallel: it works for some numbers of processes and fails for others. The behaviour also depends on the decomposition method and its parameters: with one decomposition method it works, with another it may fail.

For example, the standard tutorial
OpenFOAM-2.2.x/tutorials/heatTransfer/chtMultiRegionSimpleFoam/multiRegionHeaterRadiation works with 4 or 8 processors, but fails with 16 (scotch decomposition method).
Steps To Reproduce: Run the multiRegionHeaterRadiation tutorial decomposed for 16 processors.
Tags: No tags attached.
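
The failing 16-processor decomposition described above can be set up with a decomposition dictionary along these lines (a minimal sketch; the tutorial's actual decomposeParDict may contain additional entries):

```
// system/decomposeParDict (sketch)
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      decomposeParDict;
}

numberOfSubdomains  16;

method              scotch;
```

Then decompose (for this multi-region case, `decomposePar -allRegions`) and run the agglomeration in parallel, e.g. `mpirun -np 16 faceAgglomerate -dict constant/faceAgglomerateDict -parallel`, to hit the failure.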

Activities


user650

2013-05-22 15:35

  ~0002234

Dear Dimoon,
I get the same problem on different meshes and with different solvers,
on a NovaScale machine running Red Hat and our own build of gcc 4.7.3.

But with the same meshes, faceAgglomerate works on Ubuntu 13.04!

Yours

user650

2013-05-23 04:10

  ~0002235

My output with FOAM_ABORT=1:


$ /programs/OpenFoam/ThirdParty-2.2.0/platforms/linux64Gcc/openmpi-1.6.3/bin/mpirun -np 4 faceAgglomerate -dict constant/faceAgglomerateDict -parallel
/*---------------------------------------------------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  2.2.x                                 |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
Build : 2.2.x-29ce554b9686
Exec : faceAgglomerate -dict constant/faceAgglomerateDict -parallel
Date : May 23 2013
Time : 07:08:56
Host : "titan1"
PID : 29370
Case : /worktmp/users/abastide/test_radiatif
nProcs : 4
Slaves :
3
(
"titan1.29371"
"titan1.29372"
"titan1.29373"
)

Pstream initialized with:
    floatTransfer : 0
    nProcsSimpleSum : 0
    commsType : nonBlocking
    polling iterations : 0
sigFpe : Enabling floating point exception trapping (FOAM_SIGFPE).
fileModificationChecking : Monitoring run-time modified files using timeStampMaster
allowSystemOperations : Disallowing user-supplied system call operations

// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
Create time

Create mesh for time = 0


Agglomerating patch : ENTREE

Agglomerating patch : SORTIE

Agglomerating patch : HOT
[2]
[2]
[2] --> FOAM FATAL IO ERROR:
[2] error in IOstream "IOstream" for operation operator>>(Istream&, List<T>&) : reading first token
[2]
[2] file: IOstream at line 0.
[2]
[2] From function IOstream::fatalCheck(const char*) const
[2] in file db/IOstreams/IOstreams/IOstream.C at line 114.
[2]
FOAM aborting (FOAM_ABORT set)
[2]
[2] #0 Foam::error::printStack(Foam::Ostream&) at ??:?
--------------------------------------------------------------------------
An MPI process has executed an operation involving a call to the
"fork()" system call to create a child process. Open MPI is currently
operating in a condition that could result in memory corruption or
other system errors; your MPI job may hang, crash, or produce silent
data corruption. The use of fork() (or system() or other calls that
create child processes) is strongly discouraged.

The process that invoked fork was:

  Local host: titan1 (PID 29372)
  MPI_COMM_WORLD rank: 2

If you are *absolutely sure* that your application will successfully
and correctly survive a call to fork(), you may disable this warning
by setting the mpi_warn_on_fork MCA parameter to 0.
--------------------------------------------------------------------------
[2] #1 Foam::IOerror::abort() at ??:?
[2] #2 Foam::IOerror::exit(int) at ??:?
[2] #3 Foam::Istream& Foam::operator>><Foam::List<int> >(Foam::Istream&, Foam::List<Foam::List<int> >&) at ??:?
[2] #4 void Foam::Pstream::combineGather<Foam::List<Foam::List<int> >, Foam::UPstream::listEq>(Foam::List<Foam::UPstream::commsStruct> const&, Foam::List<Foam::List<int> >&, Foam::UPstream::listEq const&, int) at ??:?
[2] #5 void Foam::Pstream::exchange<Foam::DynamicList<char, 0u, 2u, 1u>, char>(Foam::List<Foam::DynamicList<char, 0u, 2u, 1u> > const&, Foam::List<Foam::DynamicList<char, 0u, 2u, 1u> >&, Foam::List<Foam::List<int> >&, int, bool) at ??:?
[2] #6 Foam::PstreamBuffers::finishedSends(bool) at ??:?
[2] #7 void Foam::syncTools::syncBoundaryFaceList<int, Foam::eqOp<int>, Foam::mapDistribute::transform>(Foam::polyMesh const&, Foam::UList<int>&, Foam::eqOp<int> const&, Foam::mapDistribute::transform const&) at ??:?
[2] #8 main at ??:?
[2] #9 __libc_start_main at ??:?
[2] #10 _start at ??:?
[titan1:29372] *** Process received signal ***
[titan1:29372] Signal: Aborted (6)
[titan1:29372] Signal code: (-6)
[titan1:29372] [ 0] /lib64/libc.so.6 [0x38a4c302d0]
[titan1:29372] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x38a4c30265]
[titan1:29372] [ 2] /lib64/libc.so.6(abort+0x110) [0x38a4c31d10]
[titan1:29372] [ 3] /labos/piment/abastide/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so(_ZN4Foam7IOerror5abortEv+0xe6) [0x2addd43d28c6]
[titan1:29372] [ 4] /labos/piment/abastide/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so(_ZN4Foam7IOerror4exitEi+0x150) [0x2addd43d2b60]
[titan1:29372] [ 5] faceAgglomerate(_ZN4FoamrsINS_4ListIiEEEERNS_7IstreamES4_RNS1_IT_EE+0x49) [0x418109]
[titan1:29372] [ 6] /labos/piment/abastide/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so(_ZN4Foam7Pstream13combineGatherINS_4ListINS2_IiEEEENS_8UPstream6listEqEEEvRKNS2_INS5_11commsStructEEERT_RKT0_i+0xdb) [0x2addd446982b]
[titan1:29372] [ 7] /labos/piment/abastide/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so(_ZN4Foam7Pstream8exchangeINS_11DynamicListIcLj0ELj2ELj1EEEcEEvRKNS_4ListIT_EERS6_RNS4_INS4_IiEEEEib+0x1a2) [0x2addd446a0a2]
[titan1:29372] [ 8] /labos/piment/abastide/OpenFOAM/OpenFOAM-2.2.x/platforms/linux64GccDPOpt/lib/libOpenFOAM.so(_ZN4Foam14PstreamBuffers13finishedSendsEb+0x3b) [0x2addd44676cb]
[titan1:29372] [ 9] faceAgglomerate(_ZN4Foam9syncTools20syncBoundaryFaceListIiNS_4eqOpIiEENS_13mapDistribute9transformEEEvRKNS_8polyMeshERNS_5UListIT_EERKT0_RKT1_+0x702) [0x41ad02]
[titan1:29372] [10] faceAgglomerate [0x40c586]
[titan1:29372] [11] /lib64/libc.so.6(__libc_start_main+0xf4) [0x38a4c1d994]
[titan1:29372] [12] faceAgglomerate [0x40d151]
[titan1:29372] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 2 with PID 29372 on node titan1 exited on signal 6 (Aborted).
--------------------------------------------------------------------------

user21

2013-05-23 09:43

  ~0002237

This is fixed in commit 61aa86dfaf40266401c52a5aad8fb655950c02fa

Thanks

Issue History

Date Modified Username Field Change
2013-05-22 14:15 user591 New Issue
2013-05-22 14:15 user591 File Added: log.faceAgglomerate.bottomAir
2013-05-22 15:35 user650 Note Added: 0002234
2013-05-23 04:10 user650 Note Added: 0002235
2013-05-23 09:43 user21 Note Added: 0002237
2013-05-23 09:43 user21 Status new => resolved
2013-05-23 09:43 user21 Resolution open => fixed
2013-05-23 09:43 user21 Assigned To => user21