
ID: 0003734
Project: OpenFOAM
Category: Bug
View Status: public
Last Update: 2021-09-30 13:30
Reporter: jherb
Assigned To:
Priority: none
Severity: minor
Reproducibility: random
Status: new
Resolution: open
Platform: amd64
OS: Debian
OS Version: 10
Product Version: dev
Summary: 0003734: rhoSimpleFoam running in parallel sometimes crashes if configuration files (e.g. fvSolution) are modified during runtime
Description: This was tested with OpenFOAM 8 and OpenFOAM-dev (commit 98686ae7604a5603f501893e2e20a8038638ec52).
The following compilers were used to compile OpenFOAM: gcc 8.2.0 and 8.3.0, icx (Intel(R) oneAPI DPC++ Compiler 2021.1.2 (2020.10.0.1214)).
The operating systems Debian 10 and CentOS (centos-linux-release-8.3-1.2011.el8.noarch) were used.

On all of these systems, rhoSimpleFoam crashes sporadically if configuration files (e.g. controlDict, fvSolution, ...) are changed during runtime.

The crash seems to happen more often if OpenFOAM is compiled in the debug configuration, but it also happens in the optimized configuration.

OpenMPI versions: 2.1.1 and 4.1.0

Steps To Reproduce: Source the OpenFOAM environment, e.g.
source $FOAM_SRC/../etc/bashrc WM_COMPILER=Icx WM_COMPILE_OPTION=Debug

Use the tutorial $FOAM_TUTORIALS/compressible/rhoSimpleFoam/squareBend
Modify controlDict (probably not necessary to reproduce the crash):
--- /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/tutorials/compressible/rhoSimpleFoam/squareBend/system/controlDict 2021-09-23 23:37:05.000000000 +0200
+++ system/controlDict 2021-09-28 16:41:02.101949800 +0200
@@ -46,5 +46,15 @@

 runTimeModifiable true;

+DebugSwitches
+{
+ objectRegistry 1;
+}
+
+OptimisationSwitches
+{
+ fileHandler collated;
+}
+

Generate and decompose the mesh, then start rhoSimpleFoam:
blockMesh
decomposePar
mpirun -n 8 rhoSimpleFoam -parallel

In a second shell, repeatedly modify a configuration file (or just update its timestamp) to force it to be re-read, e.g.:
while true; do sleep 0.1 ; touch system/fvSolution ; done
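The loop above runs until it is stopped by hand. A bounded variant that stops once the solver has exited makes it easier to see when the crash happened. This is a minimal bash sketch; `stress_touch` and its arguments are hypothetical names, not part of OpenFOAM:

```bash
# Hypothetical helper (not part of OpenFOAM).
# Usage: stress_touch FILE INTERVAL PID [MAX]
# Updates the timestamp of FILE every INTERVAL seconds while process PID
# exists, so that a solver with runTimeModifiable enabled re-reads it.
# Stops after MAX touches (0 or omitted = no limit). Prints one dot per touch.
stress_touch() {
    local file=$1 interval=$2 pid=$3 max=${4:-0}
    local count=0
    # kill -0 sends no signal; it only checks that PID still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$max" -gt 0 ] && [ "$count" -ge "$max" ]; then
            break
        fi
        sleep "$interval"
        touch "$file"
        printf '.'
        count=$((count + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        printf '\n%d touches, process %s still running\n' "$count" "$pid"
    else
        printf '\n%d touches, process %s is gone\n' "$count" "$pid"
    fi
}
```

For example, in the second shell, assuming only one `mpirun` is running on the machine: `stress_touch system/fvSolution 0.1 "$(pgrep -n mpirun)"`. Watching `mpirun` rather than a single rank is enough here because, as the logs below show, `mpirun` aborts the job once a rank is killed by a signal.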

Output around the crash (release build):

[3] objectRegistry::readModifiedObjects() : region0 : Considering reading object thermophysicalTransport
[7] objectRegistry::readModifiedObjects() : region0 : Considering reading object Cp
[7] objectRegistry::readModifiedObjects() : region0 : Considering reading object Residuals<tensor>
[7] objectRegistry::readModifiedObjects() : region0 : Considering reading object thermophysicalTransport
[0] #0 [0] objectRegistry::readModifiedObjects() : region0 : Considering reading object Foam::error::printStack(Foam::Ostream&) at ??:?
[0] #1 Foam::sigSegv::sigHandler(int) at ??:?
[0] #2 ? in "/lib/x86_64-linux-gnu/libc.so.6"
[0] #3 Foam::OSstream::write(Foam::word const&) at ??:?
[0] #4 Foam::operator<<(Foam::Ostream&, Foam::word const&) at ??:?
[0] #5 Foam::objectRegistry::readModifiedObjects() at ??:?
[0] #6 Foam::objectRegistry::readIfModified() at ??:?
[0] #7 Foam::objectRegistry::readModifiedObjects() at ??:?
[0] #8 Foam::Time::loop() at ??:?
[0] #9 ? in "/home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/bin/rhoSimpleFoam"
[0] #10 __libc_start_main in "/lib/x86_64-linux-gnu/libc.so.6"
[0] #11 ? in "/home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/bin/rhoSimpleFoam"
[ciu-linux2019:39857] *** Process received signal ***
[ciu-linux2019:39857] Signal: Segmentation fault (11)
[ciu-linux2019:39857] Signal code: (-6)
[ciu-linux2019:39857] Failing at address: 0x3a9f00009bb1
[ciu-linux2019:39857] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x37840)[0x7fd752d46840]
[ciu-linux2019:39857] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fd752d467bb]
[ciu-linux2019:39857] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x37840)[0x7fd752d46840]
[ciu-linux2019:39857] [ 3] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam8OSstream5writeERKNS_4wordE+0x4)[0x7fd7535f1234]
[ciu-linux2019:39857] [ 4] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4FoamlsERNS_7OstreamERKNS_4wordE+0xa)[0x7fd7535d2a8a]
[ciu-linux2019:39857] [ 5] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry19readModifiedObjectsEv+0xa7)[0x7fd753654217]
[ciu-linux2019:39857] [ 6] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry14readIfModifiedEv+0x9)[0x7fd7536542b9]
[ciu-linux2019:39857] [ 7] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry19readModifiedObjectsEv+0x45)[0x7fd7536541b5]
[ciu-linux2019:39857] [ 8] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam4Time4loopEv+0xdd)[0x7fd753682a7d]
[ciu-linux2019:39857] [ 9] rhoSimpleFoam(+0x2da13)[0x564d1c40ea13]
[ciu-linux2019:39857] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fd752d3309b]
[ciu-linux2019:39857] [11] rhoSimpleFoam(+0x3038a)[0x564d1c41138a]
[ciu-linux2019:39857] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node ciu-linux2019 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
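In the Open MPI part of the trace, the C++ frames appear as mangled symbols, while OpenFOAM's own stack printout above shows them demangled. To match the two frame by frame, the symbols can be passed through `c++filt` from GNU binutils. A minimal sketch, assuming GNU `grep` and `c++filt` are installed; `demangle_trace` is a hypothetical helper name:

```bash
# Hypothetical helper: read Open MPI backtrace lines on stdin and print the
# demangled name of each mangled C++ symbol (Itanium ABI names start with _Z).
demangle_trace() {
    # grep -o prints only the matching part; the "+0x..." offset is not
    # matched because '+' is not in the character class.
    grep -o '_Z[A-Za-z0-9_]*' | c++filt
}
```

Piping frames [3] to [8] of the release trace through it (e.g. `grep 'libOpenFOAM.so(' log.rhoSimpleFoam | demangle_trace`, where the log file name is an assumption) gives `Foam::OSstream::write(Foam::word const&)`, `Foam::operator<<(Foam::Ostream&, Foam::word const&)`, `Foam::objectRegistry::readModifiedObjects()`, `Foam::objectRegistry::readIfModified()`, `Foam::objectRegistry::readModifiedObjects()` and `Foam::Time::loop()`, i.e. the same call chain as frames #3 to #8 of the OpenFOAM printout.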


The output for a debug build (without objectRegistry debug output):

regIOobject::readIfModified() :
    Re-reading object fvSolution from file "/GRS/sys/data/user/hej/OpenFOAM/hej-dev/run/squareBend/system/fvSolution"
[0] #0 Foam::error::printStack(Foam::Ostream&) at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OSspecific/POSIX/printStack.C:218
[0] #1 Foam::sigSegv::sigHandler(int) at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OSspecific/POSIX/signals/sigSegv.C:54
[0] #2 ? in "/lib64/libc.so.6"
[0] #3 Foam::objectRegistry::readModifiedObjects() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/objectRegistry/objectRegistry.C:516
[0] #4 Foam::objectRegistry::readIfModified() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/objectRegistry/objectRegistry.C:524
[0] #5 Foam::objectRegistry::readModifiedObjects() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/objectRegistry/objectRegistry.C:507
[0] #6 Foam::Time::readModifiedObjects() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/Time/TimeIO.C:247
[0] #7 Foam::Time::run() const at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/Time/Time.C:838
[0] #8 Foam::Time::loop() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/Time/Time.C:865
[0] #9 Foam::simpleControl::loop(Foam::Time&) at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/finiteVolume/cfdTools/general/solutionControl/simpleControl/simpleControl.C:87
[0] #10 ? at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/applications/solvers/compressible/rhoSimpleFoam/rhoSimpleFoam.C:61
[0] #11 __libc_start_main in "/lib64/libc.so.6"
[0] #12 ? at ??:?
[manitu1:3732249] *** Process received signal ***
[manitu1:3732249] Signal: Segmentation fault (11)
[manitu1:3732249] Signal code: (-6)
[manitu1:3732249] Failing at address: 0x36e00038f319
[manitu1:3732249] [ 0] /lib64/libc.so.6(+0x37880)[0x7f536e283880]
[manitu1:3732249] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f536e2837ff]
[manitu1:3732249] [ 2] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam7sigSegv10sigHandlerEi+0xb6)[0x7f53700e3c36]
[manitu1:3732249] [ 3] /lib64/libc.so.6(+0x37880)[0x7f536e283880]
[manitu1:3732249] [ 4] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry19readModifiedObjectsEv+0xd6)[0x7f536fe165a6]
[manitu1:3732249] [ 5] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry14readIfModifiedEv+0x15)[0x7f536fe165e5]
[manitu1:3732249] [ 6] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry19readModifiedObjectsEv+0xe2)[0x7f536fe165b2]
[manitu1:3732249] [ 7] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam4Time19readModifiedObjectsEv+0x107)[0x7f536fe41f77]
[manitu1:3732249] [ 8] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZNK4Foam4Time3runEv+0xd5)[0x7f536fe3da95]
[manitu1:3732249] [ 9] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam4Time4loopEv+0x20)[0x7f536fe3db50]
[manitu1:3732249] [10] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libfiniteVolume.so(_ZN4Foam13simpleControl4loopERNS_4TimeE+0x5f)[0x7f5372c5df9f]
[manitu1:3732249] [11] rhoSimpleFoam[0x459b0b]
[manitu1:3732249] [12] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f536e26f7b3]
[manitu1:3732249] [13] rhoSimpleFoam[0x4563de]
[manitu1:3732249] *** End of error message ***
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node manitu1 exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------

Tags: No tags attached.

Activities

henry

2021-09-28 16:21

manager   ~0012204

I am unable to reproduce this behaviour.

jherb

2021-09-28 16:37

reporter   ~0012205

Which versions (OpenFOAM, compiler, MPI) are you using? Which configuration?

In serial (non-parallel) mode the crash does not seem to happen.

henry

2021-09-28 16:45

manager   ~0012206

OpenFOAM-dev, gcc-10.1.1, OpenMPI-2.1.1

jherb

2021-09-28 17:28

reporter   ~0012207

Sorry, but there seems to be no official gcc-10.1.1 release, only gcc-10.1.0: https://ftp.gnu.org/gnu/gcc/

Is this a distribution-specific compiler? Which distribution are you using?

henry

2021-09-28 17:39

manager   ~0012208

Last edited: 2021-09-28 17:40

OpenSuSE Tumbleweed

gcc --version
gcc (SUSE Linux) 10.1.1 20200507 [revision dd38686d9c810cecbaa80bb82ed91caaa58ad635]
Copyright (C) 2020 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

jherb

2021-09-28 22:01

reporter   ~0012209

I was able to reproduce this error on an AWS EC2 instance (t3.micro) using the official ubuntu/images/hvm-ssd/ubuntu-hirsute-21.04-amd64-server-20210928-9b889f11-a864-4343-9340-1b2042b8cd6c AMI and a fresh installation of OpenFOAM 9, following these instructions: https://openfoam.org/download/9-ubuntu/

Time = 9

GAMG: Solving for Ux, Initial residual = 0.02368, Final residual = 0.0012146, No Iterations 1
GAMG: Solving for Uy, Initial residual = 0.0320432, Final residual = 0.00210963, No Iterations 1
GAMG: Solving for Uz, Initial residual = 0.384409, Final residual = 0.0177306, No Iterations 1
GAMG: Solving for e, Initial residual = 0.180589, Final residual = 0.00958168, No Iterations 1
GAMG: Solving for p, Initial residual = 0.0857398, Final residual = 0.00562322, No Iterations 3
time step continuity errors : sum local = 108.246, global = -52.7635, cumulative = -750.706
GAMG: Solving for epsilon, Initial residual = 0.0271821, Final residual = 0.00177123, No Iterations 1
GAMG: Solving for k, Initial residual = 0.0710536, Final residual = 0.00435529, No Iterations 1
ExecutionTime = 12.27 s ClockTime = 13 s

regIOobject::readIfModified() :
    Re-reading object fvSolution from file "/home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/fvSolution"
[0] #0 Foam::error::printStack(Foam::Ostream&) at ??:?
[0] #1 Foam::sigSegv::sigHandler(int) at ??:?
[0] #2 ? in "/lib/x86_64-linux-gnu/libc.so.6"
[0] #3 Foam::objectRegistry::readModifiedObjects() at ??:?
[0] #4 Foam::objectRegistry::readIfModified() at ??:?
[0] #5 Foam::objectRegistry::readModifiedObjects() at ??:?
[0] #6 Foam::Time::loop() at ??:?
[0] #7 ? in "/opt/openfoam9/platforms/linux64GccDPInt32Opt/bin/rhoSimpleFoam"
[0] #8 __libc_start_main in "/lib/x86_64-linux-gnu/libc.so.6"
[0] #9 ? in "/opt/openfoam9/platforms/linux64GccDPInt32Opt/bin/rhoSimpleFoam"
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 0 on node ip-172-31-14-153 exited on signal 11 (Segmentation fault).

These are the changes I applied to the squareBend tutorial:

/opt/openfoam9/tutorials/compressible/rhoSimpleFoam$ diff -uBbwr squareBend $FOAM_RUN/squareBend
diff -uBbwr squareBend/system/blockMeshDict /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/blockMeshDict
--- squareBend/system/blockMeshDict 2021-09-03 07:57:04.000000000 +0000
+++ /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/blockMeshDict 2021-09-28 20:53:36.842827919 +0000
@@ -50,7 +50,7 @@
 blocks
 (
     hex (0 1 11 10 2 3 13 12) inlet ( 20 20 20) simpleGrading (1 1 1)
- hex (4 5 15 14 6 7 17 16) outlet (200 20 20) simpleGrading (1 1 1)
+ hex (4 5 15 14 6 7 17 16) outlet (800 20 20) simpleGrading (1 1 1)

     hex (1 8 18 11 3 9 19 13) bend1 ( 30 20 20) simpleGrading (1 1 1)
     hex (5 9 19 15 7 8 18 17) bend2 ( 30 20 20) simpleGrading (1 1 1)
diff -uBbwr squareBend/system/controlDict /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/controlDict
--- squareBend/system/controlDict 2021-09-03 07:57:04.000000000 +0000
+++ /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/controlDict 2021-09-28 20:54:33.082838092 +0000
@@ -46,5 +46,10 @@

 runTimeModifiable true;

+OptimisationSwitches
+{
+ fileHandler collated;
+}
+

 // ************************************************************************* //
diff -uBbwr squareBend/system/decomposeParDict /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/decomposeParDict
--- squareBend/system/decomposeParDict 2021-09-03 07:57:04.000000000 +0000
+++ /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/decomposeParDict 2021-09-28 20:50:14.266791212 +0000
@@ -14,13 +14,13 @@
 }
 // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

-numberOfSubdomains 8;
+numberOfSubdomains 2;

-method hierarchical;
+method simple;

 simpleCoeffs
 {
- n (8 1 1);
+ n (2 1 1);
 }

 hierarchicalCoeffs
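The history below shows these edits being made by hand in vim. To repeat the setup on another machine, they can also be scripted. A minimal sketch, assuming it is run from the squareBend case directory with GNU sed and OpenFOAM's `foamDictionary` utility on the PATH; `apply_case_changes` is a hypothetical helper name, and the `/` scoping in `simpleCoeffs/n` assumes the scoped-keyword syntax of recent OpenFOAM Foundation releases. The blockMeshDict change uses `sed` because the cell counts sit inside an entry of the `blocks` list:

```bash
# Hypothetical helper: apply the diffs above to the case in the current directory.
apply_case_changes() {
    # blockMeshDict: refine the outlet block from 200 to 800 cells in x.
    # Assumes "(200 20 20)" occurs only in the outlet block entry.
    sed -i 's/(200 20 20)/(800 20 20)/' system/blockMeshDict

    # controlDict: switch to the collated file handler. Appending after the
    # footer comment line is still valid dictionary syntax.
    cat >> system/controlDict <<'EOF'

OptimisationSwitches
{
    fileHandler collated;
}
EOF

    # decomposeParDict: 2 subdomains using the simple method
    foamDictionary system/decomposeParDict -entry numberOfSubdomains -set 2
    foamDictionary system/decomposeParDict -entry method -set simple
    foamDictionary system/decomposeParDict -entry simpleCoeffs/n -set '(2 1 1)'
}
```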

Here is the full bash history starting from the fresh AMI:

    1 sudo sh -c "wget -O - https://dl.openfoam.org/gpg.key | apt-key add -"
    2 sudo add-apt-repository http://dl.openfoam.org/ubuntu
    3 sudo apt-get update
    4 sudo apt-get -y install openfoam9
    5 source /opt/openfoam9/etc/bashrc
    6 mkdir -p $FOAM_RUN
    7 run
    8 cp -r $FOAM_TUTORIALS/compressible/rhoSimpleFoam/squareBand .
    9 tut
   10 cd compressible/rhoSimpleFoam/
   11 cp -r squareBend $FOAM_RUN/
   12 run
   13 cd squareBend/
   14 blockMesh
   15 vim system/decomposeParDict
   16 decomposePar
   17 vim system/decomposeParDict
   18 decomposePar
   19 mpirun -n 2 rhoSimpleFoam -parallel
   20 mpirun -n 2 --host localhost:2 rhoSimpleFoam -parallel
   21 vim system/blockMeshDict
   22 blockMesh
   23 rm -rf processor*
   24 vim system/controlDict
   25 l
   26 decomposePar
   27 mpirun -n 2 --host localhost:2 rhoSimpleFoam -parallel
   28 ls
   29 rm -rf processors2/ constant/polyMesh/
   30 ls
   31 tut
   32 cd compressible/rhoSimpleFoam/
   33 diff -uBbwr squareBend $FOAM_RUN/squareBend
   34 history

And this is the command in the second ssh shell:

while true ; do sleep 0.1 ; touch system/fvSolution ; echo -n '.' ; done

henry

2021-09-30 13:29

manager   ~0012210

I am unable to reproduce this problem. Could you analyse it and propose a patch to fix the issue you see?
Alternatively we would need funding to work on it further for you.

Issue History

Date Modified     Username  Field                 Change
2021-09-28 16:14  jherb     New Issue
2021-09-28 16:21  henry     Note Added: 0012204
2021-09-28 16:37  jherb     Note Added: 0012205
2021-09-28 16:45  henry     Note Added: 0012206
2021-09-28 17:28  jherb     Note Added: 0012207
2021-09-28 17:39  henry     Note Added: 0012208
2021-09-28 17:40  henry     Note Edited: 0012208
2021-09-28 22:01  jherb     Note Added: 0012209
2021-09-30 13:29  henry     Note Added: 0012210
2021-09-30 13:30  henry     Priority              normal => none
2021-09-30 13:30  henry     Severity              major => minor