View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003734 | OpenFOAM | Bug | public | 2021-09-28 16:14 | 2021-09-30 13:30 |
Reporter | jherb | Assigned To | |||
Priority | none | Severity | minor | Reproducibility | random |
Status | new | Resolution | open | ||
Platform | amd64 | OS | Debian | OS Version | 10 |
Product Version | dev | ||||
Summary | 0003734: rhoSimpleFoam running in parallel sometimes crashes if a configuration file (e.g. fvSolution) is modified during runtime | ||||
Description | This was tested with OpenFOAM versions 8 and dev (98686ae7604a5603f501893e2e20a8038638ec52). The following compilers were used to compile OpenFOAM: gcc 8.2.0 and 8.3.0, icx (Intel(R) oneAPI DPC++ Compiler 2021.1.2 (2020.10.0.1214)). The operating systems Debian 10 and CentOS (centos-linux-release-8.3-1.2011.el8.noarch) were used. On all these systems, rhoSimpleFoam crashes sporadically if configuration files (e.g. controlDict, fvSolution, ...) are changed during runtime. The crash seems to happen more often if OpenFOAM is compiled in the debug configuration, but it also happens in the optimized configuration. OpenMPI versions: 2.1.1 and 4.1.0 | ||||
Steps To Reproduce | E.g source $FOAM_SRC/../etc/bashrc WM_COMPILER=Icx WM_COMPILE_OPTION=Debug Use the tutorial $FOAM_TUTORIALS/compressible/rhoSimpleFoam/squareBend Modify controlDict (probably not necessary): --- /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/tutorials/compressible/rhoSimpleFoam/squareBend/system/controlDict 2021-09-23 23:37:05.000000000 +0200 +++ system/controlDict 2021-09-28 16:41:02.101949800 +0200 @@ -46,5 +46,15 @@ runTimeModifiable true; +DebugSwitches +{ + objectRegistry 1; +} + +OptimisationSwitches +{ + fileHandler collated; +} + Start rhoSimpleFoam: mpirun -n 8 rhoSimpleFoam -parallel In a second shell, modify/force re-read of a config file, e.g. by while true; do sleep 0.1 ; touch system/fvSolution ; done The output before/after the crash (Release Version): [3] objectRegistry::readModifiedObjects() : region0 : Considering reading object thermophysicalTransport [7] objectRegistry::readModifiedObjects() : region0 : Considering reading object Cp [7] objectRegistry::readModifiedObjects() : region0 : Considering reading object Residuals<tensor> [7] objectRegistry::readModifiedObjects() : region0 : Considering reading object thermophysicalTransport [0] #0 [0] objectRegistry::readModifiedObjects() : region0 : Considering reading object Foam::error::printStack(Foam::Ostream&) at ??:? [0] #1 Foam::sigSegv::sigHandler(int) at ??:? [0] #2 ? in "/lib/x86_64-linux-gnu/libc.so.6" [0] #3 Foam::OSstream::write(Foam::word const&) at ??:? [0] #4 Foam::operator<<(Foam::Ostream&, Foam::word const&) at ??:? [0] #5 Foam::objectRegistry::readModifiedObjects() at ??:? [0] #6 Foam::objectRegistry::readIfModified() at ??:? [0] #7 Foam::objectRegistry::readModifiedObjects() at ??:? [0] #8 Foam::Time::loop() at ??:? [0] #9 ? in "/home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/bin/rhoSimpleFoam" [0] #10 __libc_start_main in "/lib/x86_64-linux-gnu/libc.so.6" [0] #11 ? 
in "/home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/bin/rhoSimpleFoam" [ciu-linux2019:39857] *** Process received signal *** [ciu-linux2019:39857] Signal: Segmentation fault (11) [ciu-linux2019:39857] Signal code: (-6) [ciu-linux2019:39857] Failing at address: 0x3a9f00009bb1 [ciu-linux2019:39857] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x37840)[0x7fd752d46840] [ciu-linux2019:39857] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x10b)[0x7fd752d467bb] [ciu-linux2019:39857] [ 2] /lib/x86_64-linux-gnu/libc.so.6(+0x37840)[0x7fd752d46840] [ciu-linux2019:39857] [ 3] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam8OSstream5writeERKNS_4wordE+0x4)[0x7fd7535f1234] [ciu-linux2019:39857] [ 4] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4FoamlsERNS_7OstreamERKNS_4wordE+0xa)[0x7fd7535d2a8a] [ciu-linux2019:39857] [ 5] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry19readModifiedObjectsEv+0xa7)[0x7fd753654217] [ciu-linux2019:39857] [ 6] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry14readIfModifiedEv+0x9)[0x7fd7536542b9] [ciu-linux2019:39857] [ 7] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry19readModifiedObjectsEv+0x45)[0x7fd7536541b5] [ciu-linux2019:39857] [ 8] /home/jenkins/.jenkins/workspace/of_dev_pipe/OpenFOAM-dev/platforms/linux64GccDPInt32Opt/lib/libOpenFOAM.so(_ZN4Foam4Time4loopEv+0xdd)[0x7fd753682a7d] [ciu-linux2019:39857] [ 9] rhoSimpleFoam(+0x2da13)[0x564d1c40ea13] [ciu-linux2019:39857] [10] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb)[0x7fd752d3309b] [ciu-linux2019:39857] [11] rhoSimpleFoam(+0x3038a)[0x564d1c41138a] 
[ciu-linux2019:39857] *** End of error message *** -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 0 on node ciu-linux2019 exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- The output for a debug build (without objectRegistry debug output): regIOobject::readIfModified() : Re-reading object fvSolution from file "/GRS/sys/data/user/hej/OpenFOAM/hej-dev/run/squareBend/system/fvSolution" [0] #0 Foam::error::printStack(Foam::Ostream&) at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OSspecific/POSIX/printStack.C:218 [0] #1 Foam::sigSegv::sigHandler(int) at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OSspecific/POSIX/signals/sigSegv.C:54 [0] #2 ? in "/lib64/libc.so.6" [0] #3 Foam::objectRegistry::readModifiedObjects() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/objectRegistry/objectRegistry.C:516 [0] #4 Foam::objectRegistry::readIfModified() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/objectRegistry/objectRegistry.C:524 [0] #5 Foam::objectRegistry::readModifiedObjects() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/objectRegistry/objectRegistry.C:507 [0] #6 Foam::Time::readModifiedObjects() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/Time/TimeIO.C:247 [0] #7 Foam::Time::run() const at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/Time/Time.C:838 [0] #8 Foam::Time::loop() at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/OpenFOAM/db/Time/Time.C:865 [0] #9 Foam::simpleControl::loop(Foam::Time&) at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/src/finiteVolume/cfdTools/general/solutionControl/simpleControl/simpleControl.C:87 [0] #10 ? at /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/applications/solvers/compressible/rhoSimpleFoam/rhoSimpleFoam.C:61 [0] #11 __libc_start_main in "/lib64/libc.so.6" [0] #12 ? at ??:? 
[manitu1:3732249] *** Process received signal *** [manitu1:3732249] Signal: Segmentation fault (11) [manitu1:3732249] Signal code: (-6) [manitu1:3732249] Failing at address: 0x36e00038f319 [manitu1:3732249] [ 0] /lib64/libc.so.6(+0x37880)[0x7f536e283880] [manitu1:3732249] [ 1] /lib64/libc.so.6(gsignal+0x10f)[0x7f536e2837ff] [manitu1:3732249] [ 2] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam7sigSegv10sigHandlerEi+0xb6)[0x7f53700e3c36] [manitu1:3732249] [ 3] /lib64/libc.so.6(+0x37880)[0x7f536e283880] [manitu1:3732249] [ 4] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry19readModifiedObjectsEv+0xd6)[0x7f536fe165a6] [manitu1:3732249] [ 5] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry14readIfModifiedEv+0x15)[0x7f536fe165e5] [manitu1:3732249] [ 6] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam14objectRegistry19readModifiedObjectsEv+0xe2)[0x7f536fe165b2] [manitu1:3732249] [ 7] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam4Time19readModifiedObjectsEv+0x107)[0x7f536fe41f77] [manitu1:3732249] [ 8] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZNK4Foam4Time3runEv+0xd5)[0x7f536fe3da95] [manitu1:3732249] [ 9] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libOpenFOAM.so(_ZN4Foam4Time4loopEv+0x20)[0x7f536fe3db50] [manitu1:3732249] [10] /GRS/sys/work/user/hej/OpenFOAM/OpenFOAM-dev/platforms/linux64IcxDPInt32Debug/lib/libfiniteVolume.so(_ZN4Foam13simpleControl4loopERNS_4TimeE+0x5f)[0x7f5372c5df9f] [manitu1:3732249] [11] rhoSimpleFoam[0x459b0b] [manitu1:3732249] [12] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7f536e26f7b3] [manitu1:3732249] [13] rhoSimpleFoam[0x4563de] 
[manitu1:3732249] *** End of error message *** -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 0 on node manitu1 exited on signal 11 (Segmentation fault). -------------------------------------------------------------------------- | ||||
Tags | No tags attached. | ||||
|
I am unable to reproduce this behaviour. |
|
Which versions (OpenFOAM, compiler, MPI) are you using? Which configuration? The crash does not appear to happen in non-parallel mode. |
|
OpenFOAM-dev, gcc-10.1.1, OpenMPI-2.1.1 |
|
Sorry, but there seems to be no official gcc-10.1.1 at https://ftp.gnu.org/gnu/gcc/, only gcc-10.1.0. Is this a distribution-specific compiler? Which distribution are you using? |
|
OpenSuSE Tumbleweed gcc --version gcc (SUSE Linux) 10.1.1 20200507 [revision dd38686d9c810cecbaa80bb82ed91caaa58ad635] Copyright (C) 2020 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. |
|
I was able to reproduce this error on an AWS EC2 instance (t3.micro), using the official ubuntu/images/hvm-ssd/ubuntu-hirsute-21.04-amd64-server-20210928-9b889f11-a864-4343-9340-1b2042b8cd6c AMI and a fresh installation of OpenFOAM 9 following these instructions: https://openfoam.org/download/9-ubuntu/ Time = 9 GAMG: Solving for Ux, Initial residual = 0.02368, Final residual = 0.0012146, No Iterations 1 GAMG: Solving for Uy, Initial residual = 0.0320432, Final residual = 0.00210963, No Iterations 1 GAMG: Solving for Uz, Initial residual = 0.384409, Final residual = 0.0177306, No Iterations 1 GAMG: Solving for e, Initial residual = 0.180589, Final residual = 0.00958168, No Iterations 1 GAMG: Solving for p, Initial residual = 0.0857398, Final residual = 0.00562322, No Iterations 3 time step continuity errors : sum local = 108.246, global = -52.7635, cumulative = -750.706 GAMG: Solving for epsilon, Initial residual = 0.0271821, Final residual = 0.00177123, No Iterations 1 GAMG: Solving for k, Initial residual = 0.0710536, Final residual = 0.00435529, No Iterations 1 ExecutionTime = 12.27 s ClockTime = 13 s regIOobject::readIfModified() : Re-reading object fvSolution from file "/home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/fvSolution" [0] #0 Foam::error::printStack(Foam::Ostream&) at ??:? [0] #1 Foam::sigSegv::sigHandler(int) at ??:? [0] #2 ? in "/lib/x86_64-linux-gnu/libc.so.6" [0] #3 Foam::objectRegistry::readModifiedObjects() at ??:? [0] #4 Foam::objectRegistry::readIfModified() at ??:? [0] #5 Foam::objectRegistry::readModifiedObjects() at ??:? [0] #6 Foam::Time::loop() at ??:? [0] #7 ? in "/opt/openfoam9/platforms/linux64GccDPInt32Opt/bin/rhoSimpleFoam" [0] #8 __libc_start_main in "/lib/x86_64-linux-gnu/libc.so.6" [0] #9 ? in "/opt/openfoam9/platforms/linux64GccDPInt32Opt/bin/rhoSimpleFoam" -------------------------------------------------------------------------- Primary job terminated normally, but 1 process returned a non-zero exit code. 
Per user-direction, the job has been aborted. -------------------------------------------------------------------------- -------------------------------------------------------------------------- mpirun noticed that process rank 0 with PID 0 on node ip-172-31-14-153 exited on signal 11 (Segmentation fault). These are the changes, I applied to the squareBend tutorial: /opt/openfoam9/tutorials/compressible/rhoSimpleFoam$ diff -uBbwr squareBend $FOAM_RUN/squareBend diff -uBbwr squareBend/system/blockMeshDict /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/blockMeshDict --- squareBend/system/blockMeshDict 2021-09-03 07:57:04.000000000 +0000 +++ /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/blockMeshDict 2021-09-28 20:53:36.842827919 +0000 @@ -50,7 +50,7 @@ blocks ( hex (0 1 11 10 2 3 13 12) inlet ( 20 20 20) simpleGrading (1 1 1) - hex (4 5 15 14 6 7 17 16) outlet (200 20 20) simpleGrading (1 1 1) + hex (4 5 15 14 6 7 17 16) outlet (800 20 20) simpleGrading (1 1 1) hex (1 8 18 11 3 9 19 13) bend1 ( 30 20 20) simpleGrading (1 1 1) hex (5 9 19 15 7 8 18 17) bend2 ( 30 20 20) simpleGrading (1 1 1) diff -uBbwr squareBend/system/controlDict /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/controlDict --- squareBend/system/controlDict 2021-09-03 07:57:04.000000000 +0000 +++ /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/controlDict 2021-09-28 20:54:33.082838092 +0000 @@ -46,5 +46,10 @@ runTimeModifiable true; +OptimisationSwitches +{ + fileHandler collated; +} + // ************************************************************************* // diff -uBbwr squareBend/system/decomposeParDict /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/decomposeParDict --- squareBend/system/decomposeParDict 2021-09-03 07:57:04.000000000 +0000 +++ /home/ubuntu/OpenFOAM/ubuntu-9/run/squareBend/system/decomposeParDict 2021-09-28 20:50:14.266791212 +0000 @@ -14,13 +14,13 @@ } // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * // 
-numberOfSubdomains 8; +numberOfSubdomains 2; -method hierarchical; +method simple; simpleCoeffs { - n (8 1 1); + n (2 1 1); } hierarchicalCoeffs Here is the full bash history starting from the fresh AMI: 1 sudo sh -c "wget -O - https://dl.openfoam.org/gpg.key | apt-key add -" 2 sudo add-apt-repository http://dl.openfoam.org/ubuntu 3 sudo apt-get update 4 sudo apt-get -y install openfoam9 5 source /opt/openfoam9/etc/bashrc 6 mkdir -p $FOAM_RUN 7 run 8 cp -r $FOAM_TUTORIALS/compressible/rhoSimpleFoam/squareBand . 9 tut 10 cd compressible/rhoSimpleFoam/ 11 cp -r squareBend $FOAM_RUN/ 12 run 13 cd squareBend/ 14 blockMesh 15 vim system/decomposeParDict 16 decomposePar 17 vim system/decomposeParDict 18 decomposePar 19 mpirun -n 2 rhoSimpleFoam -parallel 20 mpirun -n 2 --host localhost:2 rhoSimpleFoam -parallel 21 vim system/blockMeshDict 22 blockMesh 23 rm -rf processor* 24 vim system/controlDict 25 l 26 decomposePar 27 mpirun -n 2 --host localhost:2 rhoSimpleFoam -parallel 28 ls 29 rm -rf processors2/ constant/polyMesh/ 30 ls 31 tut 32 cd compressible/rhoSimpleFoam/ 33 diff -uBbwr squareBend $FOAM_RUN/squareBend 34 history And this is the command in the second ssh shell: while true ; do sleep 0.1 ; touch system/fvSolution ; echo -n '.' ; done |
|
I am unable to reproduce this problem. Could you analyse it and propose a patch to fix the issue you see? Alternatively we would need funding to work on it further for you. |
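For anyone else trying to reproduce this: the endless `while true` touch loop from the reporter's notes can be bounded so that it stops by itself after a fixed number of re-read triggers. The following is only a minimal sketch; `retouch` is a hypothetical helper name, not part of OpenFOAM, and the default of 600 iterations is an arbitrary assumption. The 0.1 s interval is the value used in the report and relies on a `sleep` that accepts fractional seconds (e.g. GNU coreutils).

```shell
# Hypothetical helper (not part of OpenFOAM): update the modification time
# of FILE every INTERVAL seconds, COUNT times, so that a solver running with
# "runTimeModifiable true" keeps being asked to re-read it.
# Usage: retouch FILE [INTERVAL] [COUNT]
retouch() {
    file=$1
    interval=${2:-0.1}   # seconds between touches; 0.1 as in the report
    count=${3:-600}      # assumed default, roughly one minute at 0.1 s
    i=0
    while [ "$i" -lt "$count" ]; do
        touch "$file" || return 1
        sleep "$interval"
        i=$((i + 1))
    done
    echo "touched $file $i times"
}
```

With the solver started in the first shell as in the report (`mpirun -n 2 rhoSimpleFoam -parallel`), running `retouch system/fvSolution 0.1 600` from the case directory in a second shell should behave like the original loop for about a minute and then exit. Since the crash is reported as sporadic, a single bounded run not crashing does not show that the problem is absent.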
Date Modified | Username | Field | Change |
---|---|---|---|
2021-09-28 16:14 | jherb | New Issue | |
2021-09-28 16:21 | henry | Note Added: 0012204 | |
2021-09-28 16:37 | jherb | Note Added: 0012205 | |
2021-09-28 16:45 | henry | Note Added: 0012206 | |
2021-09-28 17:28 | jherb | Note Added: 0012207 | |
2021-09-28 17:39 | henry | Note Added: 0012208 | |
2021-09-28 17:40 | henry | Note Edited: 0012208 | |
2021-09-28 22:01 | jherb | Note Added: 0012209 | |
2021-09-30 13:29 | henry | Note Added: 0012210 | |
2021-09-30 13:30 | henry | Priority | normal => none |
2021-09-30 13:30 | henry | Severity | major => minor |