|ID||Project||Category||View Status||Date Submitted||Last Update|
|0003189||OpenFOAM||Bug||public||2019-03-07 13:51||2019-03-08 09:19|
|Platform||Linux||OS||Ubuntu||OS Version||18.04 and 16.04|
|Fixed in Version||6|
|Summary||0003189: MPIRun with interFoam crashes|
|Description||The case is a rotating cylinder containing a liquid/air mixture, solved with interFoam.
It runs without problems in serial and in parallel on OpenFOAM-5.x, but crashes in parallel on OpenFOAM-dev.
The crash usually occurs after 1-3 time steps. The last output before the run stops is usually the "PIMPLE: Converged in ..." line or the "ExecutionTime ..." line, probably depending on which processor crashes.
I have reproduced the crash on computers running Ubuntu 16.04 and 18.04.
Sometimes it prints an error message like:
PIMPLE: Converged in 3 iterations
ExecutionTime = 0.24 s ClockTime = 0 s
[Kiruna:23922] *** An error occurred in MPI_Recv
[Kiruna:23922] *** reported by process [3668180993,7]
[Kiruna:23922] *** on communicator MPI COMMUNICATOR 3 SPLIT FROM 0
[Kiruna:23922] *** MPI_ERR_TRUNCATE: message truncated
[Kiruna:23922] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[Kiruna:23922] *** and potentially your MPI job)
|Steps To Reproduce||Copy and extract the attached cylinder.tar.gz, then run:|
mpirun -np 8 interFoam -parallel
|Tags||No tags attached.|
cylinder.tar.gz (5,644 bytes)
If you do a debug build of libfiniteVolume.so then it fails in serial, too.
The problem is the alpha sub-cycling. It changes the time index, which causes the solver performance data to be cleared. This happens on every pimple iteration. When the second iteration comes along, all the data from the first is missing, so the attempt to index into it fails.
I'm not sure how to deal with this yet.
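To make the failure mode concrete, here is a minimal stand-alone sketch of the mechanism described above. It is not OpenFOAM code: `ResidualStore`, `outerIteration`, the field names and the residual values are all hypothetical, and it assumes that each alpha sub-cycle presents a time index different from that of the enclosing time step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for the per-time-step solver performance store.
// All names here are illustrative; they are not OpenFOAM's classes.
struct ResidualStore
{
    int storedTimeIndex = -1;
    std::map<std::string, std::vector<double>> residuals;

    // Record one residual. Anything stored under a different time index
    // is treated as stale and cleared first.
    void record(int timeIndex, const std::string& field, double value)
    {
        if (timeIndex != storedTimeIndex)
        {
            residuals.clear();
            storedTimeIndex = timeIndex;
        }
        residuals[field].push_back(value);
    }

    // Number of residuals currently held for a field.
    std::size_t count(const std::string& field) const
    {
        const auto it = residuals.find(field);
        return it == residuals.end() ? 0 : it->second.size();
    }
};

// One outer corrector iteration: alpha is sub-cycled first, each sub-cycle
// seeing its own time index (assumed here to be outerIndex + i), then
// p_rgh is solved at the outer time index.
void outerIteration(ResidualStore& store, int outerIndex, int nSubCycles)
{
    assert(nSubCycles >= 0);
    for (int i = 1; i <= nSubCycles; ++i)
    {
        store.record(outerIndex + i, "alpha", 1e-6);
    }
    store.record(outerIndex, "p_rgh", 1e-3);
}
```

In this sketch, calling `outerIteration(store, 10, 2)` twice leaves `store.count("p_rgh")` at 1 rather than 2, because every change of time index wipes the store. Code that expects one entry per outer iteration and reads position 1 on the second iteration would then index out of range.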
I've made the solver data reset check for sub-cycling, and use the index of the outer time state if necessary. This resolves the issue. See:
Incidentally, the behaviour in 5.x probably isn't correct either. It might not crash, but the reset issue means the residuals used by the pimple corrector convergence control aren't likely to be the right ones.
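The idea behind the fix can be sketched in isolation as follows. This is only an illustration of the approach described in the note, not the actual commit: `TimeStateSketch`, `resetIndex`, `SubCycleAwareStore` and `outerIterationFixed` are hypothetical names, and the sketch assumes the time state can report both whether it is sub-cycling and the index of the outer time step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical time state; member names are illustrative only.
struct TimeStateSketch
{
    int timeIndex;       // index seen by the solver right now
    bool subCycling;     // true while inside a sub-cycle
    int outerTimeIndex;  // index of the enclosing time step
};

// Index used to decide whether stored residuals are stale: while
// sub-cycling, fall back to the index of the outer time state.
int resetIndex(const TimeStateSketch& t)
{
    return t.subCycling ? t.outerTimeIndex : t.timeIndex;
}

struct SubCycleAwareStore
{
    int storedIndex = -1;
    std::map<std::string, std::vector<double>> residuals;

    // Record one residual; clear only when the outer time step changes.
    void record(const TimeStateSketch& t, const std::string& field, double value)
    {
        const int index = resetIndex(t);
        if (index != storedIndex)
        {
            residuals.clear();
            storedIndex = index;
        }
        residuals[field].push_back(value);
    }

    // Number of residuals currently held for a field.
    std::size_t count(const std::string& field) const
    {
        const auto it = residuals.find(field);
        return it == residuals.end() ? 0 : it->second.size();
    }
};

// One outer corrector iteration: sub-cycled alpha, then p_rgh.
void outerIterationFixed(SubCycleAwareStore& store, int outerIndex, int nSubCycles)
{
    assert(nSubCycles >= 0);
    for (int i = 1; i <= nSubCycles; ++i)
    {
        store.record({outerIndex + i, true, outerIndex}, "alpha", 1e-6);
    }
    store.record({outerIndex, false, outerIndex}, "p_rgh", 1e-3);
}
```

With this check, two outer iterations at the same outer index accumulate two `p_rgh` entries, and the store is cleared only when the outer time index advances, so convergence control reads residuals that belong to the current time step.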
|2019-03-07 13:51||Blumenkind||New Issue|
|2019-03-07 13:51||Blumenkind||File Added: cylinder.tar.gz|
|2019-03-07 15:31||will||Note Added: 0010352|
|2019-03-08 09:19||will||Assigned To||=> will|
|2019-03-08 09:19||will||Status||new => resolved|
|2019-03-08 09:19||will||Resolution||open => fixed|
|2019-03-08 09:19||will||Fixed in Version||=> 6|
|2019-03-08 09:19||will||Note Added: 0010354|