|View Issue Details|
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0002744||OpenFOAM||[All Projects] Bug||public||2017-10-31 21:02||2017-11-17 12:14|
|Target Version||Fixed in Version||dev|
|Summary||0002744: Cannot reconstruct a decomposed collated case|
|Description||reconstructPar fails when run on a decomposed case which uses the collated file format.|
|Steps To Reproduce||Tutorial case coldEngineFoam/freePiston:|
$ mpirun -np 4 coldEngineFoam -parallel
// maxThreadFileBufferSize 0;
The behaviour is sometimes affected by whether or not threading is enabled.
|Additional Information||Reconstructing fields for mesh region0|
Time = 0.001
--> FOAM FATAL IO ERROR:
incorrect first token, expected <int> or '(', found on line 22 an error
file: /cluster/work/lav/apps/OpenFOAM/OpenFOAM-lav/tutorials/combustion/coldEngineFoam/freePiston/processors/0.001/polyMesh/points at line 22.
From function Foam::Istream &Foam::operator>>(Foam::Istream &, Foam::List<T> &) [with T = char]
in file lnInclude/ListIO.C at line 148.
Sometimes (depending on whether threading is enabled or not) reconstruction succeeds for the first few time steps and fails later:
Time = 0.005
Reconstructing FV fields
--> FOAM FATAL IO ERROR:
Expected a ')' while reading binaryBlock, found on line 27 an error
file: /cluster/work/lav/apps/OpenFOAM/OpenFOAM-lav/tutorials/combustion/coldEngineFoam/freePiston/processors/0.005/U at line 27.
From function Foam::Istream &Foam::Istream::readEnd(const char *)
in file db/IOstreams/IOstreams/Istream.C at line 109.
I cannot reproduce this. I tried:
- coldEngineFoam tutorial
- decompose into 4 with scotch
- changed (to speed up simulation)
I am running this on a shared memory machine.
- are you running with NFS storage?
- could you compare the conflicting file against runs that did succeed?
- any particular setting that always works (you mentioned threading)
Are you able to reproduce it with:
For me this issue now seems reproducible with the above, using both collated and masterCollated. Neither the decomposition method nor the number of MPI ranks has any influence (as far as I can tell). The issue occurs only when threading is turned off.
In decomposedBlockData.C (line numbers may vary):
834 proci < nProcs
835 && (
837 < fileOperations::masterUncollatedFileOperation::
842 totalSize += recvSizes[proci];
When the comparison fails, ranks 1..n do not write out any data to the files (e.g. processors/0.0001/polyMesh/points only has data from rank 0).
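To make the failure mode concrete, here is a minimal standalone sketch of a size-limited accumulation loop of the kind the fragment above suggests. It is not the OpenFOAM source: the function name `endOfBatch`, the parameter `maxBufferSize` (standing in for the buffer-size limit), and the exact form of the comparison are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the quoted loop; not OpenFOAM code.
// Starting at rank startProc, returns one past the last rank whose data
// still fits under the limit, so the batch is [startProc, result).
std::size_t endOfBatch
(
    const std::vector<std::int64_t>& recvSizes,
    std::size_t startProc,
    std::int64_t maxBufferSize   // assumed role: the buffer-size limit
)
{
    assert(startProc <= recvSizes.size());

    std::size_t proci = startProc;
    std::int64_t totalSize = 0;

    // Assumed comparison: keep adding ranks while the running total
    // stays strictly below the limit.
    while
    (
        proci < recvSizes.size()
     && totalSize + recvSizes[proci] < maxBufferSize
    )
    {
        totalSize += recvSizes[proci];
        ++proci;
    }

    // If the very first comparison fails, proci == startProc and the
    // batch is empty: no rank's data would be collected.
    return proci;
}
```

Under these assumptions, a limit of 0 makes the comparison fail for the first rank, so `endOfBatch(sizes, 1, 0)` returns 1 and the batch is empty. That is consistent with the observation that only rank 0 data ends up in the file, and with the large-limit workaround selecting every rank.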
If it still doesn't show up in your trials, I can put together and upload a successful and unsuccessful case and logs (with DebugSwitch on). My recent tries were on a local workstation, 4 cores, no NFS.
I can work around it at the moment by using collated file handler with maxMasterFileBufferSize 2e9.
As an aside: is it a misconfiguration if I specify masterCollated with maxMasterFileBufferSize 0? Or what would be the intended behaviour here?
I have not been able to reproduce it (with your settings), but I have not really looked at the code yet. I guess if you specify maxMasterFileBufferSize all communication should be blocking ('scheduled' in OpenFOAM speak).
- I can reproduce your problem now. I was running collated globally (through the etc/controlDict), so it did not re-initialise the collated file handler with the 0 size.
- Could you try the attached replacement for db/IOobjects/decomposedBlockData/decomposedBlockData.C? I've tried your case on 2 and 4 processors and think the logic is sound, but it'd be nice to have another tester.
- It now always receives data for at least one processor, i.e. proci now starts from the next processor.
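The "at least one processor" idea can be sketched in standalone C++ as follows. This is an illustration under stated assumptions, not the attached decomposedBlockData.C: the function name `makeBatches`, the parameter `maxBufferSize`, and the representation of a batch as a half-open rank range are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch; not the actual OpenFOAM fix.
// Splits ranks 1..n-1 into consecutive batches [first, second).
// Every batch takes its first rank unconditionally, so the loop always
// makes progress, even when maxBufferSize is 0.
std::vector<std::pair<std::size_t, std::size_t>> makeBatches
(
    const std::vector<std::int64_t>& recvSizes,
    std::int64_t maxBufferSize   // assumed role: the buffer-size limit
)
{
    assert(maxBufferSize >= 0);

    std::vector<std::pair<std::size_t, std::size_t>> batches;
    const std::size_t nProcs = recvSizes.size();

    std::size_t startProc = 1;   // rank 0 is treated as the master here
    while (startProc < nProcs)
    {
        // Always receive from the first rank of the batch ...
        std::int64_t totalSize = recvSizes[startProc];
        // ... and start the size check from the next rank.
        std::size_t proci = startProc + 1;

        while
        (
            proci < nProcs
         && totalSize + recvSizes[proci] < maxBufferSize
        )
        {
            totalSize += recvSizes[proci];
            ++proci;
        }

        batches.emplace_back(startProc, proci);
        startProc = proci;
    }

    return batches;
}
```

In this sketch a limit of 0 degenerates to one rank per batch instead of an empty batch, while a very large limit collects all ranks in a single batch.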
Resolved by commit 03b641d2c7575738d94461527f12ce918dbd1922
|2017-10-31 21:02||sushpa||New Issue|
|2017-10-31 21:02||sushpa||Tag Attached: Collated|
|2017-11-03 09:54||MattijsJ||File Added: controlDict|
|2017-11-03 10:01||MattijsJ||Note Added: 0008986|
|2017-11-04 12:41||sushpa||Note Added: 0008999|
|2017-11-07 16:20||MattijsJ||Note Added: 0009014|
|2017-11-10 12:19||MattijsJ||File Added: decomposedBlockData.C|
|2017-11-10 12:24||MattijsJ||Note Added: 0009019|
|2017-11-17 12:14||henry||Assigned To||=> henry|
|2017-11-17 12:14||henry||Status||new => resolved|
|2017-11-17 12:14||henry||Resolution||open => fixed|
|2017-11-17 12:14||henry||Fixed in Version||=> dev|
|2017-11-17 12:14||henry||Note Added: 0009048|