View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0002744 | OpenFOAM | [All Projects] Bug | public | 2017-10-31 21:02 | 2017-11-17 12:14 |
Reporter | sushpa | Assigned To | henry | ||
Priority | normal | Severity | crash | Reproducibility | sometimes |
Status | resolved | Resolution | fixed | ||
Platform | Linux64 | OS | CentOS | OS Version | 7.4 |
Product Version | dev | ||||
Fixed in Version | dev | ||||
Summary | 0002744: Cannot reconstruct a decomposed collated case | ||||
Description | reconstructPar fails when run on a decomposed case which uses the collated file format. OF-dev ea85635b2d451d8ae1c6420db13a76e31d655d5f. | ||||
Steps To Reproduce | Tutorial case coldEngineFoam/freePiston: $ decomposePar $ mpirun -np 4 coldEngineFoam -parallel $ reconstructPar In system/controlDict: OptimisationSwitches { fileHandler collated; // maxThreadFileBufferSize 0; } The behaviour is sometimes affected by whether or not threading is enabled. | ||||
Additional Information | Reconstructing fields for mesh region0 Time = 0.001 --> FOAM FATAL IO ERROR: incorrect first token, expected <int> or '(', found on line 22 an error file: /cluster/work/lav/apps/OpenFOAM/OpenFOAM-lav/tutorials/combustion/coldEngineFoam/freePiston/processors/0.001/polyMesh/points at line 22. From function Foam::Istream &Foam::operator>>(Foam::Istream &, Foam::List<T> &) [with T = char] in file lnInclude/ListIO.C at line 148. FOAM exiting Sometimes (depending on whether threading is enabled or not) it goes through for the first few time steps and stops later: Time = 0.005 Reconstructing FV fields Reconstructing volScalarFields air nut Reconstructing volVectorFields U --> FOAM FATAL IO ERROR: Expected a ')' while reading binaryBlock, found on line 27 an error file: /cluster/work/lav/apps/OpenFOAM/OpenFOAM-lav/tutorials/combustion/coldEngineFoam/freePiston/processors/0.005/U at line 27. From function Foam::Istream &Foam::Istream::readEnd(const char *) in file db/IOstreams/IOstreams/Istream.C at line 109. FOAM exiting | ||||
Tags | Collated | ||||
|
controlDict (1,513 bytes)
/*--------------------------------*- C++ -*----------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  dev                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      controlDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

DebugSwitches
{
    collated    1;
}

OptimisationSwitches
{
    fileHandler collated;
    maxThreadFileBufferSize 100;
}

application     coldEngineFoam;

startFrom       startTime;

startTime       0;

stopAt          endTime;

endTime         0.0003;

deltaT          5.0e-7;

writeControl    adjustableRunTime;

writeInterval   0.0001;

purgeWrite      0;

writeFormat     binary;

writePrecision  6;

writeCompression off;

timeFormat      general;

timePrecision   8;

runTimeModifiable true;

adjustTimeStep  yes;

maxCo           0.25;

// ************************************************************************* //
|
I cannot repeat this. Tried:
- coldEngineFoam tutorial
- decompose into 4 with scotch
- changed (to speed up simulation) writeInterval 0.0001; endTime 0.0003;
- reconstructPar

I am running this on a shared memory machine.
- are you running with NFS storage?
- could you compare the conflicting file against runs that did succeed?
- any particular setting that always works (you mentioned threading)
|
Are you able to reproduce it with:

    OptimisationSwitches
    {
        maxMasterFileBufferSize 0;
        maxThreadFileBufferSize 0;
        fileHandler collated;
    }

For me this issue now seems reproducible with the above, using both collated and masterCollated. Neither the decomposition method nor the number of MPI ranks has any influence (as far as I can tell). The issue applies only when threading is turned off.

In decomposedBlockData.C (line numbers may vary):

    832    while
    833    (
    834        proci < nProcs
    835     && (
    836            totalSize+recvSizes[proci]
    837          < fileOperations::masterUncollatedFileOperation::
    838                maxMasterFileBufferSize
    839        )
    840    )
    841    {
    842        totalSize += recvSizes[proci];
    843        proci++;
    844    }

When the comparison fails, ranks 1..n do not write out any data to the files (e.g. processors/0.0001/polyMesh/points only has data from rank 0).

If it still doesn't show up in your trials, I can put together and upload a successful and an unsuccessful case and logs (with DebugSwitch on). My recent tries were on a local workstation, 4 cores, no NFS.

I can work around it at the moment by using the collated file handler with maxMasterFileBufferSize 2e9.

As an aside: is it a misconfiguration if I specify masterCollated with maxMasterFileBufferSize 0? Or what would be the intended behaviour here?
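The failure mode described in this note can be illustrated outside OpenFOAM with a minimal, standalone sketch. The function name endOfBatch, its parameters, and the sizes used below are hypothetical and chosen only for illustration; only the loop condition mirrors the snippet quoted from decomposedBlockData.C.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the quoted loop (not OpenFOAM code): returns the
// index one past the last rank whose data still fits below maxBufferSize,
// starting the batch at rank startProci.
std::size_t endOfBatch
(
    const std::vector<std::size_t>& recvSizes,
    std::size_t startProci,
    std::size_t maxBufferSize
)
{
    std::size_t proci = startProci;
    std::size_t totalSize = 0;

    // Same shape as the quoted condition: a rank is accepted only while the
    // accumulated size stays strictly below the limit.
    while
    (
        proci < recvSizes.size()
     && (totalSize + recvSizes[proci] < maxBufferSize)
    )
    {
        totalSize += recvSizes[proci];
        ++proci;
    }

    // If the very first comparison fails, proci == startProci and the batch
    // is empty: no rank's data is received.
    return proci;
}
```

With made-up sizes {0, 1200, 1500, 900} for ranks 0..3 and a batch starting at rank 1, a limit of 2000000000 yields 4 (all ranks accepted), whereas a limit of 0 yields 1, i.e. an empty batch. That is consistent with the observation that only rank 0's data ends up in the file.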
|
I have not been able to reproduce it (with your settings), but I have not really looked at the code yet. I guess if you specify maxMasterFileBufferSize all communication should be blocking ('scheduled' in OpenFOAM speak). |
|
decomposedBlockData.C (27,445 bytes) |
|
- I can reproduce your problem now. I was running collated globally (through the etc/controlDict), so it did not re-initialise the collated file handler with the 0 size.
- Could you try the attached replacement for db/IOobjects/decomposedBlockData/decomposedBlockData.C? I've tried your case on 2 and 4 processors and think the logic is sound, but it'd be nice to have another tester.
- It now always receives data for at least one processor, i.e. proci now starts off from the next processor.
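The behaviour described in the last point can be sketched as follows. This is a standalone illustration under stated assumptions, not the code from the attached decomposedBlockData.C or from the resolving commit; the name makeBatches and the half-open [start, end) batch representation are hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not the attached file's code): splits ranks
// 1..nProcs-1 into half-open batches [start, end) so that every batch holds
// at least one rank, even when maxBufferSize is 0.
std::vector<std::pair<std::size_t, std::size_t>> makeBatches
(
    const std::vector<std::size_t>& recvSizes,
    std::size_t maxBufferSize
)
{
    std::vector<std::pair<std::size_t, std::size_t>> batches;
    const std::size_t nProcs = recvSizes.size();

    std::size_t startProci = 1;   // assumption: rank 0 is the master
    while (startProci < nProcs)
    {
        // Unconditionally accept the first rank of the batch ...
        std::size_t totalSize = recvSizes[startProci];
        std::size_t proci = startProci + 1;

        // ... and apply the size limit only to the ranks that follow
        while
        (
            proci < nProcs
         && (totalSize + recvSizes[proci] < maxBufferSize)
        )
        {
            totalSize += recvSizes[proci];
            ++proci;
        }

        batches.emplace_back(startProci, proci);
        startProci = proci;   // always advances, so the outer loop terminates
    }

    return batches;
}
```

With made-up sizes {0, 1200, 1500, 900} and a limit of 0, this produces three single-rank batches, [1,2), [2,3) and [3,4), so every rank's data is still received, just one rank at a time. A limit of 2000000000 produces the single batch [1,4).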
|
Resolved by commit 03b641d2c7575738d94461527f12ce918dbd1922 |
Date Modified | Username | Field | Change |
---|---|---|---|
2017-10-31 21:02 | sushpa | New Issue | |
2017-10-31 21:02 | sushpa | Tag Attached: Collated | |
2017-11-03 09:54 | MattijsJ | File Added: controlDict | |
2017-11-03 10:01 | MattijsJ | Note Added: 0008986 | |
2017-11-04 12:41 | sushpa | Note Added: 0008999 | |
2017-11-07 16:20 | MattijsJ | Note Added: 0009014 | |
2017-11-10 12:19 | MattijsJ | File Added: decomposedBlockData.C | |
2017-11-10 12:24 | MattijsJ | Note Added: 0009019 | |
2017-11-17 12:14 | henry | Assigned To | => henry |
2017-11-17 12:14 | henry | Status | new => resolved |
2017-11-17 12:14 | henry | Resolution | open => fixed |
2017-11-17 12:14 | henry | Fixed in Version | => dev |
2017-11-17 12:14 | henry | Note Added: 0009048 |