View Issue Details

ID: 0002744
Project: OpenFOAM
Category: [All Projects] Bug
View Status: public
Last Update: 2017-11-17 12:14
Reporter: sushpa
Assigned To: henry
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: resolved
Resolution: fixed
Platform: Linux64
OS: CentOS
OS Version: 7.4
Product Version: dev
Fixed in Version: dev
Summary: 0002744: Cannot reconstruct a decomposed collated case
Description:
reconstructPar fails when run on a decomposed case which uses the collated file format.

OF-dev ea85635b2d451d8ae1c6420db13a76e31d655d5f.
Steps To Reproduce:
Tutorial case coldEngineFoam/freePiston:

$ decomposePar
$ mpirun -np 4 coldEngineFoam -parallel
$ reconstructPar

In system/controlDict:

OptimisationSwitches
{
    fileHandler collated;
// maxThreadFileBufferSize 0;
}

The behaviour is sometimes affected by whether or not threading is enabled.
Additional Information:
Reconstructing fields for mesh region0

Time = 0.001

--> FOAM FATAL IO ERROR:
incorrect first token, expected <int> or '(', found on line 22 an error

file: /cluster/work/lav/apps/OpenFOAM/OpenFOAM-lav/tutorials/combustion/coldEngineFoam/freePiston/processors/0.001/polyMesh/points at line 22.

    From function Foam::Istream &Foam::operator>>(Foam::Istream &, Foam::List<T> &) [with T = char]
    in file lnInclude/ListIO.C at line 148.

FOAM exiting

Sometimes (depending on whether threading is enabled or not) it goes through for the first few time steps and stops later:

Time = 0.005

Reconstructing FV fields

    Reconstructing volScalarFields

        air
        nut

    Reconstructing volVectorFields

        U


--> FOAM FATAL IO ERROR:
Expected a ')' while reading binaryBlock, found on line 27 an error

file: /cluster/work/lav/apps/OpenFOAM/OpenFOAM-lav/tutorials/combustion/coldEngineFoam/freePiston/processors/0.005/U at line 27.

    From function Foam::Istream &Foam::Istream::readEnd(const char *)
    in file db/IOstreams/IOstreams/Istream.C at line 109.

FOAM exiting
Tags: Collated

Activities

MattijsJ

2017-11-03 09:54

reporter  

controlDict (1,513 bytes)
/*--------------------------------*- C++ -*----------------------------------*\
| =========                 |                                                 |
| \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
|  \\    /   O peration     | Version:  dev                                   |
|   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
|    \\/     M anipulation  |                                                 |
\*---------------------------------------------------------------------------*/
FoamFile
{
    version     2.0;
    format      ascii;
    class       dictionary;
    object      controlDict;
}
// * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //

DebugSwitches
{
    collated    1;
}

OptimisationSwitches
{
    fileHandler collated;
    maxThreadFileBufferSize 100;
}

application             coldEngineFoam;

startFrom               startTime;

startTime               0;

stopAt                  endTime;

endTime                 0.0003;

deltaT                  5.0e-7;

writeControl            adjustableRunTime;

writeInterval           0.0001;

purgeWrite              0;

writeFormat             binary;

writePrecision          6;

writeCompression        off;

timeFormat              general;

timePrecision           8;

runTimeModifiable       true;

adjustTimeStep          yes;

maxCo                   0.25;

// ************************************************************************* //

MattijsJ

2017-11-03 10:01

reporter   ~0008986

I cannot repeat this. Tried:

- coldEngineFoam tutorial
- decompose into 4 with scotch
- changed (to speed up simulation)
writeInterval 0.0001;
endTime 0.0003;
- reconstructPar

I am running this on a shared memory machine.

- Are you running with NFS storage?
- Could you compare the conflicting file against runs that did succeed?
- Is there any particular setting that always works (you mentioned threading)?

sushpa

2017-11-04 12:41

reporter   ~0008999

Are you able to reproduce it with:

OptimisationSwitches {
    maxMasterFileBufferSize 0;
    maxThreadFileBufferSize 0;
    fileHandler collated;
}

For me this issue now seems reproducible with the above, using both collated and masterCollated. Neither the decomposition method nor the number of MPI ranks has any influence (as far as I can tell). The issue applies only when threading is turned off.

In decomposedBlockData.C (around line 832; line numbers may vary):

while
(
    proci < nProcs
 && (
        totalSize+recvSizes[proci]
      < fileOperations::masterUncollatedFileOperation::
            maxMasterFileBufferSize
    )
)
{
    totalSize += recvSizes[proci];
    proci++;
}

When the comparison fails, ranks 1..n do not write out any data to the files (e.g. processors/0.0001/polyMesh/points only has data from rank 0).

If it still doesn't show up in your trials, I can put together and upload a successful and unsuccessful case and logs (with DebugSwitch on). My recent tries were on a local workstation, 4 cores, no NFS.

I can work around it at the moment by using the collated file handler with maxMasterFileBufferSize 2e9.

As an aside: is it a misconfiguration if I specify masterCollated with maxMasterFileBufferSize 0? Or what would be the intended behaviour here?

MattijsJ

2017-11-07 16:20

reporter   ~0009014

I have not been able to reproduce it (with your settings), but I have not really looked at the code yet. I guess if you specify maxMasterFileBufferSize all communication should be blocking ('scheduled' in OpenFOAM speak).

MattijsJ

2017-11-10 12:19

reporter  

decomposedBlockData.C (27,445 bytes)

MattijsJ

2017-11-10 12:24

reporter   ~0009019

- I can reproduce your problem now. I was running collated globally (through the etc/controlDict), so the collated file handler was not re-initialised with the 0 buffer size.
- Could you try the attached replacement for db/IOobjects/decomposedBlockData/decomposedBlockData.C? I've tried your case on 2 and 4 processors and think the logic is sound, but it would be nice to have another tester.
- It now always receives data for at least one processor, i.e. proci now starts off from the next processor.

henry

2017-11-17 12:14

manager   ~0009048

Resolved by commit 03b641d2c7575738d94461527f12ce918dbd1922

Issue History

Date Modified Username Field Change
2017-10-31 21:02 sushpa New Issue
2017-10-31 21:02 sushpa Tag Attached: Collated
2017-11-03 09:54 MattijsJ File Added: controlDict
2017-11-03 10:01 MattijsJ Note Added: 0008986
2017-11-04 12:41 sushpa Note Added: 0008999
2017-11-07 16:20 MattijsJ Note Added: 0009014
2017-11-10 12:19 MattijsJ File Added: decomposedBlockData.C
2017-11-10 12:24 MattijsJ Note Added: 0009019
2017-11-17 12:14 henry Assigned To => henry
2017-11-17 12:14 henry Status new => resolved
2017-11-17 12:14 henry Resolution open => fixed
2017-11-17 12:14 henry Fixed in Version => dev
2017-11-17 12:14 henry Note Added: 0009048