2017-11-24 09:09 GMT

ID: 0002744
Project: OpenFOAM
Category: [All Projects] Bug
View Status: public
Last Update: 2017-11-17 12:14
Reporter: sushpa
Assigned To: henry
Priority: normal
Severity: crash
Reproducibility: sometimes
Status: resolved
Resolution: fixed
Platform: Linux64
OS: CentOS
OS Version: 7.4
Product Version: dev
Target Version:
Fixed in Version: dev
Summary: 0002744: Cannot reconstruct a decomposed collated case
Description: reconstructPar fails when run on a decomposed case which uses the collated file format.

OF-dev ea85635b2d451d8ae1c6420db13a76e31d655d5f.
Steps To Reproduce: Tutorial case coldEngineFoam/freePiston:

$ decomposePar
$ mpirun -np 4 coldEngineFoam -parallel
$ reconstructPar

In system/controlDict:

OptimisationSwitches
{
    fileHandler collated;
// maxThreadFileBufferSize 0;
}

The behaviour is sometimes affected by whether or not threading is enabled.
Additional Information:

Reconstructing fields for mesh region0

Time = 0.001

--> FOAM FATAL IO ERROR:
incorrect first token, expected <int> or '(', found on line 22 an error

file: /cluster/work/lav/apps/OpenFOAM/OpenFOAM-lav/tutorials/combustion/coldEngineFoam/freePiston/processors/0.001/polyMesh/points at line 22.

    From function Foam::Istream &Foam::operator>>(Foam::Istream &, Foam::List<T> &) [with T = char]
    in file lnInclude/ListIO.C at line 148.

FOAM exiting

Sometimes (depending on whether threading is enabled or not) it goes through for the first few time steps and stops later:

Time = 0.005

Reconstructing FV fields

    Reconstructing volScalarFields

        air
        nut

    Reconstructing volVectorFields

        U


--> FOAM FATAL IO ERROR:
Expected a ')' while reading binaryBlock, found on line 27 an error

file: /cluster/work/lav/apps/OpenFOAM/OpenFOAM-lav/tutorials/combustion/coldEngineFoam/freePiston/processors/0.005/U at line 27.

    From function Foam::Istream &Foam::Istream::readEnd(const char *)
    in file db/IOstreams/IOstreams/Istream.C at line 109.

FOAM exiting
Tags: Collated
Attached Files
  • controlDict (1,513 bytes) 2017-11-03 09:54
    /*--------------------------------*- C++ -*----------------------------------*\
    | =========                 |                                                 |
    | \\      /  F ield         | OpenFOAM: The Open Source CFD Toolbox           |
    |  \\    /   O peration     | Version:  dev                                   |
    |   \\  /    A nd           | Web:      www.OpenFOAM.org                      |
    |    \\/     M anipulation  |                                                 |
    \*---------------------------------------------------------------------------*/
    FoamFile
    {
        version     2.0;
        format      ascii;
        class       dictionary;
        object      controlDict;
    }
    // * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * //
    
    DebugSwitches
    {
        collated    1;
    }
    
    OptimisationSwitches
    {
        fileHandler collated;
        maxThreadFileBufferSize 100;
    }
    
    application             coldEngineFoam;
    
    startFrom               startTime;
    
    startTime               0;
    
    stopAt                  endTime;
    
    endTime                 0.0003;
    
    deltaT                  5.0e-7;
    
    writeControl            adjustableRunTime;
    
    writeInterval           0.0001;
    
    purgeWrite              0;
    
    writeFormat             binary;
    
    writePrecision          6;
    
    writeCompression        off;
    
    timeFormat              general;
    
    timePrecision           8;
    
    runTimeModifiable       true;
    
    adjustTimeStep          yes;
    
    maxCo                   0.25;
    
    // ************************************************************************* //
    
  • decomposedBlockData.C (27,445 bytes) 2017-11-10 12:19

Notes

~0008986

MattijsJ (reporter)

I cannot repeat this. Tried:

- coldEngineFoam tutorial
- decompose into 4 with scotch
- changed (to speed up simulation)
writeInterval 0.0001;
endTime 0.0003;
- reconstructPar

I am running this on a shared memory machine.

- are you running with NFS storage?
- could you compare the conflicting file against runs that did succeed?
- is there any particular setting that always works (you mentioned threading)?

~0008999

sushpa (reporter)

Are you able to reproduce it with:

OptimisationSwitches {
    maxMasterFileBufferSize 0;
    maxThreadFileBufferSize 0;
    fileHandler collated;
}

For me this issue now seems reproducible with the above, using both collated and masterCollated. Neither the decomposition method nor the number of MPI ranks has any influence (as far as I can tell). The issue applies only when threading is turned off.

In decomposedBlockData.C (around lines 832-844; line numbers may vary):

    while
    (
        proci < nProcs
     && (
            totalSize+recvSizes[proci]
          < fileOperations::masterUncollatedFileOperation::
                maxMasterFileBufferSize
        )
    )
    {
        totalSize += recvSizes[proci];
        proci++;
    }

When the comparison fails, ranks 1..n do not write out any data to the files (e.g. processors/0.0001/polyMesh/points only has data from rank 0).
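
The failure mode described above can be illustrated with a stand-alone sketch of the batching loop. This is not the OpenFOAM source: the helper name `endOfBatch`, the plain `std::vector` of sizes and the sizes used in the usage note are assumptions made purely for illustration. With a buffer limit of 0 the comparison is false on the very first pass, so the batch is empty and `proci` never advances past its starting value.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-alone version of the loop quoted above (not OpenFOAM code).
// Returns the index one past the last processor that fits into the batch
// beginning at 'start', given a buffer limit of 'maxBufferSize' bytes.
std::size_t endOfBatch
(
    const std::vector<std::size_t>& recvSizes,
    std::size_t start,
    std::size_t maxBufferSize
)
{
    assert(start <= recvSizes.size());

    std::size_t proci = start;
    std::size_t totalSize = 0;

    while
    (
        proci < recvSizes.size()
     && totalSize + recvSizes[proci] < maxBufferSize
    )
    {
        totalSize += recvSizes[proci];
        proci++;
    }

    // If the very first comparison failed, proci == start: an empty batch
    return proci;
}
```

For made-up sizes {100, 200, 300}, `endOfBatch(sizes, 0, 0)` returns 0 (no processor is accepted), whereas `endOfBatch(sizes, 0, 1000)` returns 3 (all processors fit).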

If it still doesn't show up in your trials, I can put together and upload a successful and unsuccessful case and logs (with DebugSwitch on). My recent tries were on a local workstation, 4 cores, no NFS.

I can work around it at the moment by using the collated file handler with maxMasterFileBufferSize 2e9.

As an aside: is it a misconfiguration if I specify masterCollated with maxMasterFileBufferSize 0? Or what would be the intended behaviour here?

~0009014

MattijsJ (reporter)

I have not been able to reproduce it (with your settings) but have not really looked at the code yet. I guess if you specify maxMasterFileBufferSize all communication should be blocking ('scheduled' in OpenFOAM speak).

~0009019

MattijsJ (reporter)

- I can reproduce your problem now. I was running collated globally (through the etc/controlDict) so it did not re-initialise the collated file handler with the 0 size.
- Could you try the attached replacement for db/IOobjects/decomposedBlockData/decomposedBlockData.C? I've tried your case on 2 and 4 processors and think the logic is sound but it'd be nice to have another tester.
- It now always receives data for at least one processor, i.e. proci now starts off from the next processor.
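
The idea in the last point can be sketched as follows. This is only an illustration under stated assumptions, not the code from the attached file or from the resolving commit: the helper names `endOfBatchAtLeastOne` and `countBatches` and the plain `std::vector` of sizes are hypothetical. The first processor of each batch is accepted unconditionally, and the size check only applies from the next processor onwards, so every batch makes progress even with a limit of 0.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not OpenFOAM code): index one past the last processor
// of the batch beginning at 'start'. The first processor is always accepted,
// so the returned index is always greater than 'start'.
std::size_t endOfBatchAtLeastOne
(
    const std::vector<std::size_t>& recvSizes,
    std::size_t start,
    std::size_t maxBufferSize
)
{
    assert(start < recvSizes.size());

    std::size_t totalSize = recvSizes[start];
    std::size_t proci = start + 1;   // size check starts at the next processor

    while
    (
        proci < recvSizes.size()
     && totalSize + recvSizes[proci] < maxBufferSize
    )
    {
        totalSize += recvSizes[proci];
        proci++;
    }

    return proci;
}

// Number of batches needed to cover all processors; terminates for any limit
// because each batch contains at least one processor.
std::size_t countBatches
(
    const std::vector<std::size_t>& recvSizes,
    std::size_t maxBufferSize
)
{
    std::size_t nBatches = 0;

    for (std::size_t proci = 0; proci < recvSizes.size(); )
    {
        proci = endOfBatchAtLeastOne(recvSizes, proci, maxBufferSize);
        nBatches++;
    }

    return nBatches;
}
```

With made-up sizes {100, 200, 300}, a limit of 0 gives three batches of one processor each, a limit of 350 gives two batches, and a limit of 1000 gives a single batch.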

~0009048

henry (manager)

Resolved by commit 03b641d2c7575738d94461527f12ce918dbd1922

Issue History
Date Modified     Username  Field                              Change
2017-10-31 21:02  sushpa    New Issue
2017-10-31 21:02  sushpa    Tag Attached: Collated
2017-11-03 09:54  MattijsJ  File Added: controlDict
2017-11-03 10:01  MattijsJ  Note Added: 0008986
2017-11-04 12:41  sushpa    Note Added: 0008999
2017-11-07 16:20  MattijsJ  Note Added: 0009014
2017-11-10 12:19  MattijsJ  File Added: decomposedBlockData.C
2017-11-10 12:24  MattijsJ  Note Added: 0009019
2017-11-17 12:14  henry     Assigned To                        => henry
2017-11-17 12:14  henry     Status                             new => resolved
2017-11-17 12:14  henry     Resolution                         open => fixed
2017-11-17 12:14  henry     Fixed in Version                   => dev
2017-11-17 12:14  henry     Note Added: 0009048