View Issue Details

ID: 0004161
Project: OpenFOAM
Category: Bug
View Status: public
Last Update: 2024-10-07 15:28
Reporter: peth
Assigned To: henry
Priority: normal
Severity: major
Reproducibility: always
Status: resolved
Resolution: fixed
OS: ubuntu
OS Version: 22.04
Product Version: 12
Fixed in Version: dev
Summary: 0004161: scotch method results in inappropriate decomposition
Description: Scotch decomposition produces a strange, unbalanced decomposition of the mesh, which slows down parallel calculations. I tested it on different meshes, and usually the entire mesh is assigned to one processor while the other processors have 0 cells. In other cases, some other processors receive a small number of cells, but the overall distribution is still highly unbalanced.
The issue also occurs with the tutorials, for example the incompressibleFluid/impeller case, where the scotch method is used by default for the decomposition.
Note:
- the decomposition of the impeller tutorial in of11 on the same system is well balanced (evenly distributed cell count for all processors)
- in both the of11 and of12 cases I used the scotch shipped with ThirdParty, which has the same version, 6.0.9
- in the of12 case there is no warning: the decomposition completes, but it is unbalanced
- I tried debugging the code by running decomposePar under gdb with the given tutorial in both of-11 and of-dev. I may be wrong, but my guess is that the problematic part is in the decomposeOneProc function in src/parallel/decompose/scotch.C, more specifically in the calculation of the labelList "velotab", which is a parameter passed to scotch (https://hal.science/hal-00410327/document).
    In OF11 the velotab components:
            velotab[0]: 1
            velotab[1]: 1
            velotab[2]: 1
            velotab[1000]: 1

    In OF-dev the same velotab components in the same tutorial are much larger: 492123147842001 (?),
    so there is a difference here. I did not debug scotch.C further. I guess the difference is related to commit https://github.com/OpenFOAM/OpenFOAM-12/commit/78dbbdd7d301d2cedd426e1c54dd1e8ebd22e454,
    in which the scaleWeights function in src/parallel/decompose/decompositionMethods/decompositionMethod/decompositionMethod.C was modified.
Steps To Reproduce:
1. Install of12 from GitHub. I used the following options in etc/bashrc and left all other settings at their defaults:
    WM_LABEL_SIZE=64
    scotch=ThirdParty
    paraview=none
2. Copy the $FOAM_TUTORIALS/incompressibleFluid/impeller directory to a new directory.
3. Navigate to the new directory, then run blockMesh followed by decomposePar.
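
One way to put a number on "unbalanced" after step 3 is the ratio of the largest per-processor cell count to the mean count. The following is a minimal Python sketch; the `imbalance` helper and the cell counts are hypothetical illustrations, not values or tooling taken from this report.

```python
def imbalance(cell_counts):
    """Return max/mean of per-processor cell counts (1.0 means perfectly balanced).

    Hypothetical helper for illustration; not part of OpenFOAM.
    """
    if not cell_counts:
        raise ValueError("need at least one processor")
    total = sum(cell_counts)
    if total == 0:
        raise ValueError("mesh has no cells")
    mean = total / len(cell_counts)
    return max(cell_counts) / mean


# Made-up counts for a 4-processor decomposition of a 20000-cell mesh.
balanced = [5000, 5000, 5000, 5000]
degenerate = [20000, 0, 0, 0]  # every cell on one processor, as described above

print(imbalance(balanced))    # 1.0
print(imbalance(degenerate))  # 4.0
```

A value equal to the number of processors corresponds to the worst case reported here, where one processor holds the whole mesh.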
Tags: No tags attached.

Activities

peth

2024-10-07 13:44

reporter   ~0013423

Edit: besides the commit mentioned in the description, the velotab calculation was also modified in this commit: https://github.com/OpenFOAM/OpenFOAM-12/commit/30afcdc331b4d44f011dad8f1122adf6f156cfdf

henry

2024-10-07 14:05

manager   ~0013424

What happens if you don't set WM_LABEL_SIZE=64?

I get a correct decomposition.

peth

2024-10-07 14:55

reporter   ~0013426

I compiled of-dev with WM_LABEL_SIZE=32 in debug mode. Testing it with the same tutorial (incompressibleFluid/impeller) gives a balanced decomposition, i.e. almost the same number of cells on each processor.
Running decomposePar in debug mode gives the following values for the velotab calculation:
    velotab[0]=114582
    velotab[1]=114582
    velotab[2]=114582
    velotab[1000]=114582
i.e. a different and much smaller number than with WM_LABEL_SIZE=64.
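
The two reported weights can be compared against the signed integer limits with a few lines of Python. This sketch uses only the numbers quoted in this thread; reading the matching quotients as a sign that the weights scale with the label range is an interpretation, not something confirmed from the source code.

```python
INT32_MAX = 2**31 - 1   # largest signed 32-bit integer
INT64_MAX = 2**63 - 1   # largest signed 64-bit integer

w32 = 114582            # velotab entry reported with WM_LABEL_SIZE=32
w64 = 492123147842001   # velotab entry reported with WM_LABEL_SIZE=64

# Only the 32-bit build produces a weight that fits the 32-bit range.
print(w32 <= INT32_MAX)  # True
print(w64 <= INT32_MAX)  # False

# How many such weights fit into each label range before overflow.
print(INT32_MAX // w32)  # 18741
print(INT64_MAX // w64)  # 18741
```

Both quotients are equal, which is consistent with each weight taking up roughly the same fraction of the available label range in both builds.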

henry

2024-10-07 15:04

manager   ~0013427

Right, in the current implementation I have attempted to create a single scaleWeights function that works with scotch, ptscotch, metis, parmetis and zoltan. It appears, however, that at least scotch does not handle 64-bit integers in a completely consistent manner and cannot cope with weights scaled to use the 64-bit integer range, whereas metis and zoltan can. I am now testing a hack in which the scaling assumes a 32-bit integer range even when the decomposer is compiled with 64-bit integers.

henry

2024-10-07 15:21

manager   ~0013428

Further testing has shown that this appears to be a bug in scotch: even when it is compiled with 64-bit labels, the valid range of weights corresponds to the range of 32-bit labels, and nothing larger is handled correctly. For now I will have to implement a workaround hack until scotch is updated.
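
The idea of scaling to a 32-bit range regardless of the label size can be sketched in Python as follows. This is an illustration only: `scale_weights` is a hypothetical helper, the input weights are made up, and the code does not reproduce the actual OpenFOAM scaleWeights implementation or the velotab values reported above.

```python
INT32_MAX = 2**31 - 1


def scale_weights(weights, int_max=INT32_MAX):
    """Scale non-negative float weights to integers >= 1 whose sum is <= int_max.

    Hypothetical helper for illustration; not the OpenFOAM implementation.
    """
    n = len(weights)
    if n == 0:
        return []
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total <= 0 or n > int_max:
        raise ValueError("cannot scale these weights into the target range")
    # Reserve one unit per entry so that raising small weights to 1
    # cannot push the sum past int_max.
    scale = (int_max - n) / total
    return [max(1, int(w * scale)) for w in weights]


# The target range stays 32-bit even if labels are 64-bit.
scaled = scale_weights([1.0, 2.0, 1.0])
print(scaled)                    # [536870911, 1073741822, 536870911]
print(sum(scaled) <= INT32_MAX)  # True
```

Keeping the sum, and not just each entry, inside the 32-bit range is a design assumption of this sketch; the thread does not say which of the two scotch actually requires.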

henry

2024-10-07 15:28

manager   ~0013429

I have hacked OpenFOAM-dev to work around the bug/limitation in scotch, but it is not clear whether this change will adversely affect the operation of the other methods when running on large meshes with 64-bit labels. I have therefore not made the same change to OpenFOAM-12 until further testing is complete, ideally funded!

commit 4cb6b7ab673ec7385da4830105fc607684cc1e7b

Issue History

Date Modified Username Field Change
2024-10-07 13:29 peth New Issue
2024-10-07 13:44 peth Note Added: 0013423
2024-10-07 14:05 henry Note Added: 0013424
2024-10-07 14:55 peth Note Added: 0013426
2024-10-07 15:04 henry Note Added: 0013427
2024-10-07 15:21 henry Note Added: 0013428
2024-10-07 15:28 henry Assigned To => henry
2024-10-07 15:28 henry Status new => resolved
2024-10-07 15:28 henry Resolution open => fixed
2024-10-07 15:28 henry Fixed in Version => dev
2024-10-07 15:28 henry Note Added: 0013429