View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0004161 | OpenFOAM | Bug | public | 2024-10-07 13:29 | 2024-10-07 15:28 |
Reporter | peth | Assigned To | henry | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | resolved | Resolution | fixed | ||
OS | ubuntu | OS Version | 22.04 | ||
Product Version | 12 | ||||
Fixed in Version | dev | ||||
Summary | 0004161: scotch method results in inappropriate decomposition | ||||
Description | Scotch decomposition produces a strange, unbalanced decomposition of the mesh, which results in slow parallel performance. I tested it on different meshes, and usually the whole mesh is assigned to one processor while the other processors have 0 cells. In other cases, some other processors receive a small number of cells, but the overall distribution is still highly unbalanced. The issue also occurs with the tutorials, for example the incompressibleFluid/impeller case, where the scotch method is used for decomposition by default. Notes: - the decomposition of the impeller tutorial in of11 on the same system is well balanced (evenly distributed cell count across all processors) - in both the of11 and of12 cases I used the scotch shipped in ThirdParty, which has the same version, 6.0.9 - in the of12 case there is no warning, i.e. the decomposition completes, but it is unbalanced - I tried debugging the code by running decomposePar under gdb with the given tutorial in both of-11 and of-dev. I may be wrong, but my guess is that the problematic part is in the decomposeOneProc function of the file src/parallel/decompose/scotch.C, more specifically in the calculation of the labelList "velotab", which is a parameter passed to scotch (https://hal.science/hal-00410327/document). In OF11 the velotab components are: velotab[0]: 1 velotab[1]: 1 velotab[2]: 1 velotab[1000]: 1 In OF-dev the same velotab components for the same tutorial are a much larger number: 492123147842001 (?), so there is a difference here. I did not debug scotch.C further. I guess the difference is related to commit https://github.com/OpenFOAM/OpenFOAM-12/commit/78dbbdd7d301d2cedd426e1c54dd1e8ebd22e454 where the scaleWeights function in src/parallel/decompose/decompositionMethods/decompositionMethod/decompositionMethod.C was modified. | ||||
Steps To Reproduce | 1. Install of12 from GitHub. I used the options WM_LABEL_SIZE=64, scotch=ThirdParty and paraview=none in etc/bashrc; other settings were left at their defaults. 2. Copy the $FOAM_TUTORIALS/incompressibleFluid/impeller directory to a new directory. 3. Navigate to the new directory, run blockMesh, then decomposePar. | ||||
Tags | No tags attached. | ||||
|
Edit: the velotab calculation was also modified in this commit, in addition to the previous one: https://github.com/OpenFOAM/OpenFOAM-12/commit/30afcdc331b4d44f011dad8f1122adf6f156cfdf |
|
What happens if you don't set WM_LABEL_SIZE=64? I get a correct decomposition. |
|
I compiled of-dev with WM_LABEL_SIZE=32 in debug mode. Testing it with the same tutorial (incompressibleFluid/impeller) gives a balanced decomposition, i.e. almost the same number of cells on each processor. Running decomposePar in debug mode gives the following for the velotab calculation: velotab[0]=114582 velotab[1]=114582 velotab[2]=114582 velotab[1000]=114582 i.e. a different and much smaller number than with WM_LABEL_SIZE=64. |
|
Right, in the current implementation I have attempted to create a single scaleWeights function which will work with scotch, ptscotch, metis, parmetis and zoltan. However, it appears that at least scotch does not handle 64bit integers in a completely consistent manner and cannot cope with the weights being scaled to use the 64bit integer range, whereas metis and zoltan can. I am now testing a hack in which the scaling assumes a 32bit integer range even when the decomposer is compiled with 64bit integers. |
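To illustrate the scaling issue described in this note, here is a minimal sketch and not OpenFOAM's scaleWeights: the function name `scaleWeightsSketch`, the `rangeMax` parameter and the factor-of-two safety margin are all assumptions made for the example. It converts floating-point weights to integers whose sum stays below a chosen range, so the same unit weights come out as very different integers depending on whether the 32bit or 64bit maximum is used.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <numeric>
#include <vector>

// Hypothetical helper for illustration; this is NOT OpenFOAM's scaleWeights.
// Converts non-negative floating-point weights to integer weights whose sum
// stays below rangeMax.
std::vector<std::int64_t> scaleWeightsSketch
(
    const std::vector<double>& weights,
    const std::int64_t rangeMax
)
{
    const double sum = std::accumulate(weights.begin(), weights.end(), 0.0);
    assert(sum > 0);

    // Assumed safety margin: use only half of the available integer range
    const double scale = static_cast<double>(rangeMax)/(2*sum);

    std::vector<std::int64_t> scaled;
    scaled.reserve(weights.size());

    for (const double w : weights)
    {
        // Truncate towards zero, but never emit a zero weight
        const auto s = static_cast<std::int64_t>(w*scale);
        scaled.push_back(s > 0 ? s : 1);
    }

    return scaled;
}
```

Calling this with 1000 unit weights and `std::numeric_limits<std::int32_t>::max()` as `rangeMax` gives values that fit in 32 bits, whereas passing `std::numeric_limits<std::int64_t>::max()` gives values far outside that range. This is qualitatively the same pattern as the velotab values reported above, which differ by a factor of roughly 2^32 between the two label sizes.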
|
Further testing has shown that this appears to be a bug in scotch: even when it is compiled with 64bit labels, the valid range of weights corresponds to the range of 32bit labels, and nothing larger is handled correctly. For now I will have to implement a hack work-around until scotch is updated. |
|
I have hacked OpenFOAM-dev to work around the bug/limitation in scotch, but it is not clear whether this change will adversely affect the operation of the other methods when running on large meshes with 64bit labels, so I have not made the same change to OpenFOAM-12 until further testing is complete, ideally funded! commit 4cb6b7ab673ec7385da4830105fc607684cc1e7b |
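The idea of assuming a 32bit weight range regardless of the label size could be sketched as follows. This is only an illustration and not the code from the commit above: the helper name `weightRangeMax` and its template parameter are hypothetical, and the actual work-around may be structured quite differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper for illustration; NOT the code from the commit above.
// Returns the upper bound to scale integer weights towards: the maximum of
// the label type, capped at the 32bit maximum regardless of the label size.
template<class Label>
constexpr std::int64_t weightRangeMax()
{
    return std::min<std::int64_t>
    (
        std::numeric_limits<Label>::max(),
        std::numeric_limits<std::int32_t>::max()
    );
}
```

With this cap, a build with 64bit labels would scale weights into the same range as a build with 32bit labels. Whether that loses too much resolution for the other decomposers on very large meshes is the open question raised in the note.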
Date Modified | Username | Field | Change |
---|---|---|---|
2024-10-07 13:29 | peth | New Issue | |
2024-10-07 13:44 | peth | Note Added: 0013423 | |
2024-10-07 14:05 | henry | Note Added: 0013424 | |
2024-10-07 14:55 | peth | Note Added: 0013426 | |
2024-10-07 15:04 | henry | Note Added: 0013427 | |
2024-10-07 15:21 | henry | Note Added: 0013428 | |
2024-10-07 15:28 | henry | Assigned To | => henry |
2024-10-07 15:28 | henry | Status | new => resolved |
2024-10-07 15:28 | henry | Resolution | open => fixed |
2024-10-07 15:28 | henry | Fixed in Version | => dev |
2024-10-07 15:28 | henry | Note Added: 0013429 |