View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0003937 | OpenFOAM | Bug | public | 2022-11-27 14:18 | 2022-12-07 15:27 |
Reporter | blttkgl | Assigned To | henry | ||
Priority | normal | Severity | major | Reproducibility | always |
Status | closed | Resolution | no change required | ||
Platform | Unix | OS | Other | OS Version | (please specify) |
Product Version | dev | ||||
Summary | 0003937: Zoltan runtime decomposition crashes with lagrangian particles | ||||
Description | I am running a benchmark case to assess the performance of Zoltan decomposition when used with AMR. I have two cases, both using AMR: one without runtime balancing and one with Zoltan runtime balancing. The first case runs with no apparent issues, but the second, which uses runtime decomposition, crashes early in the simulation with the output below. It seems that the lagrangian model tries to find the cell at the parcel location and is unable to do so. Are lagrangian particles supported with Zoltan decomposition, or is this a bug? I tried to reproduce it on the aachenBomb tutorial in parallel, but was not successful.

[6] #0 Foam::error::printStack(Foam::Ostream&) at ??:?
[6] #1 Foam::sigSegv::sigHandler(int) at ??:?
[6] #2 ? in "/lib64/libc.so.6"
[6] #3 Foam::indexedOctree<Foam::treeDataCell>::findInside(Foam::Vector<double> const&) const at ??:?
[6] #4 Foam::polyMesh::findCellFacePt(Foam::Vector<double> const&, int&, int&, int&) const at ??:?
[6] #5 Foam::InjectionModel<Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > > >::findCellAtPosition(int&, i$
[6] #6 Foam::ConeInjection<Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > > >::setPositionAndCell(int, int$
[6] #7 void Foam::InjectionModel<Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > > >::inject<Foam::SprayClo$
[6] #8 void Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > >::evolveCloud<Foam::SprayCloud<Foam::ReactingC$
[6] #9 void Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > >::solve<Foam::SprayCloud<Foam::ReactingCloud<F$
[6] #10 Foam::SprayCloud<Foam::ReactingCloud<Foam::ThermoCloud<Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > $
[6] #11 Foam::parcelCloudList::evolve() at ??:?
[6] #12 Foam::fv::clouds::correct() at ??:?
[6] #13 Foam::fvModels::correct() at ??:?

Best, Bulut | ||||
Tags | No tags attached. | ||||
|
> Zoltan runtime decomposition crashes with lagrangian particles

This is not supported yet. |
|
Thanks for clarifying! Bulut |
|
Can you provide a simple test-case that reproduces the problem? |
|
I have a case that can replicate the issue, but it is rather large. I will try to create a simple case and post it here. Thanks |
|
Attached you can find a test case that reproduces the issue. I added a bunch of injectors to the aachenBomb tutorial and decomposed it using hierarchical decomposition, with large redistribution intervals (every 20th timestep), in order to replicate the issue we have in our larger setup. I can replicate the issue when I decompose this case into 72 processors and run it for about 300-400 iterations (5 minutes wall clock time). I hope this is useful enough. I attach the log file as well. Bulut |
|
I have reproduced the problem and analysed it. The failure point is in indexedOctree, which generates a segmentation fault if there are no cells on a processor. I have resolved this problem with commit a140f7d65368c38e5997fe1f86756ee20daa1632; however, this does not explain why there are no cells on one of the processors. After the above fix (and another temporary hack) the case fails in ZoltanDecomp, which apparently cannot handle processors with no cells and hangs; this may only be with the default method, I have not tried others. To avoid this lock-up I have added a check in ZoltanDecomp: commit 02c7257eb01372d16aae12d9190912220e335c47. However, again this does not resolve the fundamental problem: at least the default Zoltan method can generate distributions in which there are no cells on at least one processor, which it then cannot handle itself when called again to redistribute. I suggest you test other Zoltan methods to see if any of them are more reliable. |
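
For illustration only, the two kinds of guard described above could be sketched in plain C++ as follows. This is not the code from either commit and it does not use the OpenFOAM or Zoltan APIs: `Box`, `findCell`, `cellsPerProcessor` and `emptyProcessors` are hypothetical names, and the linear bounding-box search merely stands in for the octree query. The sketch only shows the idea of returning "not found" when a processor holds no cells, and of detecting empty processors in a cell-to-processor map before redistributing again.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical axis-aligned bounding box of one cell (not an OpenFOAM type)
struct Box
{
    std::array<double, 3> lo;
    std::array<double, 3> hi;
};

// Hypothetical stand-in for a cell search. Returns -1 when the processor
// holds no cells instead of querying an empty search structure.
int findCell(const std::vector<Box>& cells, const std::array<double, 3>& p)
{
    if (cells.empty())
    {
        return -1;
    }

    for (std::size_t celli = 0; celli < cells.size(); ++celli)
    {
        bool inside = true;
        for (std::size_t d = 0; d < 3; ++d)
        {
            inside = inside && p[d] >= cells[celli].lo[d]
                            && p[d] <= cells[celli].hi[d];
        }
        if (inside)
        {
            return static_cast<int>(celli);
        }
    }

    return -1;
}

// Count the cells assigned to each processor in a cell-to-processor map
std::vector<std::size_t> cellsPerProcessor
(
    const std::vector<int>& cellToProc,
    const int nProcs
)
{
    assert(nProcs > 0);

    std::vector<std::size_t> counts(static_cast<std::size_t>(nProcs), 0);
    for (const int proci : cellToProc)
    {
        assert(proci >= 0 && proci < nProcs);
        ++counts[static_cast<std::size_t>(proci)];
    }

    return counts;
}

// List the processors that received no cells
std::vector<int> emptyProcessors
(
    const std::vector<int>& cellToProc,
    const int nProcs
)
{
    const std::vector<std::size_t> counts =
        cellsPerProcessor(cellToProc, nProcs);

    std::vector<int> empty;
    for (int proci = 0; proci < nProcs; ++proci)
    {
        if (counts[static_cast<std::size_t>(proci)] == 0)
        {
            empty.push_back(proci);
        }
    }

    return empty;
}
```

With the map `{0, 0, 2, 2, 2}` on 4 processors, `emptyProcessors` reports processors 1 and 3. A caller could use such a check to skip or abort a redistribution rather than hang.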
|
The fundamental problem/bug is in Zoltan, which can generate a redistribution with empty processors that it then cannot handle on the next call. |
Date Modified | Username | Field | Change |
---|---|---|---|
2022-11-27 14:18 | blttkgl | New Issue | |
2022-11-27 14:24 | henry | Note Added: 0012883 | |
2022-11-27 16:25 | blttkgl | Note Added: 0012884 | |
2022-11-30 16:41 | henry | Note Added: 0012904 | |
2022-12-01 12:02 | blttkgl | Note Added: 0012905 | |
2022-12-05 08:09 | blttkgl | Note Added: 0012906 | |
2022-12-05 08:09 | blttkgl | File Added: aachenBomb.tar.gz | |
2022-12-05 20:08 | henry | Note Added: 0012909 | |
2022-12-07 15:27 | henry | Assigned To | => henry |
2022-12-07 15:27 | henry | Status | new => closed |
2022-12-07 15:27 | henry | Resolution | open => no change required |
2022-12-07 15:27 | henry | Note Added: 0012916 |