View Issue Details

ID: 0003937
Project: OpenFOAM
Category: Bug
View Status: public
Last Update: 2022-12-07 15:27
Reporter: blttkgl
Assigned To: henry
Priority: normal
Severity: major
Reproducibility: always
Status: closed
Resolution: no change required
Platform: Unix
OS: Other
OS Version: (please specify)
Product Version: dev
Summary: 0003937: Zoltan runtime decomposition crashes with lagrangian particles
Description: I am running a benchmark case to assess the performance of Zoltan decomposition when used with AMR. I have two cases, both using AMR: one without runtime balancing and one with Zoltan runtime balancing.

While the first case runs with no apparent issues, the second, which uses runtime redistribution, crashes early in the simulation with the output below. It seems that the lagrangian model tries to find the cell at the parcel location and is unable to do so. Are lagrangian particles supported with Zoltan decomposition, or is this a bug? I tried to reproduce the issue with the aachenBomb tutorial in parallel, but was not successful.


[6] #0 Foam::error::printStack(Foam::Ostream&) at ??:?
[6] #1 Foam::sigSegv::sigHandler(int) at ??:?
[6] #2 ? in "/lib64/libc.so.6"
[6] #3 Foam::indexedOctree<Foam::treeDataCell>::findInside(Foam::Vector<double> const&) const at ??:?
[6] #4 Foam::polyMesh::findCellFacePt(Foam::Vector<double> const&, int&, int&, int&) const at ??:?
[6] #5 Foam::InjectionModel<Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > > >::findCellAtPosition(int&, i$
[6] #6 Foam::ConeInjection<Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > > >::setPositionAndCell(int, int$
[6] #7 void Foam::InjectionModel<Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > > >::inject<Foam::SprayClo$
[6] #8 void Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > >::evolveCloud<Foam::SprayCloud<Foam::ReactingC$
[6] #9 void Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > > > >::solve<Foam::SprayCloud<Foam::ReactingCloud<F$
[6] #10 Foam::SprayCloud<Foam::ReactingCloud<Foam::ThermoCloud<Foam::MomentumCloud<Foam::ParcelCloudBase<Foam::SprayParcel<Foam::ReactingParcel<Foam::ThermoParcel<Foam::MomentumParcel<Foam::particle> > > $
[6] #11 Foam::parcelCloudList::evolve() at ??:?
[6] #12 Foam::fv::clouds::correct() at ??:?
[6] #13 Foam::fvModels::correct() at ??:?

Best,

Bulut
Tags: No tags attached.

Activities

henry

2022-11-27 14:24

manager   ~0012883

> Zoltan runtime decomposition crashes with lagrangian particles

This is not supported yet.

blttkgl

2022-11-27 16:25

reporter   ~0012884

Thanks for clarifying!

Bulut

henry

2022-11-30 16:41

manager   ~0012904

Can you provide a simple test-case that reproduces the problem?

blttkgl

2022-12-01 12:02

reporter   ~0012905

I have a case that replicates the issue, but it is rather large. I will try to create a simpler case and post it here. Thanks

blttkgl

2022-12-05 08:09

reporter   ~0012906

Attached you can find a test case that reproduces the issue. I added a bunch of injectors to the aachenBomb tutorial and decomposed it using hierarchical decomposition, with a large redistribution interval (every 20th timestep), to replicate the issue we see in our larger setup. I can reproduce the crash when I decompose this case onto 72 processors and run for about 300-400 iterations (5 minutes of wall-clock time). I hope this is useful enough. I attach the log file as well.

Bulut
aachenBomb.tar.gz (137,432 bytes)
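For readers trying to reproduce this, the configuration involved is roughly the following sketch. The keyword names follow OpenFOAM-dev conventions and are an assumption on my part (the attached case is authoritative); the interval of 20 matches the setup described above.

```
// system/decomposeParDict (sketch, assumed keywords)
numberOfSubdomains  72;
method              zoltan;
libs                ("libzoltanDecomp.so");

// constant/dynamicMeshDict (sketch, assumed keywords):
// runtime redistribution via an fvMeshDistributor
distributor
{
    type                    distributor;
    libs                    ("libfvMeshDistributors.so");
    redistributionInterval  20;   // redistribute every 20th timestep
}
```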

henry

2022-12-05 20:08

manager   ~0012909

I have reproduced the problem and analysed it. The failure point is in indexedOctree, which generates a segmentation fault if there are no cells on a processor. I have resolved this problem with

commit a140f7d65368c38e5997fe1f86756ee20daa1632

However, this does not explain why there are no cells on one of the processors. After the above fix (and another temporary hack), the case fails in ZoltanDecomp, which apparently cannot handle processors with no cells and hangs; this may occur only with the default method, I have not tried the others. To avoid this lock-up I have added a check in ZoltanDecomp:

commit 02c7257eb01372d16aae12d9190912220e335c47

However, this again does not resolve the fundamental problem: at least the default Zoltan method can generate distributions in which there are no cells on at least one processor, which Zoltan itself cannot then handle when called again to redistribute. I suggest you test other Zoltan methods to see if any of them are more reliable.
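The first fix amounts to guarding the octree cell lookup against a processor that owns zero cells. A toy sketch of that guard (this is not the actual OpenFOAM patch; `findInside` below is a hypothetical brute-force stand-in for `indexedOctree<treeDataCell>::findInside`, which returns -1 when no containing cell exists):

```cpp
#include <cstddef>
#include <vector>

struct Point { double x, y, z; };

// Hypothetical stand-in for the octree query: find the index of the
// unit cube (anchored at cellMins[i]) containing pt, or -1 if none.
long findInside(const std::vector<Point>& cellMins, const Point& pt)
{
    for (std::size_t i = 0; i < cellMins.size(); ++i)
    {
        const Point& m = cellMins[i];
        if (pt.x >= m.x && pt.x < m.x + 1.0 &&
            pt.y >= m.y && pt.y < m.y + 1.0 &&
            pt.z >= m.z && pt.z < m.z + 1.0)
        {
            return static_cast<long>(i);
        }
    }
    return -1;
}

// The essence of the fix: never query a search structure built over an
// empty cell list; report "not found" for an empty processor instead of
// dereferencing an empty tree (the reported segfault).
long findCellSafe(const std::vector<Point>& cellMins, const Point& pt)
{
    if (cellMins.empty())
    {
        return -1;
    }
    return findInside(cellMins, pt);
}
```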

henry

2022-12-07 15:27

manager   ~0012916

The fundamental problem/bug is in Zoltan, which can generate a redistribution graph with empty processors that it then cannot handle on the next call.

Issue History

Date Modified Username Field Change
2022-11-27 14:18 blttkgl New Issue
2022-11-27 14:24 henry Note Added: 0012883
2022-11-27 16:25 blttkgl Note Added: 0012884
2022-11-30 16:41 henry Note Added: 0012904
2022-12-01 12:02 blttkgl Note Added: 0012905
2022-12-05 08:09 blttkgl Note Added: 0012906
2022-12-05 08:09 blttkgl File Added: aachenBomb.tar.gz
2022-12-05 20:08 henry Note Added: 0012909
2022-12-07 15:27 henry Assigned To => henry
2022-12-07 15:27 henry Status new => closed
2022-12-07 15:27 henry Resolution open => no change required
2022-12-07 15:27 henry Note Added: 0012916