View Issue Details
|ID||Project||Category||View Status||Date Submitted||Last Update|
|0001436||OpenFOAM||[All Projects] Bug||public||2014-11-04 12:14||2016-08-16 12:52|
|Fixed in Version|
|Summary||0001436: LTS particle transport does not scale well in parallel|
|Description||The LTS particle transport algorithm scales poorly in parallel except in very special cases. However, this can be improved rather easily with a minor change in the transport loop.|
The problem is located in the file src/lagrangian/basic/Cloud/Cloud.C
The forAllIter loop starting at line 251 does the following: All particles on the local processor are transported from their starting position until they reach the processor boundary.
Let's assume the worst case: the particles shall be transported along a pipe, and this pipe is decomposed to the processors in flow direction (see attached test case).
Processor0 transports all particles from the inlet to its boundary with Processor1, while all other processors sit idle. After the forAllIter loop has finished, Processor1 can transport all the particles from its boundary with Processor0 to its boundary with Processor2, while all other processors sit idle, and so on.
There is no parallel speedup; the performance is limited to the speed of a single processor.
With a minor modification this limitation can be overcome: the forAllIter loop is interrupted after some particles have been transported. Let's say that after 5000 particles a "break" is called to leave the forAllIter loop. These 5000 particles can now be transferred to another processor. The while-loop continues until no particles are left for transportation, so in the next pass Processor0 can transport the next 5000 particles while Processor1 transports the first 5000 particles, and so on.
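The effect of interrupting the loop can be illustrated with a toy cost model. This is only a sketch under strong assumptions, not the code from Cloud.C: the pipe is split into nProcs slabs in flow direction, moving one particle across one slab costs one time unit, communication is free, and each pass costs as much as the busiest processor. The function name pipelineCost and its parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Toy model (assumption): wall-clock cost of pushing nParticles through a
// pipe decomposed into nProcs slabs along the flow direction. In every pass
// each processor moves at most blockSize of its waiting particles across its
// slab and hands them to the next processor.
long pipelineCost(int nProcs, long nParticles, long blockSize)
{
    assert(nProcs > 0 && blockSize > 0);

    std::vector<long> waiting(nProcs, 0);
    waiting[0] = nParticles;      // all particles start at the inlet
    long remaining = nParticles;  // particles that have not left the outlet
    long wallClock = 0;

    while (remaining > 0)
    {
        // Work done by each processor in this pass
        std::vector<long> moved(nProcs, 0);
        long busiest = 0;
        for (int p = 0; p < nProcs; ++p)
        {
            moved[p] = std::min(waiting[p], blockSize);
            busiest = std::max(busiest, moved[p]);
        }

        // Hand the moved particles to the downstream neighbour
        for (int p = 0; p < nProcs; ++p)
        {
            waiting[p] -= moved[p];
            if (p + 1 < nProcs)
            {
                waiting[p + 1] += moved[p];
            }
            else
            {
                remaining -= moved[p];  // left through the outlet
            }
        }

        wallClock += busiest;  // processors wait for the slowest one
    }

    return wallClock;
}
```

In this model, calling pipelineCost(4, 20000, 20000) (no interruption) gives the same cost as a serial run, 4 x 20000 units, while pipelineCost(4, 20000, 5000) needs only 7 passes of 5000 units each, because the processors work on different blocks at the same time.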
The attached test case shows these timings on my workstation:
serial case: 128s
parallel case, "best" decomposition: 17s
parallel case, "worst" decomposition: 102s
parallel case, "best" decomposition with proposed modifications: 17s
parallel case, "worst" decomposition with proposed modifications: 21s
(By the way: I'm a little bit surprised that ~30% of the particles are getting lost in this simple case... but that's a separate issue...)
|Steps To Reproduce||Running the test case:|
- modify system/decomposeParDict to select the best-case or worst-case simpleCoeffs
- run uncoupledKinematicParcelFoam in parallel
|Additional Information||In the attached modification (Cloud.C.modified) these new variables are introduced:|
label nParticlesWaiting(0); // global sum of the particles waiting for transport
IDLList<ParticleType> alreadyMovedParticles; // list to store the particles, that do not leave the local processor domain. It is necessary to remove them from the "this" list, so that the forAllIter loop skips these particles.
label transportCounter(0); // counter that induces the break call after a certain number of particles have been transported
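How these variables could interact is sketched below with std::list standing in for IDLList. This is not the attached Cloud.C.modified; the Particle struct, the leavesDomain flag (a placeholder for the real tracking result) and the function transportBlock are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for ParticleType
struct Particle
{
    int id;
    bool leavesDomain;  // placeholder: tracking ended on a processor boundary
};

struct BlockResult
{
    std::list<Particle> toTransfer;  // particles to send to a neighbour
    int transported = 0;             // particles handled in this pass
};

// One interrupted pass over the cloud: at most blockSize particles are
// handled, then the loop is left so the transfer list can be sent.
BlockResult transportBlock
(
    std::list<Particle>& cloud,
    std::list<Particle>& alreadyMovedParticles,
    int blockSize
)
{
    assert(blockSize > 0);

    BlockResult result;
    int transportCounter = 0;

    auto iter = cloud.begin();
    while (iter != cloud.end())
    {
        auto current = iter++;  // advance first, splice moves 'current' away

        // (the real code would track the particle here)
        if (current->leavesDomain)
        {
            result.toTransfer.splice(result.toTransfer.end(), cloud, current);
        }
        else
        {
            // Finished locally: take it out of the cloud so that the next
            // pass does not visit it again
            alreadyMovedParticles.splice
            (
                alreadyMovedParticles.end(), cloud, current
            );
        }

        if (++transportCounter >= blockSize)
        {
            break;  // interrupt the loop after one block
        }
    }

    result.transported = transportCounter;
    return result;
}
```

In a parallel run the caller would presumably sum cloud.size() over all processors to obtain nParticlesWaiting, repeat the pass until that sum is zero, and finally splice alreadyMovedParticles back into the cloud.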
There are many comments added to the attached Cloud.C.modified file, I hope that they will make things clear in the local context.
Since the modifications only make sense for LTS transport, you might want to embed a proper if-statement for the steadyState transport case.
|Tags||No tags attached.|
0_case.tar.gz (4,278 bytes)
Cloud.C.modified (15,286 bytes)
Sorry for not getting back to you sooner on this.
Thanks for the detailed report, excellent test-case and patch. Your proposal looks fine and I am testing it in OpenFOAM-dev on a few cases at the moment.
I am also looking for the best way to calculate or provide the transfer block-size which is currently hard-coded to 5000.
It seems to me that this approach is equally applicable to transient and LTS operation, why do you suggest it should be selected only for LTS?
Thanks for coming back to this issue!
My proposal seems to freeze, at least sometimes, if fewer than 5000 particles are injected. However, it has been working fine and stable for hundreds of simulations with larger meshes (10e6 cells) and very large numbers of particles (100 transports of 1e7 particles).
I have a test case stored where it freezes. I'll investigate this case again and let you know...
I have not investigated the performance of transient transport. I think it is less important there, since each particle is transported for a single time step only and the processors along the particles' path don't have to wait too long. On the other hand, it should not do any harm to use my proposal...
||Could you provide the test case which reproduces the freezing problem?|
||There is a lot of complexity and some cost involved in moving the particles between lists; why is this preferable to simply storing the iterator of the current particle and restart from there?|
I have rerun the case that froze in the past. However it runs smoothly with my proposal now...
I don't think it is preferable to use lists. It was just the first solution I tried at that time and after observing the resulting speedup I was satisfied with that performance gain. If you store the iterators instead I'll be happy to test it with my ongoing simulations.
I am updating the rest of my code to 3.0.x and -dev, so that I can test any changes immediately.
||Could you provide an updated version which avoids the need for transfers between lists? Using a stored iterator would have a very small overhead and the same algorithm could be used for all cases, whereas the overhead of the multi-list approach is such that it would be necessary to provide both it and the previous algorithm, which adds to complexity and maintenance effort.|
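A minimal sketch of the stored-iterator idea, again with std::list standing in for IDLList, might look as follows. No such patch exists in this report; Particle, leavesDomain and transportBlockFrom are hypothetical names, and the tracking itself is reduced to a flag.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for ParticleType
struct Particle
{
    int id;
    bool leavesDomain;  // placeholder: tracking ended on a processor boundary
};

// Handle at most blockSize particles starting at 'resume'. Particles that
// stay on this processor remain in the cloud; only those that leave are
// moved to 'toTransfer'. 'resume' is updated so that the next call continues
// where this one stopped.
int transportBlockFrom
(
    std::list<Particle>& cloud,
    std::list<Particle>::iterator& resume,
    std::list<Particle>& toTransfer,
    int blockSize
)
{
    assert(blockSize > 0);

    int transported = 0;
    while (resume != cloud.end() && transported < blockSize)
    {
        auto current = resume++;  // advance before a possible splice

        // (the real code would track the particle here)
        if (current->leavesDomain)
        {
            toTransfer.splice(toTransfer.end(), cloud, current);
        }

        ++transported;
    }

    return transported;
}
```

One caveat of this sketch: if received particles are appended to the cloud while resume already equals cloud.end(), they will not be visited, so the caller would have to point resume at the first received particle in that situation.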
||I'll do the code refactoring but it might take some time.|
||Any news on the code refactoring or should I close this report?|
||Unfortunately, no news from my side... you can close the report. I hope I can come back to this topic in the future.|
||Waiting for a patch to OpenFOAM-dev which conforms to OpenFOAM coding conventions.|
|2014-11-04 12:14||martinB||New Issue|
|2014-11-04 12:14||martinB||File Added: 0_case.tar.gz|
|2014-11-04 12:14||martinB||File Added: Cloud.C.modified|
|2016-01-17 15:00||henry||Note Added: 0005845|
|2016-01-17 15:27||martinB||Note Added: 0005846|
|2016-01-17 15:35||henry||Note Added: 0005847|
|2016-01-18 08:30||henry||Note Added: 0005849|
|2016-01-18 15:20||martinB||Note Added: 0005850|
|2016-01-18 15:27||henry||Note Added: 0005851|
|2016-01-19 10:01||martinB||Note Added: 0005856|
|2016-08-16 12:03||henry||Note Added: 0006674|
|2016-08-16 12:48||martinB||Note Added: 0006676|
|2016-08-16 12:52||henry||Note Added: 0006677|
|2016-08-16 12:52||henry||Status||new => closed|
|2016-08-16 12:52||henry||Assigned To||=> henry|
|2016-08-16 12:52||henry||Resolution||open => suspended|